dsrpcd not starting!

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

dsrpcd not starting!

Post by chulett »

My production server gets "bounced" every Sunday at midnight (don't ask me why :evil: ) and things are not quite right in DataStage land this morning.

I can't connect to my project (81016) because the 'dsrpcd' process is not running. As 'dsadm', I've stopped and restarted the server and got no errors either way: it comes down and goes back up with no unusual messages. The shared memory segment is being destroyed/created like normal but the rpc process does not start. I can't find any reason why it won't; a search via netstat for anything on port 31538 turns up nothing... I'm stumped at this point.

Support wants me to reboot the box, but this is a production server that houses many applications and I think that is only a Last Resort for me. I'm trying to get ahold of an SA with root privileges to stop/start things using 'S999ds.rc' to see if that makes any difference, but they don't seem to be in any hurry to call me back.

Is there any other troubleshooting I can do? Any logs or anything that might shed some light on why the rpc daemon isn't starting?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Re: dsrpcd not starting!

Post by Teej »

I don't know if you use lsof -i, but do that, ensure that NOTHING related to "dsrpc" is visible.
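
For anyone following along, the check above can be sketched roughly as below. This is only a sketch under assumptions: port 31538 is the one mentioned earlier in this thread, the helper names `port_in_listing` and `dsrpcd_running` are made up for illustration, and the script falls back to `netstat -an` when `lsof` is not installed.

```shell
#!/bin/sh
# Port taken from earlier in this thread; override if your install differs.
DSRPC_PORT="${DSRPC_PORT:-31538}"

# Hypothetical helper: read a netstat- or lsof-style listing on stdin and
# succeed if any line mentions the port. Matches ":31538" (lsof, Linux
# netstat) as well as ".31538" (HP-UX / Solaris netstat).
port_in_listing() {
    grep -E "[.:]$1([^0-9]|\$)" >/dev/null
}

# Hypothetical helper: succeed if a dsrpcd process shows up in ps output.
# The [d] bracket keeps grep from matching its own command line.
dsrpcd_running() {
    ps -ef 2>/dev/null | grep '[d]srpcd' >/dev/null
}

if dsrpcd_running; then
    echo "dsrpcd process: present"
else
    echo "dsrpcd process: not found"
fi

# -n and -P keep lsof output numeric so the port number is visible.
if command -v lsof >/dev/null 2>&1; then
    listing=$(lsof -nP -i 2>/dev/null)
else
    listing=$(netstat -an 2>/dev/null)
fi

if printf '%s\n' "$listing" | port_in_listing "$DSRPC_PORT"; then
    echo "port $DSRPC_PORT: something is using it"
else
    echo "port $DSRPC_PORT: nothing found"
fi
```

If both lines come back "not found", the daemon really is down and nothing else is holding the port, which is the situation described in the first post.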

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Craig

http://www.dsxchange.com/viewtopic.php? ... ht=netstat talks about turning on debug mode to get more information.

Kim.
Mamu Kim
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks guys... I don't have "lsof" but don't see anything related to dsrpcd using 'netstat -a' or 'ps -ef'. Heck, I don't see ds anything... just the base SMS.

I'll check out the debug thread.
-craig

"You can never have too many knives" -- Logan Nine Fingers
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

When the services are stopped do a -

ipcs -mop | grep dae or whatever the command is on HP/UX.

This should come back clean. If it doesn't, the services will not come up successfully, but they may appear to and give no indication otherwise. Also, just because the message appears that memory segments are being destroyed when you shut down does not mean that all memory segments have been removed.

To remove a memory segment do the following -

ipcrm -m {id from ipcs command}
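
A rough sketch of that check, for what it is worth. Assumptions: the `dae` pattern is simply the one quoted above and may differ on your platform, `list_segment_ids` is a made-up helper name, and the script uses plain `ipcs -m` because the column layout of other option combinations varies between systems. It only lists candidate IDs; removal is left to a human.

```shell
#!/bin/sh
# Pattern to look for in the segment key; taken from the post above.
KEY_PATTERN="${KEY_PATTERN:-dae}"

# Hypothetical helper: read `ipcs -m` output on stdin and print the ID of
# every shared memory segment whose key contains the pattern.
# Two common layouts are handled:
#   HP-UX / Solaris:  m <id> <0xkey> <mode> <owner> <group> ...
#   Linux:            <0xkey> <shmid> <owner> <perms> <bytes> ...
list_segment_ids() {
    awk -v pat="$1" '
    {
        id = ""; key = ""
        if ($1 == "m" && $3 ~ /^0x/) { id = $2; key = $3 }
        else if ($1 ~ /^0x/)          { id = $2; key = $1 }
        if (id != "" && index(tolower(key), tolower(pat)) > 0) print id
    }'
}

# Run this only while the DataStage services are stopped.
ipcs -m 2>/dev/null | list_segment_ids "$KEY_PATTERN" | while read -r id; do
    echo "leftover segment: $id   (to remove: ipcrm -m $id)"
done
```

Printing the `ipcrm` command instead of running it is deliberate: a pattern match on the key is a weak test, so it seems safer to have someone eyeball each segment first.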

Regards,

Michael Hester
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well, that was interesting...

Thanks for the pointer to the debug thread, Kim, that shed some light on the problem. We got:

RPCPID=29601 - 11:04:29 - uvrpc_debugflag=9 (Debugging level)
RPCPID=29601 - 11:04:29 - In rpc_init()
RPCPID=29601 - 11:04:29 - get service by name bombed errno=9

when trying to start the dsrpcd process. While there was nothing amiss in the /etc/services file, when my SA saw the debug message he mumbled something like "Hmmm... maybe... let me check something... NIS... nsswitch.conf... that's odd... try it now." And it came up. :roll:

Not sure exactly what changed, but I get the impression it was going to the network for services instead of the local etc file? Anywho, he did something, it works now, I got the long explanation from Ascential Support and I can finally get some work done now.
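
For reference, one way to compare the local file against whatever the name service switch returns is sketched below. Heavy assumptions here: the service name `dsrpc` is my guess at what the entry is called (check your own /etc/services), `local_service_port` is a made-up helper, and `getent` is not available on every platform, so that half is skipped when it is missing.

```shell
#!/bin/sh
# Assumed service name; not stated anywhere in this thread.
SERVICE="${SERVICE:-dsrpc}"

# Hypothetical helper: read an /etc/services-style stream on stdin and
# print the port/protocol field for the named service, ignoring comments.
local_service_port() {
    awk -v name="$1" '
        { sub(/#.*/, "") }            # strip comments, fields are re-split
        $1 == name { print $2; exit }
    '
}

local_entry=""
nss_entry=""

# What the local file says.
if [ -r /etc/services ]; then
    local_entry=$(local_service_port "$SERVICE" < /etc/services)
fi

# What a lookup through nsswitch.conf returns, where getent exists.
if command -v getent >/dev/null 2>&1; then
    nss_entry=$(getent services "$SERVICE" | awk '{ print $2 }')
fi

echo "local file  : ${local_entry:-<none>}"
echo "via nsswitch: ${nss_entry:-<none or getent unavailable>}"

# Show which sources are consulted for services, if the file is readable.
if [ -r /etc/nsswitch.conf ]; then
    grep '^services' /etc/nsswitch.conf
fi
```

If the local file has the entry but the switch lookup comes back empty, that would be consistent with the lookup going out to the network instead of the local file, as guessed above.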

Thanks again for the responses.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Craig, count your blessings your server gets bounced on a regularly scheduled basis. This is not a technology or software requirement. It is a human requirement.

Since admins hot-mount filesystems and load/upgrade software, a periodic reboot ensures that all settings and configurations make it into the appropriate startup scripts, etc. Failure to reboot periodically means that the unexpected reboot fails to return the system to an up state. Nobody knows why, because soooo many changes could have occurred since the last reboot.

It's comical, I had this argument with a sysadmin who proudly claimed their dev Sun server uptime was almost 365 days. I said yeah, but what's your availability after your next reboot? My point was proved about a month later when some hardware issues caused a reboot after maintenance. It took them a couple of days to run down all of the mount point changes, missing file systems, and unstarting apps. It was comical. :twisted:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's an interesting way to look at it, Ken, and not one that had really occurred to me. My experience has been more along your 'comical' lines where shops strove for maximum uptime and 'rebooting' a production server was a horrible thought. Heck, one was running redundant Sequoia systems for alarm monitoring and I seem to remember one having been up and running continuously for a period of years. :shock:

The only systems I've worked with up until now that got bounced on a regular basis were NT servers and that was because... well, you just had to or they ended up grinding to a halt or starting to do 'weird things' - or so I was told. :)

I don't think the reasons you've given here have ever occurred to the people running these particular servers. They seem to think it needs to be done in order to clean out stray garbage or some such and the last time I specifically asked they said it was because the "Oracle people asked them to." Errr, yah - ok. :roll:

I like your answer better.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

Well, I definitely do NOT recommend bouncing DataStage during one of the prime times for regular production runs. After all, when people want to have a whole bunch of real data to analyze on Monday morning, it would be perfect to bounce it then. Plus you would have real life human bodies there to observe the bounce, and ensure that emergencies are addressed.

I like the way Ken explained things. In the whole world of servers, for production UNIX servers, it is well frowned upon to bounce it, especially when you have customers relying on it. It is a mindset that brought us the 99.7% uptime promises that we see in ads.

And YES, in the old days, Windows NT was VERY prone to serious memory leaks. But ever since installing SP6, I rarely bounce my desktop server due to that, or other reasons. But yes, it still does those 'weird' things that just demand a bounce. Windows 2000 rarely needs to be bounced (only when required after patch installations, and whenever the power goes out. :))

However, it IS a poor excuse to bounce a server because something is 'weird.' This is some folks' general practice, and it was actually done this past weekend. "Hey, a job is acting weird. Let's bounce DataStage to see if it behaves correctly." "Okay, I changed this and that and that, then bounced DataStage, and it works." (Couldn't it be that this, that, and that actually fixed the problem instead of bouncing DataStage?) "Well, I wanna be safe..."

:roll:

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).