rsh issued, no response received

dsdoubt · Post by **dsdoubt** » Thu Apr 07, 2011 1:59 pm

I have already set up SSH for Server1 and Server2 from the Engine node. I am able to SSH to the servers manually, but I cannot run jobs from DataStage; they fail with a "Section Leader died" error.
Server1 is an ETL resource node; Server2 is a database server.
The ID used is part of the dstage group.
Other jobs in the project that do not access that particular database run fine with the same ID.

I get the following error when I run the configuration check.

Code:

##I TOCK 000000 11:56:49(001) <main_program> OS charset: ISO-8859-1.
##I TOCK 000000 11:56:49(002) <main_program> Input charset: UTF-8.
##I TFSC 000001 11:56:49(003) <main_program> APT configuration file: /path/default.apt
##I TFPA 000028 11:56:49(004) <main_program> APT Startup script: /path/startup.apt

##W TFPM 000152 11:57:22(000) <main_program> Accept timed out retries = 28
##E TFPM 000153 11:57:22(001) <main_program> The section leader on SERVER2 died
##E TFPM 000356 11:57:22(002) <main_program> 

**** Parallel startup failed ****

This is usually due to a configuration error, such as
not having the Orchestrate install directory properly
mounted on all nodes, rsh permissions not correctly
set (via /etc/hosts.equiv or .rhosts), or running from
a directory that is not mounted on all nodes. Look for
error messages in the preceding output.


##I TFPM 000177 11:57:22(003) <main_program> Step started on node ENGINE_SERVER; it uses 7  nodes.
The program running the step is /path/orchadmin.

##I TFPM 000178 11:57:22(004) <main_program> The ORCHESTRATE startup program in /path/standalone.sh is being used.

##I TFPM 000180 11:57:22(005) <main_program> A startup script (in /path/startup.apt) is being used.

##I TFPM 000183 11:57:22(006) <main_program> The TCP port being used for startup is 10,002; the associated socket number is 5.

##I TFPM 000184 11:57:22(007) <main_program> 
Node status:

##I TFPM 000185 11:59:06(012) <main_program>    SERVER1 - 
##I TFPM 000186 11:59:06(013) <main_program> OK

##I TFPM 000185 11:59:06(014) <main_program>    SERVER2 - 
##I TFPM 000187 11:59:06(015) <main_program> rsh issued, no response received

Kindly help me out.

chanaka · Post by **chanaka** » Thu Apr 07, 2011 6:11 pm

Hi,

Can you please run the commands below and share the output, along with the contents of the configuration file you are using?
From server1:
a. ssh dsusername@server2
b. ping server2

Also let me know the uid/gid of the dsuser on both machines.
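The checks requested here could be collected into one small script, roughly as sketched below. This is only an illustration: `run` and `check_node` are hypothetical helper names, `dsusername` and `server2` are the placeholders from this post, and `ping -c` assumes a Linux-style ping. `BatchMode=yes` makes ssh fail instead of prompting for a password, which is probably closer to what a non-interactive job start sees than a manual login.

```shell
#!/bin/sh
# Hypothetical helper: print the command when DRY_RUN=1, otherwise run it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Hypothetical helper: run the checks for one user and one remote host.
check_node() {
    user=$1
    host=$2
    # Non-interactive login test; also prints the remote host name and uid/gid.
    run ssh -o BatchMode=yes -o ConnectTimeout=10 "$user@$host" 'hostname; id'
    # Basic reachability test (assumes a ping that accepts -c).
    run ping -c 3 "$host"
    # uid/gid of the same user on the local machine, for comparison.
    run id "$user"
}
```

Run from server1 as `check_node dsusername server2`, then compare the uid/gid reported by the remote `id` with the local one.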

Cheers!

Chanaka

ppgoml · Post by **ppgoml** » Fri Apr 08, 2011 9:10 pm

Have you done this step?

On the primary computer, create the remsh file in the /Server/PXEngine/etc/ directory with the following content:
#!/bin/sh
exec /usr/bin/ssh "$@"
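A minimal sketch of creating that file from the shell follows. `install_remsh` is a hypothetical helper name, and the target directory is passed as an argument because the full install path depends on your system. The sketch also assumes the wrapper needs execute permission, since it is run as a program in place of rsh.

```shell
#!/bin/sh
# Hypothetical helper: write the remsh wrapper into the directory given as $1.
install_remsh() {
    dir=$1
    # Quoted 'EOF' keeps "$@" literal so it ends up in the file unchanged.
    cat > "$dir/remsh" <<'EOF'
#!/bin/sh
exec /usr/bin/ssh "$@"
EOF
    # Assumption: the wrapper must be executable for the engine to call it.
    chmod 755 "$dir/remsh"
}
```

Call it with the PXEngine etc directory of your installation as the only argument, then confirm the result with `ls -l` on the new remsh file.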

PaulVL · Post by **PaulVL** » Fri Apr 08, 2011 9:58 pm

You should also look to the quality of your SSH keys.

What I mean by that is: Are the SSH keys created based upon the shortname of the server or the fully qualified name?

If you are using the dynamic_grid.sh script it populates the dynamic host name with a fully qualified version.

Go peek at /etc/ssh2/hostnames to see your SSH keys (probably the correct path, not sure on your system).

Has your server2 system ever worked in your setup?

Are you credential-mapped to a given user ID? Have you run your SSH tests using that ID or your own?

SSH keys are a two-part combo. Did you make sure that your Conductor server has the keys for both Server1 and Server2, and that Server1 and Server2 each have the Conductor node's key under that user ID's path?
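One way to check the short-name versus fully qualified question is sketched below. It assumes OpenSSH, where `ssh-keygen -F` searches the current user's known_hosts file; other SSH products keep host keys elsewhere. `name_forms` and `check_known_hosts` are hypothetical helper names, and `server2.example.com` in the usage line is a made-up host name.

```shell
#!/bin/sh
# Hypothetical helper: print the short form, then the fully qualified form.
name_forms() {
    fqdn=$1
    echo "${fqdn%%.*}"
    echo "$fqdn"
}

# Hypothetical helper: report which name forms have a known_hosts entry.
check_known_hosts() {
    for name in $(name_forms "$1"); do
        # ssh-keygen -F prints the matching entries; empty output means none.
        if [ -n "$(ssh-keygen -F "$name" 2>/dev/null)" ]; then
            echo "$name: host key found"
        else
            echo "$name: no host key"
        fi
    done
}
```

Run `check_known_hosts server2.example.com` while logged in as the user ID the jobs actually run under, on each machine involved, so that the result reflects that ID's key files rather than your own.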

DSXchange
