Error running any job with a Lookup stage

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

srividya
Participant
Posts: 62
Joined: Thu Aug 25, 2005 2:31 am
Location: Ashburn,VA

Post by srividya »

Hi,

I have a few jobs (in fact, any job with a Lookup or Join stage) that hang in my testing environment.

All of these jobs share a common flow:

We read from an Oracle database, use a Lookup stage to look up data from Oracle, and write the data to a dataset.

What we have checked so far:

We checked with the DBAs: there are no locks on the DB.
We cleared all the RT logs and &PH&.
We bounced the DataStage server twice, without any luck.
We created a copy of the job with a Join stage in place of the Lookup: same result.
We removed the Join/Lookup and implemented the logic in the source Oracle stage: the job completes within 30 seconds.
We moved the same job to another environment, pointed at the testing database: again the job completes within 30 seconds.

I am not sure what we have missed. I am thinking of deleting and re-creating the project tomorrow or next week, but would like to know if there is anything else I can look at.

Also, I noticed that every time we try to run the jobs that hang, a process like the one below is created:

dsadm 32416 1 0 Dec26 ? 00:00:00 /opt/app/xxxxxxxxx/InformationServer8.7/Server/PXEngine/bin/osh -f RT_SC59/OshScript.osh -monitorport 13400 -pf RT_SC59/jpfile -impexp_charset UTF-8 -string_charset UTF-8 -input_charset UTF-8 -output_charset UTF-8 -collation_sequence OFF

I have not seen this type of process before, and the information from all over the forum has confused me more. Maybe I will take another stab at it after some time. How can I clean up PIDs like this one?
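One way to spot processes like the one above is a small Python sketch that parses `ps -ef` output. It assumes Linux-style columns (UID PID PPID C STIME TTY TIME CMD). The flagging rule is a hypothetical heuristic, not something DataStage documents: PPID 1 means the process was re-parented to init, and TIME 00:00:00 means it has used no CPU. The script only prints candidate `kill` commands, so each PID can be checked against Director before anything is terminated.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass
class OshProcess:
    pid: int
    ppid: int
    cpu_time: str
    script: Optional[str]  # the -f argument, e.g. RT_SC59/OshScript.osh


def find_suspect_osh(ps_lines: Iterable[str]) -> List[OshProcess]:
    """Return osh processes with PPID 1 and no accumulated CPU time.

    Hypothetical heuristic, assuming Linux-style `ps -ef` output.
    """
    suspects = []
    for line in ps_lines:
        # ps -ef columns: UID PID PPID C STIME TTY TIME CMD (CMD may contain spaces)
        parts = line.split(None, 7)
        if len(parts) < 8 or not parts[1].isdigit() or not parts[2].isdigit():
            continue  # header line or unexpected format
        _uid, pid, ppid, _c, _stime, _tty, cpu_time, cmd = parts
        # Only look at the parallel engine's osh executable.
        if not cmd.split()[0].endswith("/PXEngine/bin/osh"):
            continue
        if int(ppid) == 1 and cpu_time == "00:00:00":
            # The -f argument points at the job's RT_SC directory.
            m = re.search(r"-f\s+(\S+)", cmd)
            suspects.append(
                OshProcess(int(pid), int(ppid), cpu_time, m.group(1) if m else None)
            )
    return suspects


if __name__ == "__main__":
    import subprocess

    out = subprocess.run(
        ["ps", "-ef"], capture_output=True, text=True, check=True
    ).stdout
    for p in find_suspect_osh(out.splitlines()):
        # Print rather than kill: confirm the job's state in Director first.
        print(f"kill {p.pid}    # {p.script}")
```

Printing instead of killing is deliberate. A process that matches the heuristic is only a candidate and may still belong to a job that Director considers active.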

Appreciate your help on this.

Thank you
Sri
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

I don't think this process is causing the issue, this is normal.

Are you sure the queries are fine? Can you check whether the number of buffer gets is increasing? Can you see any progress in the monitor?
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
srividya
Participant
Posts: 62
Joined: Thu Aug 25, 2005 2:31 am
Location: Ashburn,VA

Post by srividya »

The queries are fine. If we remove the Lookup stage and send both the main query and the lookup data to Peek stages, the job finishes in 30 seconds.

As soon as the process kicks off, it goes into a dormant state and waits forever until we log out the PIDs from Director. Attempting to release resources was unsuccessful.
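To tell whether a "dormant" process is doing any work at all, you could sample its accumulated CPU time twice. The sketch below assumes a Linux host with `/proc`, where fields 14 and 15 of `/proc/<pid>/stat` are utime and stime in clock ticks. `looks_dormant` and the 30-second default interval are hypothetical choices. An unchanged value only shows that no CPU was used during the interval. The process could still be blocked on I/O or waiting on another process.

```python
import sys
import time


def cpu_ticks_from_stat(stat_text: str) -> int:
    """Return utime + stime (clock ticks) from the contents of /proc/<pid>/stat."""
    # Field 2 (comm) is wrapped in parentheses and may contain spaces,
    # so skip past the last ')' before splitting the remaining fields.
    rest = stat_text[stat_text.rfind(")") + 2:].split()
    # rest[0] is field 3 (state), so utime (field 14) is rest[11]
    # and stime (field 15) is rest[12].
    return int(rest[11]) + int(rest[12])


def read_cpu_ticks(pid: int) -> int:
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks_from_stat(f.read())


def looks_dormant(pid: int, interval_s: float = 30.0) -> bool:
    """True if the process used no CPU time during the sampling interval."""
    before = read_cpu_ticks(pid)
    time.sleep(interval_s)
    return read_cpu_ticks(pid) == before


if __name__ == "__main__":
    target = int(sys.argv[1])  # e.g. the osh PID from ps -ef
    print("dormant" if looks_dormant(target) else "using CPU")
```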
soumya5891
Participant
Posts: 152
Joined: Mon Mar 07, 2011 6:16 am

Post by soumya5891 »

Are you applying any kind of partitioning to the source data just before it enters the Lookup, or is it left as Auto?
Soumya
pavi
Premium Member
Posts: 34
Joined: Mon Jun 03, 2013 2:34 pm

Post by pavi »

I believe it is a memory issue. What configuration are you using? How many records are flowing through the reference link? Are you doing an explicit sort before the Join stage? How do the CPU stats look while the job is running?
srividya
Participant
Posts: 62
Joined: Thu Aug 25, 2005 2:31 am
Location: Ashburn,VA

Post by srividya »

The reference has about 100 rows, and we have about 800K records from the source. The sort is done in the query, so we don't apply any sorts in DataStage.

Disk space on scratch is at about 20%, and on the server at 27%.
Swap utilization was at 50% the whole time the job was working. As I said, this is an existing process that ran fine until 7 AM that day. :roll:

The process goes into a "hung" state before I can even check CPU or memory usage.

It looks like DataStage forgets about the job as soon as it logs the main_program information message. :shock:
The only thing I can see is a PID similar to the one I posted initially.
srividya
Participant
Posts: 62
Joined: Thu Aug 25, 2005 2:31 am
Location: Ashburn,VA

Post by srividya »

The issue was resolved once we restarted the server. We still don't know what the problem was. We were forced to restart before we could understand the issue, because the testing phase was getting delayed. :(