dsjob -report

har · Post by **har** » Thu Dec 27, 2007 10:57 am

Hi,
I created a server job, and in the before-job routine I am passing this command:
/usr/local/Ascential/DataStage/DSEngine/bin/dsjob -report projectname #JobName# XML >#OUTBOUND##JobName#.xml
In the job I read the XML and write the results to a file.
In the following XML, I don't understand why it creates Instance 1, 2, 3 and 4. Because of this I am getting wrong counts.
Is there a way to drop these instances when the XML file is created, or can someone explain why the instances appear in the XML file?
<Stage Name="Lkp_AbcId" StageStatus="2" StageType="PxLookup" Desc="" StartDateTime="2007-12-27T09:56:35" EndDateTime="2007-12-27T09:56:40" ElapsedTime="00:00:05" ElapsedSecs="5">
  <InputLinks>
    <Link Name="FromError" LinkType="1" Desc="" Stage="Abc_Error" />
    <Link Name="LkpLink_Name" LinkType="1" Desc="" Stage="Abc_Stage_Name" />
  </InputLinks>
  <OutputLinks>
    <Link Name="Link_Name" LinkType="3" Desc="" Stage="LinkName" />
  </OutputLinks>
  <InstanceSet>
    <Instance Id="0" CPU="0.0767" PID="337136">
      <Link Name="InputLink_Name" RowCount="816" />
      <Link Name="LkpLink_Name" RowCount="816" />
      <Link Name="Link_Name" RowCount="16" />
    </Instance>
    <Instance Id="2" CPU="0.11" PID="288130">
      <Link Name="InputLink_Name" RowCount="815" />
      <Link Name="LkpLink_Name" RowCount="815" />
      <Link Name="Link_Name" RowCount="16" />
    </Instance>
    <Instance Id="3" CPU="0.11" PID="299356">
      <Link Name="InputLink_Name" RowCount="815" />
      <Link Name="LkpLink_Name" RowCount="815" />
      <Link Name="Link_Name" RowCount="14" />
    </Instance>
    <Instance Id="1" CPU="0.11" PID="484080">
      <Link Name="InputLink_Name" RowCount="816" />
      <Link Name="LkpLink_Name" RowCount="816" />
      <Link Name="Link_Name" RowCount="16" />
    </Instance>
  </InstanceSet>
</Stage>

Thanks,

ray.wurlod · Post by **ray.wurlod** » Thu Dec 27, 2007 2:29 pm

Looks like you're getting the individual partition (instance) row counts via DSJ.INSTROWCOUNT - which returns a delimited list of row counts. If you just want the total rows across all partitions, prefer DSJ.LINKROWCOUNT as the fourth argument to DSGetLinkInfo().

har · Post by **har** » Thu Dec 27, 2007 3:42 pm

Ray,
Here I'm using the dsjob -report command.
How can I modify this command so that it returns the link row count, the way DSJ.LINKROWCOUNT does as the fourth argument to DSGetLinkInfo()?
Thanks,

ray.wurlod · Post by **ray.wurlod** » Thu Dec 27, 2007 6:58 pm

Only by rewriting dsjob. Source code is in the Developer's manuals.

kduke · Post by **kduke** » Thu Dec 27, 2007 9:11 pm

Why not aggregate it?

har · Post by **har** » Fri Dec 28, 2007 9:19 am

If I aggregate, then my counts won't match at all in my case.
I know a lot of folks might be using the ETLSTATS functionality provided by Kim.
If that is the case, I'm not sure how ETLSTATS manages to work properly.
Can anyone explain, so that I can modify my ETLSTATS jobs?
Thanks,

kduke · Post by **kduke** » Sun Dec 30, 2007 1:44 pm

The partition level can be aggregated up to the link level. In effect the partitions are ignored. There should be a job created by Vincent which does this for PX jobs. This job should be included with EtlStats. If not, then let me know. Server jobs only have one partition, so this was never an issue. I need to post a new version of EtlStats for PX jobs. If people want it, then let me know.

I have a version which creates a surrogate key on ETL_JOB_HIST which can be fed into a job. Our wrapper script takes the ETL_JOB_HIST_ID and feeds it into the job about to be run. Next, this job puts this ID in every row of the target table. Nice audit trail. After the job finishes, this row is updated in ETL_JOB_HIST with the end time and return status.

har · Post by **har** » Mon Dec 31, 2007 9:38 am

How can I ignore the partition level in dsjob -report?
Can you please post the new ETLSTATS functionality for PX jobs?
Also, where can I find Vincent's logic?
Thanks,

kduke · Post by **kduke** » Mon Dec 31, 2007 3:29 pm

Vincent's logic starts with the same name, DsJobReportDb. It is not in the current EtlStats.zip. I just checked. It is simple. Just aggregate on job, stage, link and sum(row_count). You cannot ignore partitions. You need to aggregate them up to the link level.
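
For anyone reading the XML outside DataStage, the aggregation described above can be sketched roughly as follows. This is only an illustration in Python, not Vincent's job or anything shipped with EtlStats: the function name `link_totals` and the `SAMPLE` string are made up for the example, and the only layout assumed is the Stage / InstanceSet / Instance / Link nesting shown in the XML earlier in this thread. The job name is not part of that fragment, so the sketch groups on stage and link only.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Trimmed copy of the stage fragment posted earlier in the thread.
SAMPLE = """
<Stage Name="Lkp_AbcId" StageType="PxLookup">
  <InstanceSet>
    <Instance Id="0"><Link Name="InputLink_Name" RowCount="816" /><Link Name="Link_Name" RowCount="16" /></Instance>
    <Instance Id="2"><Link Name="InputLink_Name" RowCount="815" /><Link Name="Link_Name" RowCount="16" /></Instance>
    <Instance Id="3"><Link Name="InputLink_Name" RowCount="815" /><Link Name="Link_Name" RowCount="14" /></Instance>
    <Instance Id="1"><Link Name="InputLink_Name" RowCount="816" /><Link Name="Link_Name" RowCount="16" /></Instance>
  </InstanceSet>
</Stage>
"""


def link_totals(report_xml):
    """Sum the per-partition RowCount values up to (stage, link) level."""
    root = ET.fromstring(report_xml)
    totals = defaultdict(int)
    # iter() also yields the root itself, so a bare <Stage> fragment works too.
    for stage in root.iter("Stage"):
        # One <Instance> per partition; each holds the links' partial counts.
        for link in stage.findall("./InstanceSet/Instance/Link"):
            key = (stage.get("Name"), link.get("Name"))
            totals[key] += int(link.get("RowCount", 0))
    return dict(totals)


if __name__ == "__main__":
    for (stage_name, link_name), rows in sorted(link_totals(SAMPLE).items()):
        print(stage_name, link_name, rows)
```

With the fragment above, the four partitions of InputLink_Name add up to 3262 rows and those of Link_Name to 62. To run it against a real report, read the file that the before-job routine wrote and pass its contents to `link_totals`.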