Page 1 of 1
Job output parameters
Posted: Wed Jul 07, 2010 4:01 am
by ScubaSteve
I am new to datastage development having spent about 10 years working in various other ETL tools. I am finding working in Datastage rather heavy going. Anyway my scenario is the following.
I need to build a job that will genertate an output file from multiple input files. Before these input files can be used they need to be checked for completeness by comparing hash totals in the trailer record with calculated values off the detail records.
I have built parallel job's to do the file validation as this was the only job type that had enough functionality to do the work.
The job generate the extract file would then run after all the validation jobs ran successfully.
I need the validation job to provide output that i can use in a control or sequence job to make decisions.
Dont know how to do this. Searching the forums delivered advice such as using UserStatus but others state that this cant be used in parallel jobs.
Can anyone provide a definitive answer on whether values can be outputted from a parallel job.
I need to know if I have to go with an interim file or db based storage workaround.
Posted: Wed Jul 07, 2010 4:15 am
by ArndW
Welcome to DSXchange, Steve.
What would be an example of your validation job's return value? Is it a simple Yes/No, a string of some type or some other structure? Depending upon how this value is computed you can use the job status or user status to return information to the calling sequence; or you might best be served by using some other method (creating a text file, perhaps).
It sounds like a simple Yes/No validation result would be sufficient, in which case the solution would be easy.
Posted: Wed Jul 07, 2010 4:19 am
by ScubaSteve
Yes\No validation would be fine, although it would be nice to get the rowcount recurned as well, but i can do without that.
Posted: Wed Jul 07, 2010 5:22 am
by ArndW
I would start with your PX job - how does it determine validation? One very simple method that you can use from your calling Sequence job is to call DSGetLinkInfo() to get the number of rows that went down a named link - you can use that number for your row count and if the number is 0 then your validation "failed".
Posted: Wed Jul 07, 2010 5:50 am
by ScubaSteve
The Px Job to validate the file content does the following:
It has a CFF stage that reads the file and streams the data to a Transform function that does some type conversion and then routes the records down one of two paths. All detail records are routed to a copy stage that copies the fields to 2 record aggregator stages. The one does a record count and the other does a hashtotal count on one of the detail record fields.
The trailer record is the plumbed into a merge stage together with the output from the 2 aggregator stages to combine the data into a single set of values - basically Trailer Rec Count + Hash Total and Calculated Rec Count and Hash Total.
This is then passed into a transformer component that compares the trailer values with corresponding calculated values and outputs a zero or 1 depending on this match.
Currently this output is fed into a peek - I would like to have this value available when the job is called in a sequence.
DSGetLinkInfo() will be useful in getting the record count. How do you actually use DSGetLinkInfo() - is it something you configure in a job activity stage in a job sequence, is it something i can access in a job using one of the stage types like basic transformer.
Posted: Wed Jul 07, 2010 6:27 am
by ArndW
In your case you could keep the job unchanged and just add to your job sequence. I tried an example using UserVariablesActivity at V8 and got an error message on DSAttachJob() so suggest you create a small 3-liner server routine:
GetJobRowCount(JobName,StageName,LinkName)
Code: Select all
JobHandle = DSAttachJob(JobName,DSJ.ERRNONE)
Ans = DSGetLinkInfo(JobHandle,StageName,LinkName,DSJ.LINKROWCOUNT)
Dummy = DSDetachJob(JobHandle)
You can call that up from a job sequence Routine Activity stage and pass in the job name, the stage name (you can use either in or out stage) and link name and it will return the number of rows that went through that link the last time the job ran.
You can call it up once to see if any rows went to the PEEK stage (assuming 0 rows means that there was a validation problem) and then again to get the number of rows in the source (perhaps subtract 2 for header/trailer).
Posted: Wed Jul 07, 2010 7:16 am
by ScubaSteve
This looks like the answer to my dilemna if i can get the routine set up and working.
Can you give me some pointers on how to create and use a server routine. What I am doing is not working.
I created a new server routine and set the routine name to GetJobRowCount
I left the type as Transform Function
I added the three arguments (JobName, StageName and LinkName)
In the code window i pasted the code you listed
I can save this, but when i try and compile it i get the following error:
Compiling: Source = 'DSU_BP/DSU.GetJobRowCount', Object = 'DSU_BP.O/DSU.GetJobRowCount'
0003 Ans = DSGetLinkInfo(JobHandle,StageName,LinkName,DSJ.LINKROWCOUNT)
^
',' unexpected, Was expecting: '!', ')', '=', "AND", "OR", "LT", "LE",
"GT", "GE", "NE", "EQ", "MATCH"
Array 'DSAttachJob' never dimensioned.
WARNING: Variable 'DSJ.ERRNONE' never assigned a value.
Array 'DSGetLinkInfo' never dimensioned.
Array 'DSDetachJob' never dimensioned.
4 Errors detected, No Object Code Produced.
It looks a bit like a library refernce missing error i would get in a development environment.
Posted: Wed Jul 07, 2010 7:19 am
by chulett
Code: Select all
$IFNDEF JOBCONTROL.H
$INCLUDE DSINCLUDE JOBCONTROL.H
$ENDIF
Posted: Wed Jul 07, 2010 8:11 am
by ArndW
I keep on forgetting that INCLUDE in user-written routines, thanks for catching that.
Posted: Wed Jul 07, 2010 11:45 pm
by ScubaSteve
Thanks "ArndW" and "chulett". I got it to work now.
Mention was made of UserStatus - how do i go about accessing and setting this. I assume I can access the values in UserStatus by doing a similar routine as for the job info, but how do I go about populating values into UserStatus from within a Px process.
Once again many thanks
Posted: Thu Jul 08, 2010 12:17 am
by ray.wurlod
Alas, it is not easy in a parallel job, since you don't have direct access to the DataStage API - and, I believe, the C-callable API does not include the DSSetUserStatus() function in any case.