Source and Target row count matching in Datastage job

dsgguy · Post by **dsgguy** » Wed Jun 04, 2014 2:56 pm

Hi

I'm trying to find a way to match the row count from source and tsrget within the datastage job. Is it possible? If so, could you please suggest the solution. If not, what would be my options? Do I need to write something outside the datastage job? Please help.

Thanks in advance
Sri

ray.wurlod · Post by **ray.wurlod** » Wed Jun 04, 2014 3:23 pm

Welcome aboard.

You could create an after-job subroutine that uses DataStage API functions to retrieve the link row counts.

What do you want to do with these counts?

dsgguy · Post by **dsgguy** » Wed Jun 04, 2014 3:30 pm

Thanks Ray.

I want the counts to make sure that all the source records have been loaded to target with out any issues...

ray.wurlod · Post by **ray.wurlod** » Wed Jun 04, 2014 3:32 pm

That's the why. What do you actually want to do with them? Write to file? Send email? Something else?

dsgguy · Post by **dsgguy** » Wed Jun 04, 2014 3:50 pm

Ray - I Want to send an email if the counts are not matching.

ray.wurlod · Post by **ray.wurlod** » Wed Jun 04, 2014 4:32 pm

OK, so you need the DataStage API function DSGetLinkRowCount() and DSSendMail().
You can find examples of the latter under \Routines\Built-In\Utilities in Designer.

chulett · Post by **chulett** » Wed Jun 04, 2014 9:49 pm

... and this is typically something done as a separate step outside of the job after it completes, not part of the job itself.

dsgguy · Post by **dsgguy** » Thu Jun 05, 2014 12:48 am

Thanks Ray and Craig.
Apart from using API functions in after job subroutine, can we do something within the datastage job itself?
Can we have a job parameter as an output parameter? In other words, can we run some query in 'before job SQL' and assign the result to a job parameter and use that in the later part of datastage job? As per my knowledge, we don't have that option, but just want to know if something came up in 9.1?

askvishal · Post by **askvishal** » Thu Jun 05, 2014 1:22 am

In my view, you can name the Source and Target links with specified keywords. Say, READ and INSERT respectively. Write a Procedure to lookout for links with the defined keywords and call this procedure in a sequencer that would trigger a mail.

Ray, Is this a right approach??

ray.wurlod · Post by **ray.wurlod** » Thu Jun 05, 2014 3:24 am

It's "a" right approach, but not the only one. For example, one might pass the link names as arguments to a generic subroutine.

chulett · Post by **chulett** » Thu Jun 05, 2014 7:07 am

dsgguy wrote:Can we have a job parameter as an output parameter?

For completeness - as answered elsewhere, no.

chulett · Post by **chulett** » Thu Jun 05, 2014 7:16 am

dsgguy wrote:Apart from using API functions in after job subroutine, can we do something within the datastage job itself?

Perhaps. However, I would strongly discourage any such approach. I'm not a big fan of extra bits you'd have to incorporate into every job, parts that would on failure jeopardize the load / process itself and cause you to repeat the whole dang thing simply to get said metrics or the check run.

From what I have seen (and set up) it is much more common to have a more 'generic' process, something that comes along after a successful nightly load and does the check. It can be driven by control tables - which jobs need to be checked, what link names to use, etc - or can be more dynamic if as noted earlier a standard set of link names or prefixes have been used that can be checked for and captured when found. The 'after' part is significant in my mind as you are not impacting the load itself and failures in the checking process are much less urgent than if they fail the load as well. And it can be run anytime after process completion before the next iteration of the process.

My two point four cents.