
DataStage Audit Job

Posted: Thu Sep 06, 2018 10:36 am
by Sumith
Hi All,

I have the following requirement. We extract data from DB2 (via ODBC) and load the data into tables/files. We have to add audit details such as the source record count, the target record count, and the difference to audit tables on a daily basis. Can you please suggest an approach?

Posted: Thu Sep 06, 2018 1:27 pm
by chulett
Hmmm... this is not a topic for a short conversation. But let's see if we can lay some groundwork here at least.

About the only thing I'd ask you to clarify right now is your "on a daily basis" comment. Does this mean you need something that runs at the end of your work day to gather all of this information up (from somewhere as yet to be determined) and insert it into your audit table all at once? Or are you looking for a mechanism where your audit table is populated during each job run, or possibly via a tightly coupled after-job process - meaning over and over again over the course of the day?

Or are we too early in the design stage to even know yet? :wink:


Posted: Thu Sep 06, 2018 2:44 pm
by Sumith
Thanks for your response.

In our project we extract data from 3 different systems, the sources being DB2 and flat files. After loading the data, we have to record the source and target counts in a file, laid out as below.

SYS_NAME, Current_Date, Source Count, Target Count, Difference
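For example, a record might look like this (the values here are purely illustrative, not from our actual feeds):

Code:
DB2_SYS1,2018-09-06,104392,104390,2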

Posted: Thu Sep 06, 2018 5:32 pm
by ray.wurlod
For a do-it-yourself approach you could create something using the DataStage API, either as a DataStage BASIC routine or as an external command. The API has functions for obtaining link row counts - you should investigate these. A systematic approach to naming links in the job will help markedly.
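To illustrate the external-command route, here is a minimal shell sketch using the dsjob command line. The project, job, stage and link names are placeholders to substitute with your own, and the grep pattern may need adjusting since the exact report layout varies between DataStage versions:

Code:
#!/bin/sh
# Minimal sketch: fetch a link row count via the dsjob CLI.
# PROJECT, JOB, STAGE and LINK are placeholders - use your own names.
PROJECT=MyProject
JOB=MyLoadJob
STAGE=Tgt_Audit
LINK=Lnk_Insert

# -linkinfo reports details for the named link, including its row count
dsjob -linkinfo $PROJECT $JOB $STAGE $LINK

# Crude parse of the row count from the report; adjust the pattern
# to match what your DataStage version actually prints.
ROWS=`dsjob -linkinfo $PROJECT $JOB $STAGE $LINK | grep -i 'row count' | awk -F: '{print $2}'`
echo "Rows processed on $LINK: $ROWS"

As noted above, a systematic link-naming convention is what lets one such script (or one BASIC routine) serve every job in the project.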

Another approach would be to implement Kim Duke's ETLStats package, which you can download from here. Follow the ETL STATS link. Let me know if you need access granted. This package can capture row counts for every job in your project.

Posted: Fri Sep 07, 2018 5:44 am
by chulett
Doesn't the DSODB / Operations Console provide the majority of that functionality nowadays?

Posted: Sat Sep 08, 2018 7:00 pm
by ray.wurlod
Only if you run jobs in a fashion that generates operational metadata. And that's not appropriate for all jobs.

Posted: Sun Sep 09, 2018 7:19 pm
by rumu
Hi All,

I have done something similar for our daily batch jobs. We call it the Load_Information table.
A static table is maintained with the job name, source table name and target table name.
A generic unix script captures the audit information; it is invoked as a before/after-job subroutine. The script reads the static table to look up the source and target table names, connects to the database to get the counts, and updates the Load_Information table.
A new approach being considered lately is to run this script only once, after the end of the batch, to capture the information. In that case the script will loop through the static table. We are not capturing the differences, as that is only possible when source to target is 1:1.
To check source minus target, we use a separate script that applies minus logic between the source query and the target query (see the sketch below).
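As a rough sketch of what such a generic script could look like (the file name, delimiter, column layout and database name below are all assumptions for illustration, not rumu's actual implementation):

Code:
#!/bin/sh
# Illustrative sketch of a generic audit script - all names are placeholders.
# static_table.txt is assumed to hold pipe-delimited rows:
#   JOB_NAME|SOURCE_TABLE|TARGET_TABLE
db2 connect to MYDB > /dev/null    # MYDB is a placeholder database name

while IFS='|' read JOB SRC TGT
do
    # -x suppresses column headings so db2 returns just the count value
    SRC_CNT=`db2 -x "SELECT COUNT(*) FROM $SRC"`
    TGT_CNT=`db2 -x "SELECT COUNT(*) FROM $TGT"`
    DIFF=`expr $SRC_CNT - $TGT_CNT`
    echo "$JOB,`date +%Y-%m-%d`,$SRC_CNT,$TGT_CNT,$DIFF" >> load_information.csv
done < static_table.txt

db2 terminate > /dev/null

One note on the source-minus-target check: DB2's set-difference operator is EXCEPT (MINUS is the Oracle spelling), so on DB2 the separate script would wrap something like SELECT ... FROM source EXCEPT SELECT ... FROM target.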