Designing a Generic Datastage job for multiple input sources

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rana_s_ray
Participant
Posts: 6
Joined: Fri Oct 13, 2006 4:46 am


Post by rana_s_ray »

Our project has the following requirements:
Multiple suppliers send files at various times of the day to defined locations. The formats of the files sent by the suppliers are all different. The files have to be processed and ultimately loaded into a set of tables as and when they arrive at the specified location.

The easy way, of course, is to design a job for each file and a job sequence with a Wait For File activity for the arrival of that file. But this increases the development time, makes the jobs repetitive and creates maintenance hassles. So we are trying to design the jobs as follows:
1. Create a sequence that takes all the files from all suppliers and creates a uniform staging file, which is basically the universe of all the fields sent by the suppliers.
2. Take the fields from the staging file and map them into the DB tables as required.
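Step 1 amounts to mapping every supplier's columns onto one superset layout. A minimal sketch of that mapping, assuming each supplier's column-to-staging-field translation is kept in a small lookup table; the field names, `to_staging_record` and `to_staging_line` are hypothetical and only illustrate the idea, not how a DataStage job would implement it:

```python
from typing import Dict, List

# Hypothetical "universe" of fields across all suppliers (illustrative names only).
STAGING_FIELDS: List[str] = ["supplier_id", "product_code", "quantity", "price", "currency"]


def to_staging_record(supplier_id: str, row: Dict[str, str],
                      field_map: Dict[str, str]) -> Dict[str, str]:
    """Map one supplier row onto the uniform staging layout.

    field_map translates the supplier's own column names to staging names.
    Staging fields the supplier does not send stay empty; supplier columns
    with no mapping are ignored.
    """
    record = {name: "" for name in STAGING_FIELDS}
    record["supplier_id"] = supplier_id
    for src_name, value in row.items():
        target = field_map.get(src_name)
        if target in record:
            record[target] = value
    return record


def to_staging_line(record: Dict[str, str], delimiter: str = "|") -> str:
    """Serialize a staging record in a fixed column order."""
    return delimiter.join(record[name] for name in STAGING_FIELDS)
```

For example, a supplier that sends only `PROD` and `QTY` yields `A|X1|5||`, with empty price and currency columns that step 2 can then handle when loading the tables.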

We have hit some problems with scheduling. Basically, the job we design either waits for all the files from the suppliers to arrive before it can start, or, if we make the sequence start on the arrival of any one of the files, then while the sequence is executing for that file, files that arrive later do not get picked up.

I am aware that DS jobs can be scheduled using Unix scripts, but I don't know how to do it. I actually want to schedule the script from the Windows scheduler, so that the script can execute the generic job under a different alias depending on which input file it has been scheduled to process, passing a list of arguments for the job to use as parameters. I am looking for advice on how to implement this. If anyone has sample scripts, C++ programs, etc. that can be used for scheduling, or can point me to some, I will be most obliged.
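The "different alias per input file" idea can be sketched as a lookup from file-name pattern to an instance name and a parameter set. This assumes the generic job is set up as a multiple-instance job, so each alias would be an invocation ID; the patterns, invocation IDs and parameter names (`pDelimiter`, `pLayout`, `pInputFile`) below are all hypothetical:

```python
import fnmatch
import os
from typing import Dict, List, Optional, Tuple

# Hypothetical registry: file-name pattern -> (invocation ID, job parameters).
SUPPLIERS: List[Tuple[str, str, Dict[str, str]]] = [
    ("supplierA_*.csv", "SUPA", {"pDelimiter": ",", "pLayout": "A"}),
    ("supplierB_*.txt", "SUPB", {"pDelimiter": "|", "pLayout": "B"}),
]


def resolve_instance(file_path: str) -> Optional[Tuple[str, Dict[str, str]]]:
    """Return (invocation_id, params) for an arrived file, or None if unknown."""
    name = os.path.basename(file_path)
    for pattern, invocation_id, params in SUPPLIERS:
        if fnmatch.fnmatch(name, pattern):
            # Copy so the registry itself is never modified by per-run values.
            run_params = dict(params)
            run_params["pInputFile"] = file_path
            return invocation_id, run_params
    return None
```

Keeping the per-supplier differences in a table like this means adding a supplier is a configuration change rather than a new job.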

Then there is, of course, the second problem. If there are multiple instances of the job that creates the staging file, does anything get overwritten? All the instances are trying to write to the same file concurrently, so there are bound to be concurrency issues. Furthermore, the staging file has to be read to load the tables, again as and when it is not empty. So essentially the problem is how to schedule the same job(s) repeatedly to process different sets of data. Please advise.
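One common way to avoid concurrent writers on a single file is to give each instance its own staging file and publish it only when complete. A minimal sketch of that pattern, assuming the loading step picks up only `*.stg` files; the `<invocation_id>_<run_id>.stg` naming and the function name are hypothetical, and inside DataStage the equivalent would be a per-instance output file name passed in as a job parameter:

```python
import os
import tempfile
from typing import Iterable


def write_staging_file(staging_dir: str, invocation_id: str, run_id: str,
                       lines: Iterable[str]) -> str:
    """Write one instance's output to its own file, then publish it.

    The data goes to a private temp file in the same directory and is renamed
    to <invocation_id>_<run_id>.stg only when complete, so concurrent
    instances never share a file and the loader never sees a partial one.
    """
    fd, tmp_path = tempfile.mkstemp(dir=staging_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as out:
            for line in lines:
                out.write(line + "\n")
        final_path = os.path.join(staging_dir, f"{invocation_id}_{run_id}.stg")
        # Rename within one directory; atomic on POSIX filesystems.
        os.replace(tmp_path, final_path)
    except BaseException:
        # Do not leave half-written temp files behind on failure.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

The loading job then processes whatever `.stg` files exist and moves them away afterwards, which also answers "as and when the staging file is not empty": a non-empty staging area is simply one or more published files.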
Last edited by rana_s_ray on Fri Oct 13, 2006 6:32 am, edited 1 time in total.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Welcome to DSXchange!!!

You can schedule the job to run at a frequent interval, and have it first check whether all the files are available. If they are not, it quits after the check; if they are, it proceeds with the processing. :D
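The check-then-quit step of such a frequently scheduled run can be sketched as below, assuming the expected file names are known in advance; the directory and file names used with it are hypothetical:

```python
import os
from typing import List


def missing_files(landing_dir: str, expected: List[str]) -> List[str]:
    """Return the expected file names that are not yet in landing_dir."""
    return [name for name in expected
            if not os.path.isfile(os.path.join(landing_dir, name))]


def ready_to_start(landing_dir: str, expected: List[str]) -> bool:
    """True only when every expected file has arrived.

    On False, the scheduled run should simply exit; the next scheduled
    run repeats the check.
    """
    missing = missing_files(landing_dir, expected)
    if missing:
        print("Not all files have arrived yet, waiting for:", ", ".join(missing))
        return False
    return True
```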
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
rana_s_ray
Participant
Posts: 6
Joined: Fri Oct 13, 2006 4:46 am

Post by rana_s_ray »

Hi Kumar,
Thanx for the response. However, my requirement is not to wait for all the files, but to pick up any file that has arrived and process it immediately.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

What design did you follow to create the uniform staging file?
Does a separate file stage look for each file?
If so, do a touch on each file in a before-job subroutine before starting, so that even if a file is not present, the job can still run and simply process an empty file.
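The touch idea can be sketched as follows, assuming the list of expected input file names is known; the function name is hypothetical. Unlike Unix `touch`, this version leaves files that already exist completely alone (content and timestamp) and only creates the missing ones:

```python
from pathlib import Path
from typing import List


def touch_expected_files(landing_dir: str, expected: List[str]) -> List[str]:
    """Create an empty file for each expected input that has not arrived.

    This lets a job that reads every input run even when some suppliers
    have not sent anything yet. Returns the names that were created empty.
    """
    created = []
    for name in expected:
        path = Path(landing_dir) / name
        if not path.exists():
            path.touch()
            created.append(name)
    return created
```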
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Search the forum (or the manuals) for dsjob (the command line interface) which your Windows scheduler can use to invoke a DataStage job when arrival of a file is detected.
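A wrapper launched by the Windows scheduler might assemble a `dsjob -run` call like the sketch below. It assumes a multiple-instance job addressed as `job.invocationid` and uses only the `-mode`, `-param` and `-jobstatus` options; the project, job and parameter names are hypothetical, and logon options (server, user, password) that a remote Windows client may need are left out, so check the dsjob reference for your release before relying on it:

```python
import subprocess
from typing import Dict, List, Optional


def build_dsjob_command(project: str, job: str, invocation_id: Optional[str],
                        params: Dict[str, str]) -> List[str]:
    """Build the argument list for one dsjob -run call."""
    # -jobstatus waits for the job to finish and returns a status-based exit code.
    cmd = ["dsjob", "-run", "-mode", "NORMAL", "-jobstatus"]
    for name, value in params.items():
        cmd += ["-param", f"{name}={value}"]
    # A multiple-instance job is addressed as <job>.<invocation id>.
    target = f"{job}.{invocation_id}" if invocation_id else job
    cmd += [project, target]
    return cmd


def run_job(cmd: List[str]) -> int:
    """Run dsjob and return its exit code (interpretation depends on -jobstatus)."""
    return subprocess.run(cmd, check=False).returncode
```

Passing an argument list rather than one command string avoids quoting problems with Windows paths in parameter values.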
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rana_s_ray
Participant
Posts: 6
Joined: Fri Oct 13, 2006 4:46 am

Post by rana_s_ray »

Thanx, will do...
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

First of all, the Wait For File activity fires as soon as the file is created, even if the transfer is not yet complete. So you need to ask the supplier to send a dummy file after the original file transfer is complete, and have your Wait For File activity wait for that dummy file.
Secondly, a Unix script will help you control your process. Search this forum; you will find plenty of examples.
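A script that polls for files can honour the same dummy-file convention. A minimal sketch, assuming each supplier sends `<name>.done` after `<name>` has been fully transferred; the `.done` suffix is a hypothetical convention to be agreed with the suppliers:

```python
import os
from typing import List

DONE_SUFFIX = ".done"  # hypothetical trigger-file naming convention


def ready_data_files(landing_dir: str) -> List[str]:
    """Return data files whose companion trigger file has arrived.

    A file such as orders.csv counts as complete only once orders.csv.done
    exists, so partially transferred files are never picked up.
    """
    names = set(os.listdir(landing_dir))
    ready = []
    for name in sorted(names):
        if name.endswith(DONE_SUFFIX):
            continue
        if name + DONE_SUFFIX in names:
            ready.append(name)
    return ready
```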
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
rana_s_ray
Participant
Posts: 6
Joined: Fri Oct 13, 2006 4:46 am

Post by rana_s_ray »

Thanx. We do implement the Wait For File activity as you suggested; before I posted this, I experimented with it. But the issue is somewhat different. With a Wait For File activity, the sequence becomes runnable on the arrival of any of the dummy files defined in the activity, but once the sequence has started, any other files that arrive in the interim are not picked up. That is why I am not looking for a wait-for-file type of solution, but rather for an event-driven one, the event being the arrival of a file. I would schedule the Unix script from the Windows scheduler to run the DS job. Could you give me some pointers to a sample DS script? I would search if I knew the keywords; I am a DS novice.
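One way to get event-like behaviour from a scheduler is a short, frequent polling pass that handles every file present and moves each handled file out of the landing folder, so files that arrive while one pass is running are picked up by the next. A minimal sketch, assuming `handle` is a hypothetical callback that would start the job for one file and report success:

```python
import os
import shutil
from typing import Callable, List


def process_pending(inbox: str, archive: str,
                    handle: Callable[[str], bool]) -> List[str]:
    """One polling pass over inbox, oldest file first.

    Files handled successfully are moved to archive, so no file is processed
    twice; files that fail stay in inbox and are retried on the next pass.
    Returns the names of the files that were processed.
    """
    os.makedirs(archive, exist_ok=True)
    paths = [os.path.join(inbox, n) for n in os.listdir(inbox)]
    paths = [p for p in paths if os.path.isfile(p)]
    paths.sort(key=os.path.getmtime)  # approximate arrival order
    done = []
    for path in paths:
        if handle(path):
            shutil.move(path, os.path.join(archive, os.path.basename(path)))
            done.append(os.path.basename(path))
    return done
```

Because the inbox itself is the queue, nothing depends on the previous run having finished before a file arrives.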
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Do an exact search for "unix script". You will get a few examples.
Here's one