Designing a generic DataStage job for multiple input sources

Posted: Fri Oct 13, 2006 6:18 am
by rana_s_ray
The following requirements are there in our project:
Multiple suppliers send files at various times of the day to defined locations. The formats of the files sent by the suppliers are all different. These have to be processed and ultimately ingested into a set of tables, as and when they arrive at the specified location.

The easy way is of course to design a job for each file and a job sequence with a wait-for-file activity for the arrival of the file. But this increases development time, makes the jobs repetitive and creates maintenance hassles. So we are trying to design the jobs as follows:
1. Create a sequence that takes all the files from all suppliers and creates a uniform staging file, which is basically the universe of all the fields sent by the suppliers.
2. Take the fields from the staging file and map them into the db tables as required.

We hit some problems with the scheduling. Basically, the job we design either waits for the arrival of all the files from the suppliers before it can start, or, if we make the sequence start on the arrival of any one of the files, then while the sequence is executing for that file, subsequent files do not get picked up.

I am aware that DS jobs can be scheduled using unix scripts, but I don't know how. I actually want to run the script from the Windows scheduler, so that the script can execute the generic job (with a different alias, depending on which input file it has been scheduled to process, passing a list of arguments to be used by the job as parameters). I am looking for some advice/help on how to implement this. If anyone has sample scripts, C++ programs etc. which can be used for scheduling, or can direct me to the same, I will be most obliged.

Then there is the second problem of course! If there are multiple instances of the job that creates the staging file, does anything get overwritten? I mean, basically all the instances are trying to write into the same file concurrently, so there are bound to be concurrency issues. Further, the staging file has to be read to load the tables, again as and when the staging file is not empty. So essentially it is a problem of how to schedule the same job(s) repeatedly, but to process different sets of data. Please advise.

Posted: Fri Oct 13, 2006 6:25 am
by kumar_s
Welcome to Dsxchange!!!

You can schedule the job at a frequent interval, so that it first checks whether all the files are available. If they are not, it quits after the check. If they are, it proceeds with the processing. :D

Posted: Fri Oct 13, 2006 6:29 am
by rana_s_ray
Hi Kumar,
Thanx for the response. However my requirement is not to wait for all the files, but to pick up any file which has arrived and process it immediately.

Posted: Fri Oct 13, 2006 6:35 am
by kumar_s
What design did you follow to create the uniform staging file?
Will a separate file stage look for each file?
If so, do a touch in a before-job subroutine on each file before starting, so that even if a file is not present, the job can run and simply process an empty file.
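
The touch step could be sketched roughly as follows. This is written in Python purely for illustration, not as a DataStage before-job subroutine; the function name touch_missing and any file names passed to it are hypothetical. It creates an empty file for each expected input that has not arrived and leaves existing files alone.

```python
from pathlib import Path


def touch_missing(expected_files):
    """Create an empty file for each expected path that does not exist yet.

    Files that already exist are left untouched.
    Returns the list of paths that were created.
    """
    created = []
    for name in expected_files:
        path = Path(name)
        if not path.exists():
            # Make sure the landing directory exists before touching.
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
            created.append(path)
    return created
```

Note that with this approach the job still runs once for every expected file, whether or not real data arrived.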

Posted: Fri Oct 13, 2006 7:46 am
by ray.wurlod
Search the forum (or the manuals) for dsjob (the command line interface) which your Windows scheduler can use to invoke a DataStage job when arrival of a file is detected.
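
As a rough sketch of what such a dsjob call might look like when wrapped in a script: the Python below builds a dsjob -run command line for a multi-instance job, using the job.invocationid form and -param name=value arguments. The project, job, invocation id and parameter names are all hypothetical, and the helper functions are illustrative only. Check the dsjob documentation for your release before relying on the exit codes returned with -jobstatus.

```python
import subprocess


def build_dsjob_command(project, job, invocation_id=None, params=None):
    """Build the argument list for a `dsjob -run` call.

    For a multi-instance job the invocation id is appended to the
    job name as job.invocation_id. Each entry in params becomes a
    `-param name=value` pair.
    """
    cmd = ["dsjob", "-run", "-jobstatus"]
    for name, value in (params or {}).items():
        cmd += ["-param", f"{name}={value}"]
    target = f"{job}.{invocation_id}" if invocation_id else job
    cmd += [project, target]
    return cmd


def run_job(project, job, invocation_id=None, params=None):
    """Run the job and return dsjob's exit code (needs dsjob on the PATH)."""
    cmd = build_dsjob_command(project, job, invocation_id, params)
    return subprocess.run(cmd).returncode
```

For example, build_dsjob_command("MyProject", "LoadStaging", "supplierA", {"InputFile": "/data/in/a.csv"}) gives the arguments for running instance supplierA of a job called LoadStaging, so one generic job can be started once per arriving file.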

Posted: Mon Oct 16, 2006 12:37 am
by rana_s_ray
Thanx, will do...

Posted: Mon Oct 16, 2006 7:02 am
by DSguru2B
First of all, the wait for file activity will fire as soon as the file gets created, even though the transfer is not complete. So you need to request the supplier to send a dummy file after the original file transfer is complete. Your wait for file activity should wait for that dummy file.
Secondly, a unix script will help you control your process. Search this forum. You will find plenty of examples.
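
The dummy-file check could be sketched along these lines, again in Python for illustration. It assumes a naming convention that is not from this thread: the supplier sends orders.csv and then an empty orders.csv.done once the transfer is complete. The function name ready_files and the .done suffix are hypothetical.

```python
from pathlib import Path


def ready_files(landing_dir, trigger_suffix=".done"):
    """Return data files whose trigger (dummy) file has arrived.

    A data file such as orders.csv counts as complete once
    orders.csv.done exists beside it. Triggers without a matching
    data file are ignored.
    """
    ready = []
    for trigger in sorted(Path(landing_dir).glob(f"*{trigger_suffix}")):
        data_name = trigger.name[: -len(trigger_suffix)]
        if not data_name:
            # A file named exactly like the suffix has no data file.
            continue
        data_file = trigger.parent / data_name
        if data_file.is_file():
            ready.append(data_file)
    return ready
```

A scheduled script could call this at each run and start one job instance per file returned. It would also need to move or delete the trigger afterwards so the same file is not picked up twice.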

Posted: Mon Oct 16, 2006 7:14 am
by rana_s_ray
Thanx. We do implement wait for as you suggested. Before I posted this, I played around with wait for activity. But the issue is somewhat different. With a wait for activity, even though the sequence remains in a runnable state after the arrival of any of the dummy files defined in the wait for, once the sequence has started, any other files that have arrived in the interim are not picked up. That is why I am not looking at a wait-for-file type solution, rather an event driven solution, event being the arrival of the file. I schedule the unix script from windows scheduler to run the DS job. Could you give some pointers to a sample ds script? I would search, if I knew the keywords - am a DS novice.

Posted: Mon Oct 16, 2006 7:41 am
by DSguru2B
Do an exact search for unix script. You will get a few examples.
Here's one