Designing a Generic Datastage job for multiple input sources
Moderators: chulett, rschirm, roy
-
rana_s_ray
- Participant
- Posts: 6
- Joined: Fri Oct 13, 2006 4:46 am
Designing a Generic Datastage job for multiple input sources
The following requirements are there in our project:
Multiple suppliers send files at various times of the day to defined locations. The formats of the files sent by the suppliers are all different. These have to be processed and ultimately ingested into a set of tables, as and when they arrive at the specified location.
The easy way is of course to design a job for each file and a job sequence with a Wait For File activity for the arrival of the file. But this increases the development time, makes the jobs repetitive and creates maintenance hassles. So we are trying to design the jobs as follows:
1. Create a sequence that takes all the files from all suppliers and creates a uniform staging file, which is basically the universe of all the fields sent by the suppliers.
2. Take the fields from the staging file and map them into the db tables as required.
We hit some problems in the scheduling. Basically the job we design either waits for the arrival of all the files from the suppliers before it can start, or, if we make the sequence start on the arrival of any one of the files, then while the sequence is executing for that file, subsequent files do not get picked up.
I am aware that DS jobs can be scheduled using unix scripts and the like, but don't know how. I actually want to schedule the script from the Windows scheduler, so that the script can execute the generic job (with a different alias depending on which input file it has been scheduled to process, passing a list of arguments to be used by the job as parameters). I am looking for some advice/help on how to implement this. If anyone has sample scripts, C++ programs etc. which can be used for scheduling, or can direct me to the same, I will be most obliged.
Then there is the second problem of course! If there are multiple instances of the job that creates the staging file, does anything get overwritten? Basically all the instances are trying to write into the same file concurrently, so there are bound to be concurrency issues. Further, the staging file has to be read to load the tables, again as and when the staging file is not empty. So essentially it is a problem of how to schedule the same job(s) repeatedly, but to process different sets of data. Please advise.
Last edited by rana_s_ray on Fri Oct 13, 2006 6:32 am, edited 1 time in total.
-
rana_s_ray
- Participant
- Posts: 6
- Joined: Fri Oct 13, 2006 4:46 am
What is the design that you followed to create the uniform staging file?
Will a separate file stage look for each file?
If so, do a touch on each file in a before-job subroutine before starting, so that even if a file is not present, the job can still run by processing the empty file.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
-
ray.wurlod
- Participant
- Posts: 54595
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Search the forum (or the manuals) for dsjob (the command line interface), which your Windows scheduler can use to invoke a DataStage job when the arrival of a file is detected.
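A minimal sketch of such a wrapper, assuming a typical engine install path and placeholder project/job/parameter names (none of these come from the thread), might look like this. It uses the file name as the invocation id so several instances of a multi-instance job can run at once:

```shell
#!/bin/sh
# Hypothetical wrapper (names and paths are examples, not from the thread):
# invoke a multi-instance DataStage job through dsjob, passing the detected
# file as a job parameter and using the file name as the invocation id.
DSHOME=${DSHOME:-/opt/IBM/InformationServer/Server/DSEngine}  # assumed engine home
PROJECT=MyProject          # placeholder project name
JOB=GenericLoadJob         # placeholder multi-instance job name
FILE=${1:-/tmp/supplier1.csv}

# Derive a per-file invocation id so several instances can run concurrently.
base=$(basename "$FILE")
INVOCATION=${base%.*}      # supplier1.csv -> supplier1

# -run starts the job; -param passes the file; job.invocationid names the instance.
CMD="$DSHOME/bin/dsjob -run -param SourceFile=$FILE $PROJECT $JOB.$INVOCATION"
echo "$CMD"                # in production, execute the command instead of echoing it
```

The Windows scheduler (or any external trigger) would call this script with the detected file as its argument; the job itself must be compiled as multi-instance for the invocation-id suffix to work.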
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
First of all, the Wait For File activity will fire as soon as the file gets created, even though the transfer may not be complete. So you need to request that the supplier send a dummy file after the original file transfer is complete. Your Wait For File activity should wait for that dummy file.
Secondly, a unix script will help you control your process. Search this forum; you will find plenty of examples.
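One possible shape for such a control script, assuming the dummy-trigger convention above (the landing directory, ".done" suffix, and job names are all illustrative assumptions):

```shell
#!/bin/sh
# Hypothetical watcher sketch (directory and job names are examples): poll a
# landing directory for ".done" trigger files -- the dummy file the supplier
# sends once the real transfer is complete -- launch the load job for the
# matching data file, then set the trigger aside so it is not picked up twice.
LANDING=${LANDING:-/tmp/landing}
mkdir -p "$LANDING"

process_triggers() {
    for trigger in "$LANDING"/*.done; do
        [ -e "$trigger" ] || continue       # glob matched nothing: no work pending
        datafile=${trigger%.done}           # supplier1.csv.done -> supplier1.csv
        # A real invocation would call: dsjob -run -param SourceFile=... project job.id
        echo "would run: dsjob -run -param SourceFile=$datafile MyProject GenericLoadJob"
        mv "$trigger" "$trigger.processed"  # mark as handled
    done
}

# Single polling pass; in production: while :; do process_triggers; sleep 30; done
process_triggers
```

Because each arriving trigger launches its own job invocation, files that land while an earlier one is being processed are still picked up on the next pass, which is the gap the original poster hit with a single Wait For File activity.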
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
-
rana_s_ray
- Participant
- Posts: 6
- Joined: Fri Oct 13, 2006 4:46 am
Thanx. We do implement wait-for as you suggested; before I posted this, I played around with the Wait For File activity. But the issue is somewhat different. With a Wait For File activity, even though the sequence remains in a runnable state after the arrival of any of the dummy files defined in the wait-for, once the sequence has started, any other files that arrive in the interim are not picked up. That is why I am not looking at a wait-for-file type solution, but rather an event-driven solution, the event being the arrival of a file. I want to schedule a unix script from the Windows scheduler to run the DS job. Could you give some pointers to a sample script? I would search if I knew the keywords; I am a DS novice.
