Looping logic based on multiple conditions to end loop

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Looping logic based on multiple conditions to end loop

Post by Chandrathdsx »

I have a requirement to run a job in a loop that reads data files from a file directory and processes them to load into a table.
The files keep arriving over a 2-hour window, as many as a few thousand. Once all the data files have landed in the directory, another file (a control file) arrives to indicate that the landing is complete. The ETL job needs to start after the first set of data files lands and loop until no more data files exist and the control file exists.


I would highly appreciate any help with coding this loop. I am not sure this can be achieved through a numeric loop, since I do not know the iteration count in advance; I only know the conditions to end the loop (the control file exists and no more data files exist). But I am totally out of ideas on how to do this.

Thank you!
Chen That
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You end a loop early simply by branching out of it, past the End Loop stage. The easiest way to do that is to park a Sequencer stage set to 'Any' right after it and branch to that based on the filename found.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

chulett wrote:You end a loop early simply by branching out of it past the End Loop stage. Easiest way to do that is park a Sequencer set to 'Any' right after it and branch to that based on the filename found. ...
Craig,
Thank you for the reply!
I can only end the loop when the control file is found and there are no more data files to process. Could you please elaborate on the job design steps and what needs to be given in the Start Loop stage for the step and end values?

Thanks again, appreciate all your help.
priyadarshikunal
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

You don't need to change your Start Loop or End Loop activity. Just passing control to an activity that is not part of the loop will do the trick.

Or not passing control forward at all. (Not a very good idea, I think.)
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What would need to happen if you ran out of data files and you hadn't seen this control file yet? Keep looping and doing nothing, assuming that more files will be coming? If so, that's going to severely complicate the job design for you.

Why not wait for the control file to arrive using the Wait For File stage (or something similar) and then process whatever files you have at that point?
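For anyone who would rather script the wait than use the Wait For File stage, a rough shell equivalent looks like this. The poll interval and the control-file path are made up for the sketch:

```shell
#!/bin/sh
# Sketch: poll for a file until it appears or a timeout is hit,
# roughly what the Wait For File stage does for you.
# wait_for_file PATH TIMEOUT_SECONDS -> 0 if found, 1 on timeout.
wait_for_file() {
    path=$1; timeout=$2; waited=0
    while [ ! -f "$path" ]; do
        [ "$waited" -ge "$timeout" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

ctl=$(mktemp)                 # stand-in for the real control file
if wait_for_file "$ctl" 5; then
    echo "control file present, start processing"
fi
```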
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm going to err on the side of complication and assume it needs to motor along doing something (even if that something is nothing) until the magic file shows up. And because I'm pressed for time this morning, this is a freebie high-level overview of what I would do in your shoes.

Tools you'll need:

1. A Looping Sequence job
2. A shell script that can find the files you need to process in the proper order
3. A Phillips head screwdriver

The script will need to return a list of files from your landing directory, oldest to newest, one file per line without a trailing separator. I assume that will make your control file the last one in the list. So the job would look something like this, in the order listed:

1. Start Loop. Use a numeric loop with the number set as high as it will go

2. Execute Command to run the script. It will return a dynamic array, take only the first element

3. No file found (element empty)?
.3a. Sleep for a period of time
.3b. Branch to the End Loop stage

4. Control file found?
.4a. Branch outside the End Loop stage, you done!

5. Otherwise (regular file found)
.5a. Job Activity to process file
.5b. Branch to the End Loop stage
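A rough sketch of the step-2 script, assuming a POSIX shell and filenames without embedded newlines (the directory here is a throwaway stand-in for the real landing directory):

```shell
#!/bin/sh
# Sketch: return the files in the landing directory oldest-first,
# one per line, as the Execute Command step expects.
# `ls -1tr` sorts by modification time, reversed to oldest-first.
list_oldest_first() {
    ls -1tr "$1" 2>/dev/null
}

# Demo against a temporary directory standing in for the landing dir:
demo=$(mktemp -d)
touch "$demo/first.dat"
sleep 1                                 # ensure distinct mtimes
touch "$demo/second.dat"
list_oldest_first "$demo"               # first.dat, then second.dat
list_oldest_first "$demo" | head -n 1   # "take only the first element"
```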

Extra credit
Capture the start time of the job and add a check for "it has been running too dang long, something must be wrong". Branch outside the loop and raise an error / send an email when that happens.
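The extra-credit check could be scripted roughly like this; the 9000-second ceiling is an invented threshold, and in the real job the start time would be captured once when the loop begins:

```shell
#!/bin/sh
# Sketch: guard against looping forever waiting on the control file.
START_EPOCH=$(date +%s)   # captured once at loop start
MAX_SECONDS=9000          # hypothetical "too dang long" threshold

too_long() {
    [ $(( $(date +%s) - START_EPOCH )) -gt "$MAX_SECONDS" ]
}

if too_long; then
    echo "ERROR: no control file after ${MAX_SECONDS}s, aborting" >&2
    exit 1
fi
echo "still within the time limit"
```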

There's always more than one way to do things, so feel free to experiment.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

Thank you every one for all your valuable details.

My data files keep coming over a 2-hour span. I do not want to wait 2 hours and then process all the files at once, since together they will be around 50 GB+. I want to use that 2-hour window to process the data in a loop as the files keep arriving. My control file (the magic file) is the last one to come, so as soon as I see the control file I can end the loop, after making sure that all the data files that already landed were successfully processed.

Thanks again. I will experiment on the solution and will reply on how it went.
dganeshm
Premium Member
Posts: 91
Joined: Tue Aug 11, 2009 3:26 pm

Post by dganeshm »

Have a shell script and a multi-instance job: the script reads each file name and invokes an instance of the job using the dsjob command; once the shell script reads the control file, it exits.
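A sketch of the driver script described here. The `-run`, `-param`, and `-wait` flags are standard `dsjob` options, but the project and job names are invented, and the command is only echoed so the control flow can be read without a DataStage engine:

```shell
#!/bin/sh
# Sketch: dispatch one instance of a multi-instance job per data file,
# stopping when the control file is seen.  PROJECT/JOB are hypothetical;
# swap `echo` for the real dsjob call on the engine tier.
PROJECT=MyProject
JOB=LoadFileJob

run_instance() {
    # Real call would be: dsjob -run -param SourceFile=... -wait proj job.id
    echo "dsjob -run -param SourceFile=$1 -wait $PROJECT $JOB.$2"
}

i=0
for f in first.dat second.dat CONTROL.done; do   # stand-in file list
    case "$f" in
        CONTROL.done) break ;;   # control file seen: stop dispatching
    esac
    i=$((i + 1))
    run_instance "$f" "$i"
done
```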
Regards,
Ganesh
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What is the mode of transfer for these files landing in your source directory? If FTP, then you also need to make sure that you do not start reading a file before it is completely written.
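One common completeness check, if the sender cannot write to a temporary name and rename: watch the file size until it stops growing. A sketch, where the one-second settle interval is an arbitrary assumption:

```shell
#!/bin/sh
# Sketch: consider a file complete once its size is unchanged
# across a settle interval.  INTERVAL is a hypothetical value.
INTERVAL=1
is_complete() {
    s1=$(wc -c < "$1")
    sleep "$INTERVAL"
    s2=$(wc -c < "$1")
    [ "$s1" -eq "$s2" ]
}

f=$(mktemp)
printf 'all data written\n' > "$f"   # simulate a finished transfer
is_complete "$f" && echo "safe to read"
```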
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Very true, all of the standard rules for processing files being transferred into your processing system in 'real time' (rather than batched) should come into play here.
-craig

"You can never have too many knives" -- Logan Nine Fingers
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

DSguru2B wrote:What is the mode of transfer for these files landing in your source directory? If FTP, then you also need to make sure that you do not start reading a file before it is completely written.
DSguru2B,
We have already taken care of this. The APAP program writes these files into a directory, and as soon as it is done writing a file it moves the file to a staging directory from which DataStage reads. So write and read happen in different directories.

Ganesh,
The multi-instance feature may not help much here, as it complicates managing the files; I am not looking to process one file at a time, since there will be thousands of files. Instead, I am planning to process the group of files that are available when the job starts: it reads the set of files that are ready to be processed and moves them to a 'processed' directory once they are loaded into the target. While the DataStage job processes the first set of files, a second set will be arriving in the to-be-processed directory, and the same job runs again in a loop to pick up the next set of 'ready to be processed' files. If I use multiple instances, I would need to group files with a tag indicating the instance ID, so that each instance deals with its corresponding set of files. Is that what you mean to propose? Please suggest if there is a better way to leverage multi-instance.
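The batch pattern described above (snapshot the currently ready files, load each, then move it aside) might look like this in outline. The directories are stand-ins, and filenames are assumed to contain no spaces or newlines:

```shell
#!/bin/sh
# Sketch: snapshot the ready files, process each, then move it to a
# processed directory so the next loop pass only sees new arrivals.
# READY_DIR/DONE_DIR are hypothetical paths (temp dirs for the demo).
READY_DIR=$(mktemp -d)
DONE_DIR=$(mktemp -d)
touch "$READY_DIR/a.dat" "$READY_DIR/b.dat"   # simulate landed files

batch=$(ls -1tr "$READY_DIR")    # snapshot: files landing later wait
for f in $batch; do
    : # the real job would load "$READY_DIR/$f" into the target here
    mv "$READY_DIR/$f" "$DONE_DIR/"
done
```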

Thank you all again for your help.
Chandrathdsx
Participant
Posts: 59
Joined: Sat Jul 05, 2008 11:32 am

Post by Chandrathdsx »

Chandrathdsx wrote:
Thanks again. I will experiment on the solution and will reply on how it went.
Craig,
The solution works! Thanks for all your help.

Ganesh,
I am marking the topic as resolved, but feel free to elaborate if you have ideas about my comments on multi-instances. I am thinking of enabling this to speed up the process, but I do not want to overcomplicate the file processing.