Sequential File stage need to select latest File to Extract
Hi,
My job has to run every day to extract data from a flat file (the files will be on the UNIX box).
There will be multiple files, but my Sequential File stage has to pick the latest one.
The files are named with some text plus a timestamp.
All the files share the same name text and are differentiated only by the timestamp.
Example:
1) WalmartProduct10182005...... .txt
2) WalmartProduct10172005...... .txt
Here the Sequential File stage has to pick the first file, based on the timestamp (latest).
But how can the Sequential File stage identify the latest file?
Any ideas?
Thanks in advance
Beeram
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Run a UNIX command (in an Execute Command activity) to determine the latest. For example (your ls may be a little different).
Code: Select all
ls -t1 | head -1
Use the output from this to supply a job parameter with the file name.
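As a rough sketch of the suggestion above, the ls call can be wrapped in a small shell function whose output is captured into a variable. The function name, the directory and the pattern are illustrative assumptions, not anything from the original job; how the value then reaches the job parameter depends on how the sequence is built.

```shell
#!/bin/sh
# latest_by_mtime DIR PATTERN   (hypothetical helper name)
# Prints the name of the most recently modified entry in DIR matching PATTERN.
latest_by_mtime() {
    dir=$1
    pattern=$2
    # The subshell keeps the cd local to this function.
    # $pattern is deliberately unquoted so the shell expands the wildcard;
    # -d stops ls from descending into a matching directory.
    ( cd "$dir" && ls -td $pattern 2>/dev/null | head -1 )
}

# Example call (directory and pattern are placeholders):
# FILE_NAME=`latest_by_mtime /data/incoming 'WalmartProduct*.txt'`
```

If nothing matches, the function prints an empty string, so the caller should check for that before starting the job.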
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Premium Member
- Posts: 252
- Joined: Mon Sep 19, 2005 10:28 pm
- Location: Melbourne, Australia
- Contact:
Beeram,
I think you will have to do it with a Unix command. You can call the Unix command from a before-job subroutine, or from a Shell Exec activity that precedes the Job Activity in a Job Sequence.
Two options for the Unix command are:
1. Find the latest file and move it to a static file name.
2. Find the latest file and link it to a static file name.
One possible implementation of option 2 is (ls -t lists the newest first; the [0-9] keeps the link itself out of the match):
Code: Select all
ln -fs "`/bin/ls -t WalmartProduct[0-9]*.txt | head -1`" WalmartProduct.txt
Edited: ... or you could do what Ray said...
Ross Leishman
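A slightly more defensive sketch of option 2, in case it helps: it refuses to touch the link when no data file is found, so an empty directory does not leave a dangling link behind. The function name and its arguments are made up for illustration, and it assumes the timestamp starts with a digit directly after the fixed name text.

```shell
#!/bin/sh
# link_latest DIR PREFIX LINK_NAME   (hypothetical helper name)
# Points LINK_NAME (inside DIR) at the most recently modified PREFIX<digits>*.txt file.
link_latest() {
    dir=$1
    prefix=$2
    link_name=$3
    (
        cd "$dir" || exit 1
        # Requiring a digit after the prefix keeps the static link name
        # (for example WalmartProduct.txt) out of the candidate list.
        newest=`ls -td "$prefix"[0-9]*.txt 2>/dev/null | head -1`
        if [ -z "$newest" ]; then
            echo "link_latest: no file matching $prefix[0-9]*.txt in $dir" >&2
            exit 1
        fi
        ln -fs "$newest" "$link_name"
    )
}

# Example call (the path is a placeholder):
# link_latest /data/incoming WalmartProduct WalmartProduct.txt
```

The job can then read the static name every day, and a non-zero exit status from the function tells the sequence that there was nothing to link.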
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Either of the above ls commands will give you the latest, whatever it is. Learn some UNIX. The -t option for ls sorts by date/time, newest first (by default the time last modified; -c uses the time of last status change and -u the time of last access instead).
If you want to filter further on date/time, you could use the find command.
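For instance, a minimal sketch of the find approach, assuming the goal is to list only files modified within the last N days. The function name and arguments are illustrative only.

```shell
#!/bin/sh
# recent_files DIR PATTERN DAYS   (hypothetical helper name)
# Lists regular files under DIR whose name matches PATTERN and which were
# modified less than DAYS * 24 hours ago.
recent_files() {
    dir=$1
    pattern=$2
    days=$3
    # PATTERN is quoted so that find, not the shell, does the matching.
    find "$dir" -type f -name "$pattern" -mtime "-$days" -print
}

# Example call (the path is a placeholder):
# recent_files /data/incoming 'WalmartProduct*.txt' 1
```

Note that find prints each match with its directory prefix and searches subdirectories as well, unlike the plain ls commands earlier in the thread.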
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I think his requirement is to find the latest file based on the timestamp included within the filename.
If the filename contained the timestamp in the format YYYYMMDDHHMMSS, it would have been easy, provided the first (alphabetic) part stays the same:
Code: Select all
ls -r WalmartProduct*.txt | head -1
We need a Unix guru to help get the latest filename when the timestamp is in the format MMDDYYYYHHMMSS.
Any help?
Arun
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Just add the timestamp into the wildcard.
Code: Select all
ls -t1 *20051018134500*.txt
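If the stamp should follow the run date rather than being typed in, something along these lines might do. It is only a sketch: it assumes the name begins with the fixed text followed by the date as MMDDYYYY (as in the original post's example), and the function name is invented here.

```shell
#!/bin/sh
# todays_files DIR PREFIX   (hypothetical helper name)
# Lists the files in DIR named PREFIX + today's date (MMDDYYYY) + anything + .txt,
# most recently modified first.
todays_files() {
    dir=$1
    prefix=$2
    stamp=`date +%m%d%Y`
    # The quoted part is taken literally; only the trailing * is a wildcard.
    ( cd "$dir" && ls -t1d "$prefix$stamp"*.txt 2>/dev/null )
}

# Example call (the path is a placeholder):
# todays_files /data/incoming WalmartProduct | head -1
```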
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
You can use the ls -1t method, but some folks have the unfortunate habit when viewing files, of writing a new version (shift ZZ, you know who you are), which resets the UNIX file timestamp.
You will be lucky to find out about one of these; more often the wrong file is silently processed, the details buried in a log somewhere.
I have found it safer to use the date/time in the filename created by the source process. It takes a more deliberate action to alter the filename.
If your filenames are consistently created with names like this:
WalmartProduct10182005...... .txt
you could use something like this:
ls -1 WalmartProduct*.txt |
sort -rn -k 1.19,1.22 -k 1.15,1.18 |
head -1
to get the most recent one by the date embedded in the name.
Carter
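The same idea can be written without hard-coded column positions, which may be easier to reuse for other name prefixes. This is only a sketch under the assumption that the fixed text is immediately followed by the date as MMDDYYYY; the function name is made up. It swaps in awk to build a YYYYMMDD sort key instead of using sort's character offsets.

```shell
#!/bin/sh
# latest_by_name_date DIR PREFIX   (hypothetical helper name)
# Prints the file in DIR whose name carries the most recent MMDDYYYY date
# directly after PREFIX, regardless of the file's UNIX timestamp.
latest_by_name_date() {
    dir=$1
    prefix=$2
    (
        cd "$dir" || exit 1
        ls -d "$prefix"[0-9]*.txt 2>/dev/null |
        awk -v p="$prefix" '{
            stamp = substr($0, length(p) + 1, 8)           # MMDDYYYY
            key = substr(stamp, 5, 4) substr(stamp, 1, 4)  # YYYYMMDD
            print key "\t" $0
        }' |
        sort -r |     # newest key first
        head -1 |
        cut -f2-      # drop the key, keep the file name
    )
}

# Example call (the path is a placeholder):
# latest_by_name_date /data/incoming WalmartProduct
```

Because the key is rebuilt as YYYYMMDD, a file dated 12312004 no longer sorts ahead of one dated 10182005.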
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Ah, new information not in original post:
"My DataStage server is on NT and Files are on UNIX Box, so these files are external sources to DataStage Server."
(Note that the original post specified UNIX as the platform.)
Yes, you can pass values to job parameters from job control code. For example a job sequence that includes a loop (StartLoop and EndLoop activities). Or roll-your-own job control code for versions earlier than 7.5.
It is necessary to execute the ls command on the UNIX machine, using some form of remote shell.
It is also necessary that the DataStage jobs can "see" the UNIX files - presumably you have samba or something similar in place to facilitate this.
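To illustrate the remote-shell point, here is one possible shape for it, assuming an ssh (or rsh) client is available wherever the command is launched and that a POSIX-style shell runs the script. The account, host name and path are placeholders, and REMOTE_SH and remote_latest are names invented for this sketch.

```shell
#!/bin/sh
# REMOTE_SH is the command prefix used to reach the UNIX box.
# "ssh etl@unixbox" is a placeholder, not a real account or host.
: "${REMOTE_SH:=ssh etl@unixbox}"

# remote_latest DIR PATTERN   (hypothetical helper name)
# Runs ls on the remote machine and prints the most recently modified match.
remote_latest() {
    dir=$1
    pattern=$2
    # The whole pipeline is passed as one string, so the wildcard is
    # expanded on the remote side, not locally.
    $REMOTE_SH "cd '$dir' && ls -t $pattern | head -1"
}

# Example call (the path is a placeholder):
# FILE_NAME=`remote_latest /data/incoming 'WalmartProduct*.txt'`
```

The name that comes back is relative to the UNIX directory, so it still has to be combined with whatever path the DataStage server uses to see those files.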
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.