Sequential File stage needs to select the latest file to extract

vbeeram
Participant
Posts: 63
Joined: Fri Apr 09, 2004 9:40 pm

Post by vbeeram »

Hi,

My job has to run every day to extract data from a flat file (the files are on a UNIX box).

There will be multiple files, but my Sequential File stage has to pick the latest one.
File names consist of some text plus a timestamp.
All the files share the same name text and differ only in the timestamp.

Example:
1) WalmartProduct10182005...... .txt
2) WalmartProduct10172005...... .txt
Here the Sequential File stage has to pick the first file, since its timestamp is the latest.


But how can the Sequential File stage identify the latest file?

Any ideas?

Thanks in advance
Beeram
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Run a UNIX command (in an Execute Command activity) to determine the latest. For example

ls -t1 | head -1
(your ls may be a little different).

Use the output from this to supply a job parameter with the file name.
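
As a rough sketch of the command above wrapped for reuse, assuming a POSIX shell: the function name latest_by_mtime, the directory argument and the WalmartProduct*.txt pattern are illustrative, not anything DataStage supplies.

```shell
#!/bin/sh
# Print the most recently modified file matching a pattern in a directory.
# latest_by_mtime, its arguments and the example pattern are assumptions.
latest_by_mtime() {
    dir=$1
    pattern=$2
    # cd inside a subshell so the caller's working directory is untouched.
    # $pattern is deliberately unquoted so the shell expands the wildcard.
    # ls -t lists newest first, so head -1 keeps the latest file.
    ( cd "$dir" && ls -t $pattern 2>/dev/null | head -1 )
}
```

Something like `latest_by_mtime /data/incoming 'WalmartProduct*.txt'` (the path is a placeholder) prints a bare file name, or nothing at all when no file matches, so the calling job control should check for an empty result.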
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rleishman
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia

Post by rleishman »

Beeram,

I think you will have to do it with a Unix command. You can call the Unix command from a before-job-subroutine, or from a Shell Exec activity that precedes the Job Activity in a Job Sequence.

Two options for the Unix command are:
1. Find the latest file and move it to a static file name.
2. Find the latest file and link it to a static file name.

One possible implementation of option 2 is:

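# -t lists newest first (-tr would put the oldest first); [0-9]* keeps the link itself out of the match
ln -fs "`/bin/ls -t WalmartProduct[0-9]*.txt | head -1`" WalmartProduct.txt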
Edited: ... or you could do what Ray said... :)
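
For option 1, a minimal sketch under a few assumptions: the function name stage_latest and its arguments are made up for illustration, and it uses cp -p instead of mv so the dated original stays in place (swap in mv if the file should be consumed).

```shell
#!/bin/sh
# Copy the newest dated file to a fixed name that the job always reads.
# stage_latest, its arguments and the file pattern are illustrative assumptions.
stage_latest() {
    dir=$1      # directory holding the dated files
    static=$2   # fixed file name, e.g. WalmartProduct.txt
    # [0-9]* requires a digit after the prefix, so the static copy never matches.
    latest=$( cd "$dir" && ls -t WalmartProduct[0-9]*.txt 2>/dev/null | head -1 )
    if [ -z "$latest" ]; then
        echo "no input file found in $dir" >&2
        return 1
    fi
    # -p keeps the original modification time on the copy.
    cp -p "$dir/$latest" "$dir/$static"
}
```

The non-zero return when nothing matches lets a before-job subroutine or sequence stop before the job reads a stale copy.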
Ross Leishman
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Hi,
But I guess the real question is how to get the latest file name.
One approach would be to start from the last date of the specified month and step back one day at a time until a file is found for that date.

Regards
kumar
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Either of the above ls commands will give you the latest, whatever it is. Learn some UNIX. The -t option for ls sorts by date/time (by default date/time modified, but you can change the default with -c or -u options).

If you want to filter further on date/time, you could use the find command.
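
One possible shape for the find approach, as a sketch only: the function name recent_files and the pattern are assumptions, and the one-day window is just an example.

```shell
#!/bin/sh
# List matching files modified within roughly the last 24 hours.
# recent_files and the WalmartProduct*.txt pattern are illustrative assumptions.
recent_files() {
    dir=$1
    # -mtime -1 keeps files whose age, counted in whole 24-hour periods, is less than 1.
    # The pattern is quoted so that find, not the shell, does the matching.
    find "$dir" -type f -name 'WalmartProduct*.txt' -mtime -1
}
```

The paths printed include the directory prefix, and more than one file can come back if several arrived within the window.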
apraman
Participant
Posts: 47
Joined: Mon Sep 12, 2005 5:26 am

Post by apraman »

I think his requirement is to find the latest file based on the timestamp included within the file name. :)

If the file name contained the timestamp in the format YYYYMMDDHHMMSS, it would be easy, provided the alphabetic first part stays the same.

ls -r WalmartProduct*.txt | head -1
Need a UNIX guru to help get the latest file name when the timestamp is in the format MMDDYYYYHHMMSS.

Any help?
Arun
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just add the timestamp into the wildcard.

ls -t1 *20051018134500*.txt
clshore
Charter Member
Posts: 115
Joined: Tue Oct 21, 2003 11:45 am

Post by clshore »

You can use the ls -1t method, but some folks have the unfortunate habit, when viewing files in vi, of writing a new version on exit (shift ZZ, you know who you are), which resets the UNIX file timestamp.

You will be lucky to find out about one of these; more often the wrong file is silently processed, the details buried in a log somewhere.

I have found it safer to use the date/time in the filename created by the source process. It takes a more deliberate action to alter the filename.

If your filenames are consistently created with names like this:
WalmartProduct10182005...... .txt

you could use something like this:

ls -1 WalmartProduct*.txt |
sort -rn -k 1.19,1.22 -k 1.15,1.18 |
head -1

to get the most recent one by the date embedded in the name.
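
If the name carries a full MMDDYYYYHHMMSS stamp, as asked about earlier in the thread, one alternative is to build a sortable YYYYMMDDHHMMSS key in front of each name. This is only a sketch: the function name latest_by_name is made up, and it assumes the first run of digits in the name is exactly that 14-digit stamp.

```shell
#!/bin/sh
# Read file names on stdin and print the one with the latest embedded stamp.
# Assumes names look like <non-digit prefix>MMDDYYYYHHMMSS<anything>;
# names without such a digit run are silently skipped.
latest_by_name() {
    # Prefix each name with a YYYYMMDDHHMMSS key rebuilt from its stamp.
    sed -n 's/^\([^0-9]*\)\([0-9]\{4\}\)\([0-9]\{4\}\)\([0-9]\{6\}\).*$/\3\2\4 &/p' |
    # A plain reverse sort on the key puts the latest stamp first.
    sort -r |
    head -1 |
    # Drop the key again, leaving only the original file name.
    cut -d' ' -f2-
}
```

It would be fed the same listing, for example `ls -1 WalmartProduct*.txt | latest_by_name`.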

Carter
vbeeram
Participant
Posts: 63
Joined: Fri Apr 09, 2004 9:40 pm

Post by vbeeram »

My DataStage server is on NT and the files are on a UNIX box,
so these files are external sources to the DataStage server.

In this situation, can I pass UNIX command output to a job parameter?


Thanks
Beeram
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

vbeeram wrote:
"My DataStage server is on NT and Files are on UNIX Box,
So these files are external sources to Datastage Server."

Ah, new information not in original post. :roll:

(Notes that original post specified UNIX as the platform.)

Yes, you can pass values to job parameters from job control code, for example from a job sequence that includes a loop (StartLoop and EndLoop activities), or from roll-your-own job control code for versions earlier than 7.5.

It is necessary to execute the ls command on the UNIX machine, using some form of remote shell.

It is also necessary that the DataStage jobs can "see" the UNIX files - presumably you have samba or something similar in place to facilitate this.
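
A hedged sketch of the remote-shell step, assuming a POSIX-style shell and an ssh (or rsh) client are available wherever the command is launched. The function name latest_remote_file, the REMOTE_SHELL variable, and the host, directory and pattern are all placeholders.

```shell
#!/bin/sh
# Ask the UNIX host for its newest matching file over a remote shell.
# REMOTE_SHELL defaults to ssh; set it to rsh or similar if that is what is installed.
latest_remote_file() {
    host=$1      # e.g. user@unixbox (placeholder)
    dir=$2       # directory on the UNIX box (placeholder)
    pattern=$3   # e.g. 'WalmartProduct*.txt'
    # The command string is run by the remote shell, so the wildcard is
    # resolved against the UNIX box's file system, not the local one.
    ${REMOTE_SHELL:-ssh} "$host" "cd '$dir' && ls -t $pattern 2>/dev/null | head -1"
}
```

The single line it prints is the file name as seen on the UNIX box; mapping that to the path the NT server sees through the share is left to the job parameter.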