Reading and writing from the same file?

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Reading and writing from the same file?

Post by splayer »

Can it be done? Does it have to be a sequential file or data set file?
OttMAdpttch
Charter Member
Posts: 6
Joined: Thu Mar 27, 2003 1:55 pm

Post by OttMAdpttch »

In either version of DataStage (Server or EE), you cannot open the same sequential file or data set for both reading and writing. The only file type with which you can do this is a hashed file.
Mark Ott
DataStage Architect
Adept Technologies, Inc.
NBALA
Participant
Posts: 48
Joined: Tue Jul 11, 2006 11:52 am
Location: IL, USA

Post by NBALA »

This is possible if two different jobs are used, but they cannot run at the same time.

Could you explain the situation?

-NB
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Yes. I am using two jobs. The first initializes a value in a UNIX file to 0 and is called only once. The second job increments the value by one and is called at the beginning of the sequence job. For that job, I need to read the value from the file, increase it by 1 in a Transformer, and write it back into the same file. However, this does not work for sequential files or data sets. I am currently trying a server job using a hashed file.

Thanks for your help.
DarioE
Participant
Posts: 4
Joined: Thu Sep 14, 2006 9:56 am

Re: Reading and writing from the same file?

Post by DarioE »

splayer wrote: Can it be done? Does it have to be a sequential file or data set file?

If you're talking about hashed files, sure. You just cannot have two links from the same Hashed File stage. What you have to do is use two Hashed File stages (for the same hashed file) and turn off caching for both reading and writing; then it works just fine.
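In other words, the layout would be something like this (just a sketch; the stage names are placeholders, and both Hashed File stages point at the same hashed file):

Code: Select all

HashedFile_Read ----> Transformer ----> HashedFile_Write
 (cache off)           (increment)       (cache off)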

Hope this helps
Dario
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

If you read and write to the same hashed file, there is a possibility that you will process the same record more than once. Records are stored randomly in a hashed file, so you can read a record and write it back somewhere deeper into the file. This is especially true if you read with one record ID and write to a different one. You can get unexplained results. This is not a good practice.
Mamu Kim
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In parallel jobs, "blocking" operations are explicitly forbidden.

In server jobs you can write data into a sequential file then read from that file, but the read operation does not begin until the write operation completes. (The assertion made by OttMAdpttch is not true in this regard.) This is an example of a "blocking" operation.
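As a sketch (stage names are placeholders; both Sequential File stages point at the same path):

Code: Select all

Input ----> Transformer ----> SeqFile_Write     (the write completes first)
SeqFile_Read ----> Transformer ----> Output     (the read begins afterwards)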

What would it actually mean in a parallel environment???

Hashed files are not available in parallel jobs, so ignore any advice relating to these.

If you do want to write to and read from the same file in parallel jobs, you must do it using two concurrent jobs.

If you are looking for "near real time update" you will need to investigate using database tables with auto-commit. Even then, you will need to disable other parallelism features such as buffering, so as to be sure that your update is in place before the next row attempts a lookup.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If the updated/inserted record does not need to be updated or referred to a second time in the current run, then read from one file, update another file with the same content, and copy the updated file over the original after the job run.
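For example, the copy-back step could go in an after-job subroutine; a minimal sketch in server BASIC, where the paths are illustrative placeholders:

Code: Select all

* Overwrite the original file with the updated copy after the job run.
* DSExecute runs a UNIX command; both paths here are placeholders.
Call DSExecute("UNIX", "mv /MyPath/Counter_Work /MyPath/Counter_File", Output, RetCode)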
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

Thank you ray, kumar_s, kduke and DarioE for your responses.

I could do this in a server job if needed, but I just need this working. I tried using:

SeqFile ----> Xfm -----> SeqFile

However, it does not work. I then tried using a hashed file with a KeyCol and a ValueCol, as follows:

This is the scenario I have:

SeqFileStage ----> Transformer -----> HashFileStage

The SeqFileStage is there just to prevent compiler errors; it has no role. In the Transformer, I look up the hashed file as:
UtilityHashLookup("/MyPath/JCF_Hash", 1, 1)

The second argument (1) is the key value; the third (1) is the value position. Neither 1 nor 2 works as the value position.

Here is my hashed file (starting values):

KeyCol  ValueCol
1       0

After updating, ValueCol should be 1. The key column in the hashed file is KeyCol. Instead of replacing ValueCol with the new value, this adds another row to the hashed file.

I have tried several different scenarios so far. Please let me know if I need to be clearer. Thanks.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You need to be more clear. Use a server job, since it must operate in sequential mode in any case.

Prove to us what's in the hashed file. From the Administrator client, execute the command

Code: Select all

LIST.ITEM HashedFileName 'KeyValue'
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

If you are ready to use a hashed file, you can read it using a Hashed File stage, find the max, and use a Transformer and another Hashed File stage to replace the value in the target hashed file, since a hashed file has the advantage of being able to be updated and read simultaneously.
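The Transformer derivations themselves would be trivial; a sketch using the column names splayer mentioned (the link names are placeholders):

Code: Select all

KeyCol   derivation:  InLink.KeyCol
ValueCol derivation:  InLink.ValueCol + 1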
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

ray, I tried this:
LIST.ITEM /devetl/ptqs/IDMAR/Data/JCF_Hash 1

I can view my hashed file through the Hashed File stage. There are the two columns I have, KeyCol and ValueCol. I have only one row; the values are 1 and 0.

What I want to do is replace the value of 0 with the next value, 1, and so on. There should never be more than one row in the hashed file. So in my job, I want to read from the hashed file (or do a lookup in a Transformer stage), increment the value by 1, and write it back to the SAME hashed file.

Thank you very much for your responses. Please let me know if I can provide any more information.
splayer
Charter Member
Posts: 502
Joined: Mon Apr 12, 2004 5:01 pm

Post by splayer »

After I try the LIST.ITEM, I get the following error:

Retrieve: syntax error. Unexpected sentence without filename. Token was "". Scanned command was LIST.ITEM '/devetl/ptqs/IDMAR/Data/JCF_Hash' '1'

I also tried:

LIST.ITEM "/devetl/ptqs/IDMAR/Data/JCF_Hash" "1" and
LIST.ITEM "/devetl/ptqs/IDMAR/Data/JCF_Hash" KeyCol

Same error.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

LIST.ITEM requires a VOC pointer. A VOC pointer can be created with a SETFILE command. Search the forum for details.
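For example (a sketch using splayer's path; the VOC name JCF_HASH is just a placeholder):

Code: Select all

SETFILE /devetl/ptqs/IDMAR/Data/JCF_Hash JCF_HASH OVERWRITING
LIST.ITEM JCF_HASH '1'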
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.