
Reading and writing from the same file?

Posted: Fri Sep 15, 2006 9:39 am
by splayer
Can it be done? Does it have to be a sequential file or data set file?

Posted: Fri Sep 15, 2006 10:01 am
by OttMAdpttch
In either version of DataStage (Server or EE), you cannot open the same sequential file or data set for both read and write. The only file type you can do this with is a hashed file.

Posted: Fri Sep 15, 2006 10:50 am
by NBALA
This is possible if two different jobs are used, but not at the same time.

Could you explain the situation?

-NB

Posted: Fri Sep 15, 2006 11:28 am
by splayer
Yes. I am using 2 jobs. In the first one, I initialize a value in a file on UNIX to 0; this job is called only once. I have another job which increments the value by one, and it will be called at the beginning of the sequencer. For this job, I need to read the value from the file, increase it by 1 in a transformer, and write it back into the same file. However, this does not work for sequential files and data set files. I am currently trying a server job using a hash file.
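
In case it helps to see the intent, the increment job really only needs to do something like this (a rough sketch of a before-job subroutine; the routine name, path and record id are placeholders, not my real ones):

Code: Select all

Subroutine IncrementJCFCounter(InputArg, ErrorCode)
* Rough sketch only: read a counter from a hashed file, add 1, write it back.
* The path "/MyPath/JCF_Hash", record id "1" and routine name are placeholders.
   ErrorCode = 0
   OpenPath "/MyPath/JCF_Hash" To CounterFile Then
      * Field 1 of record "1" holds the current counter value
      Read CounterRec From CounterFile, "1" Else CounterRec = 0
      CounterRec<1> = CounterRec<1> + 1
      Write CounterRec To CounterFile, "1"
   End Else
      Call DSLogWarn("Cannot open hashed file /MyPath/JCF_Hash", "IncrementJCFCounter")
      ErrorCode = 1
   End
Return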

Thanks for your help.

Re: Reading and writing from the same file?

Posted: Fri Sep 15, 2006 2:07 pm
by DarioE
splayer wrote:Can it be done? Does it have to be a sequential file or data set file?
If you're talking about hashed files, sure. You just cannot have two links from the same hashed file stage.
What you have to do is use two hashed file stages (for the same hashed file) and do not turn on caching, either for writing or for reading, and it works just fine.
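In picture form, roughly (both stages pointing to the same hashed file, with read and write caching disabled):

HashFileStage (read, no cache) ----> Transformer ----> HashFileStage (write, no cache)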

Hope this helps
Dario

Posted: Fri Sep 15, 2006 3:42 pm
by kduke
If you read and write to the same hashed file then there is a possibility you can process the same record more than once. Records are stored randomly in a hashed file, so you can read a record and write it back to somewhere deeper into the file. This is especially true if you read with one record id and write to a different one. You can get unexplained results. This is not a good practice.

Posted: Fri Sep 15, 2006 5:31 pm
by ray.wurlod
In parallel jobs, "blocking" operations are explicitly forbidden.

In server jobs you can write data into a sequential file then read from that file, but the read operation does not begin until the write operation completes. (The assertion made by OttMAdpttch is not true in this regard.) This is an example of a "blocking" operation.

What would it actually mean in a parallel environment???

Hashed files are not available in parallel jobs, so ignore any advice relating to these.

If you do want to write to and read from the same file in parallel jobs, you must do it using two concurrent jobs.

If you are looking for "near real time update" you will need to investigate using database tables with auto-commit. Even then, you will need to disable other parallelism features such as buffering, so as to be sure that your update is in place before the next row attempts a lookup.

Posted: Fri Sep 15, 2006 8:42 pm
by kumar_s
If the updated/inserted record does not need to be updated or referred to a second time in the current run, then read one file and update another file with the same content, and copy the updated file over the original file after the job run.
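
For the final copy step, an after-job routine along these lines would do (the file names here are only examples):

Code: Select all

* Sketch of the copy-back step, e.g. in an after-job subroutine.
* Both file names are placeholders.
Cmd = "cp /MyPath/counter_new.txt /MyPath/counter.txt"
Call DSExecute("UNIX", Cmd, CmdOutput, SysRet)
If SysRet <> 0 Then
   Call DSLogWarn("Copy-back failed: ":CmdOutput, "CopyBackAfterJob")
End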

Posted: Sat Sep 16, 2006 11:48 am
by splayer
Thank you ray, kumar_s, kduke and DarioE for your responses.

I could do this in server jobs if needed but I just need this working. I tried using:

SeqFile ----> Xfm -----> SeqFile

However, it does not work. I tried using hash files with a KeyCol and ValueCol as follows:

This is the scenario I have:

SeqFileStage ----> Transformer -----> HashFileStage

The SeqFileStage is there just to prevent compiler errors. It has no role. In the transformer, I look up the hash file as:
UtilityHashLookup("/MyPath/JCF_Hash", 1, 1)

The second argument (1) is the key value.
The third argument (1) is the value position; neither 1 nor 2 works.

Here is my hashfile: (starting values)

KeyCol ValueCol
1 0

After updating, ValueCol should be 1. The key column in the hash file is KeyCol. However, the job adds another row to the hash file instead of replacing ValueCol with the new value.

I tried several different scenarios so far. Please let me know if I need to be more clear. Thanks.

Posted: Sat Sep 16, 2006 3:09 pm
by ray.wurlod
You need to be more clear. Use a server job, since it must operate in sequential mode in any case.

Prove to us what's in the hashed file. From the Administrator client, execute the command

Code: Select all

LIST.ITEM HashedFileName 'KeyValue'

Posted: Sat Sep 16, 2006 6:52 pm
by kumar_s
If you are ready to use a hashed file, you can read the hashed file using a Hashed File stage, find the max, and use a transformer and a Hashed File stage to replace the value in the target hashed file, since a hashed file has the advantage of being able to be updated and read simultaneously.

Posted: Sun Sep 17, 2006 11:05 am
by splayer
ray, I tried this:
LIST.ITEM /devetl/ptqs/IDMAR/Data/JCF_Hash 1

I can view my hash file through the hash file stage. It contains the two columns I have, KeyCol and ValueCol, and only one row. The values are 1 and 0.

What I want to do is replace the value of 0 with the next value, 1, and so on. There should never be more than one row in the hash file. So in my job, I want to read from the hash file (or do a lookup in a transformer stage), increment the value by 1, and write it back to the SAME hash file.

Thank you very much for your responses. Please let me know if I can provide any more information.

Posted: Sun Sep 17, 2006 11:51 am
by splayer
After I try the LIST.ITEM, I get the following error:

Retrieve: syntax error. Unexpected sentence without filename. Token was "". Scanned command was LIST.ITEM '/devetl/ptqs/IDMAR/Data/JCF_Hash' '1'

I also tried:

LIST.ITEM "/devetl/ptqs/IDMAR/Data/JCF_Hash" "1" and
LIST.ITEM "/devetl/ptqs/IDMAR/Data/JCF_Hash" KeyCol

Same error.

Posted: Sun Sep 17, 2006 2:50 pm
by ray.wurlod
LIST.ITEM requires a VOC pointer. A VOC pointer can be created with a SETFILE command. Search the forum for details.
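
For example (the VOC name JCF_HASH is arbitrary):

Code: Select all

SETFILE /devetl/ptqs/IDMAR/Data/JCF_Hash JCF_HASH OVERWRITING
LIST.ITEM JCF_HASH '1'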