If key already exists, do NOT pass record, else process it.

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

If key already exists, do NOT pass record, else process it.

Post by gateleys »

Hi, there are multiple records for each employee (identified by an ID). What I need to do is to -
1. Process the record if that employee record has not been processed before (based on the ID).
2. If it has been encountered before, do not process it.

I am using a hash file which contains the unique IDs for each employee. My DB source passes multiple rows for each employee to the transformer, where a lookup to the hash file is to be done. I don't know how I should proceed. Please suggest.

Thanks.
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

On the tabular display of your output link in transformer, you would find a header called constraints. Double click on it to open the expression editor for constraints and enter lookup_ID.NOTFOUND, where lookup_ID is your lookup input link.
When you have opened your transformer stage constraints editor, click on the Help button. Constraints is lucidly explained in DS help topics.
IHTH.
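
For illustration, with a reference (lookup) link named LookupLink keyed on the employee ID (the name here is only a placeholder, not from the actual job), the output link constraint would simply be:

    LookupLink.NOTFOUND

Rows whose ID is found on the reference link evaluate this to false and are dropped from that output.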
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: If key already exists, do NOT pass record, else process

Post by gateleys »

Hello guys, please help me with this. It looks easy, but it has been bugging me. For the above problem, I have used a hash file which is initially empty. For the first record for an employee, I look up the hash; since ID.NOTFOUND is of course true, I intend to insert the ID into the SAME hash and also pass the record to the output link. For subsequent records of the same employee, when the hash is looked up, the ID will be found and hence the record should NOT be sent to the output.

In short, I just need to push only one record for each employee to the output. How do I do it? Please help.
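
To make the intent concrete, the per-row logic being described amounts to something like the following DataStage BASIC (only a sketch: the hashed file name SeenIDs and the variable names are invented for illustration, and in the job itself this would be expressed through the lookup link, the constraint and the hash write link rather than a routine):

    * Sketch of the per-row dedupe logic, restated as DataStage BASIC
    * EmpID stands for the employee ID arriving on the current input row
    OPEN 'SeenIDs' TO SeenFile ELSE STOP 'Cannot open hashed file SeenIDs'
    READ Dummy FROM SeenFile, EmpID THEN
       PassRow = @FALSE              ;* ID already processed - suppress this row
    END ELSE
       WRITE '' TO SeenFile, EmpID   ;* remember the ID for later rows
       PassRow = @TRUE               ;* first row for this employee - pass it on
    END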

gateleys wrote:Hi, there are multiple records for each employee (identified by an ID). What I need to do is to -
1. Process the record if that employee record has not been processed before (based on the ID).
2. If it has been encountered before, do not process it.

I am using a hash file which contains the unique IDs for each employee. My DB source passes multiple rows for each employee to the transformer, where a lookup to the hash file is to be done. I don't know how I should proceed. Please suggest.

Thanks.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Hey Bala,
With your logic of inputLink_hash_id.NOTFOUND as the constraint, all the rows will evaluate to FALSE (that is, the ID was FOUND, given that the hash has one record for each employee), since every source record will find an ID match in the lookup. So, could you please elaborate?
Thanks for your earlier response, though.

Bala R wrote:On the tabular display of your output link in transformer, you would find a header called constraints. Double click on it to open the expression editor for constraints and enter lookup_ID.NOTFOUND, where lookup_ID is your lookup input link.
When you have opened your transformer stage constraints editor, click on the Help button. Constraints is lucidly explained in DS help topics.
IHTH.
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

Please describe what you have tried and what is not working.
As far as I know, you should be able to write to and read from a hashed file in the same job. Maybe you could turn on caching (WRITE IMMEDIATE) while you write to the hashed file and use 'Disabled, Lock for Updates' as the preload option while you read.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

ODBC -----inlink-----> Transformer -----outlink-----> seqFile
                          ^       |
               LookupLink |       | WriteHashLink (writes NOTFOUND IDs back to the hash)
                          |       v
                        Hash file
The constraint on the outlink contains LookupLink.NOTFOUND (for the ID). If the hash is initially populated with the IDs of each employee, then every ID from the source will find a match, and in that case the WriteHashLink is not required.
However, I want to use a design where the hash is initially empty, with a constraint of LookupLink.NOTFOUND (for the ID). Since there is no matching ID in the hash for the first record of each employee, I try to write the ID to the hash and also pass the record to the outlink. The next time the same employee is encountered, the ID will be found, and hence the record is not passed to the output.
Can you tell me how I associate this logic in the transformer? I did use
'Allow stage write cache' on the Inputs tab of the hash stage and 'Disabled, Lock for Updates' on the Outputs tab. The compile error tells me 'Job contains cyclic or linear dependencies. Job will not run'. Please explain.
Bala R wrote: Please describe what you have tried and what is not working.
As far as I know, you should be able to write to and read from a hashed file in the same job. Maybe you could turn on caching (WRITE IMMEDIATE) while you write to the hashed file and use 'Disabled, Lock for Updates' as the preload option while you read.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You cannot use write cache, because the results have to be available immediately.
You cannot use read cache, because the hashed file is cached into memory when the stage opens and does not change subsequently in memory.
And you must specify "lock for update", that is, "Disabled, Lock for Updates".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Hey Ray,
I did what you told me, but I still get the same compile error: 'Job contains cyclic or linear dependencies'. I feel there is something wrong with the way I write to the hash any ID that is NOTFOUND. I also tried to manage it as a reject link, with no success.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You must use separate Hashed File stages for reading and writing.

Add an Annotation to the job to indicate that both stages refer to the same hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Got it working. I used a stage variable on sorted records. That was easy. Thanks for your help, guys.
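
For anyone who finds this thread later, a minimal sketch of that stage-variable approach, assuming the input link is sorted on ID (the link and variable names below are made up for illustration):

    Stage variables, in this derivation order:
       svIsNewID :  If inlink.ID <> svPrevID Then 1 Else 0
       svPrevID  :  inlink.ID     (initial value '')

    Output link constraint:
       svIsNewID

Because svIsNewID is evaluated before svPrevID is updated on each row, only the first row for each ID satisfies the constraint, and the hashed file lookup is no longer needed for the duplicate check.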