If key already exists, do NOT pass record, else process it.

gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

If key already exists, do NOT pass record, else process it.

Post by gateleys »

Hi, there are multiple records for each employee (identified by an ID). What I need to do is to -
1. Process the record if that employee record has not been processed before (based on the ID).
2. If it has been encountered before, do not process it.

I am using a hash file which contains the unique IDs for each employee. My DB source passes multiple rows for each employee to the transformer, where a lookup to the hash file is to be done. I don't know how I should proceed. Please suggest.

Thanks.
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

On the tabular display of your output link in transformer, you would find a header called constraints. Double click on it to open the expression editor for constraints and enter lookup_ID.NOTFOUND, where lookup_ID is your lookup input link.
When you have opened your transformer stage constraints editor, click on the Help button. Constraints is lucidly explained in DS help topics.
IHTH.
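
For illustration, with a reference (lookup) link named LookupLink keyed on the employee ID (the name here is only a placeholder, not from the actual job), the output link constraint would simply be:

    LookupLink.NOTFOUND

Rows whose ID is found on the reference link evaluate this to false and are dropped from that output.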
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Re: If key already exists, do NOT pass record, else process

Post by gateleys »

Hello guys, please help me with this. It looks easy, but it has been bugging me. For the above problem, I have used a hash file which is initially empty. For the first record for an employee, I look up the hash; since ID.NOTFOUND is of course true, I intend to insert the ID into the SAME hash and also pass the record to the output link. For subsequent records of the same employee, when the hash is looked up, the ID will be found and hence the record should NOT be sent to the output.

In short, I just need to push only one record for each employee to the output. How do I do it? Please help.
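
To make the intent concrete, the per-row logic being described amounts to something like the following DataStage BASIC (only a sketch: the hashed file name SeenIDs and the variable names are invented for illustration, and in the job itself this would be expressed through the lookup link, the constraint and the hash write link rather than a routine):

    * Sketch of the per-row dedupe logic, restated as DataStage BASIC
    * EmpID stands for the employee ID arriving on the current input row
    OPEN 'SeenIDs' TO SeenFile ELSE STOP 'Cannot open hashed file SeenIDs'
    READ Dummy FROM SeenFile, EmpID THEN
       PassRow = @FALSE              ;* ID already processed - suppress this row
    END ELSE
       WRITE '' TO SeenFile, EmpID   ;* remember the ID for later rows
       PassRow = @TRUE               ;* first row for this employee - pass it on
    END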

gateleys wrote:Hi, there are multiple records for each employee (identified by an ID). What I need to do is to -
1. Process the record if that employee record has not been processed before (based on the ID).
2. If it has been encountered before, do not process it.

I am using a hash file which contains the unique IDs for each employee. My DB source passes multiple rows for each employee to the transformer, where a lookup to the hash file is to be done. I don't know how I should proceed. Please suggest.

Thanks.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Hey Bala,
With your logic of inputLink_hash_id.NOTFOUND as the constraint, all the rows will evaluate to FALSE (that is, the ID was FOUND, given that the hash has one record for each employee), since every source record will find an ID match in the lookup. So, could you please elaborate?
Thanks for your earlier response, though.

Bala R wrote:On the tabular display of your output link in transformer, you would find a header called constraints. Double click on it to open the expression editor for constraints and enter lookup_ID.NOTFOUND, where lookup_ID is your lookup input link.
When you have opened your transformer stage constraints editor, click on the Help button. Constraints is lucidly explained in DS help topics.
IHTH.
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

Please describe what you have tried and what is not working.
As far as I know, you should be able to write to and read from a hashed file in the same job. Maybe you could turn on caching (WRITE IMMEDIATE) while you write to the hashed file and use 'Disabled, Lock for Updates' as the preload option while you read.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

ODBC -----inlink-----> Transformer -----outlink-----> seqFile
                          ^       |
               LookupLink |       | WriteHashLink (writes NOTFOUND IDs back to the hash)
                          |       v
                        Hash file
The constraint on the outlink contains LookupLink.NOTFOUND (for the ID). If the hash is initially populated with the IDs of each employee, then every ID from the source will find a match, and in that case the WriteHashLink is not required.
However, I want to use a design where the hash is initially empty, with a constraint of LookupLink.NOTFOUND (for the ID). Since there is no matching ID in the hash for the first record of each employee, I try to write the ID to the hash and also pass the record to the outlink. The next time the same employee is encountered, the ID will be found, and hence the record is not passed to the output.
Can you tell me how I associate this logic in the transformer? I did use
'Allow stage write cache' on the Inputs tab of the hash stage and 'Disabled, Lock for Updates' on the Outputs tab. The compile error tells me 'Job contains cyclic or linear dependencies. Job will not run'. Please explain.
Bala R wrote: Please describe what you have tried and what is not working.
As far as I know, you should be able to write to and read from a hashed file in the same job. Maybe you could turn on caching (WRITE IMMEDIATE) while you write to the hashed file and use 'Disabled, Lock for Updates' as the preload option while you read.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You cannot use write cache, because the results have to be available immediately.
You cannot use read cache, because the hashed file is cached into memory when the stage opens and does not change subsequently in memory.
And you must specify "lock for update", that is, "Disabled, Lock for Updates".
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Hey Ray,
I did what you told me, but I still get the same compile error: 'Job contains cyclic or linear dependencies'. I feel there is something wrong with the way I write to the hash any ID that is NOTFOUND. I also tried to manage it as a reject link, with no success.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

You must use separate Hashed File stages for reading and writing.

Add an Annotation to the job to indicate that both stages refer to the same hashed file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

Got it working. I used a stage variable on sorted records. That was easy. Thanks for your help, guys.
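
For anyone who finds this thread later, a minimal sketch of that stage-variable approach, assuming the input link is sorted on ID (the link and variable names below are made up for illustration):

    Stage variables, in this derivation order:
       svIsNewID :  If inlink.ID <> svPrevID Then 1 Else 0
       svPrevID  :  inlink.ID     (initial value '')

    Output link constraint:
       svIsNewID

Because svIsNewID is evaluated before svPrevID is updated on each row, only the first row for each ID satisfies the constraint, and the hashed file lookup is no longer needed for the duplicate check.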