Generate multiple Checksum/SK in a generic job

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

Post by rohit_mca2003

Hi,

I have a requirement to use a generic job to read a source file and load the data into a table.
Each time I have a new source file, the corresponding target may have a different number of Checksum/Surrogate Key columns.

Example (Scenario 1):
---------------------------
Source --> File 1 (Col1, Col2, Col3, Col4)
Target --> Table1 (Col1, Col2, Col3, Col4, Checksum(Col1,Col2), Checksum(Col3,Col4))

Example (Scenario 2):
---------------------------
Source --> File 1 (Col1, Col2, Col3, Col4, Col5)
Target --> Table1 (Col1, Col2, Col3, Col4, Checksum(Col1,Col2), Checksum(Col3,Col4), Checksum(Col5))

So each time I receive a source, I have to generate a different number of checksum columns.
Is there any way I can achieve this using a generic job like
(Source --> Generate different number of Checksums --> Target)?
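For illustration only, here is a Python sketch of the requirement outside DataStage. Each source file gets a list of column groups, and the number of groups decides how many checksum columns are produced. MD5 over "|"-joined values is an assumed stand-in for the Checksum stage (the stage's exact concatenation rules are not reproduced), and the output names Checksum1, Checksum2, ... are hypothetical.

```python
import hashlib


def checksum(values):
    # Hypothetical stand-in for a Checksum stage: MD5 of the values joined with "|".
    # None is treated as an empty string (an assumption, not DataStage behavior).
    joined = "|".join("" if v is None else str(v) for v in values)
    return hashlib.md5(joined.encode("utf-8")).hexdigest()


def add_checksums(row, groups):
    """Copy `row` and append one checksum column per group of source columns.

    `groups` is a list of column-name lists; its length decides how many
    checksum columns (Checksum1, Checksum2, ...) are produced.
    """
    out = dict(row)
    for i, cols in enumerate(groups, start=1):
        out[f"Checksum{i}"] = checksum(row[c] for c in cols)
    return out


# The two scenarios from the question, expressed as column groups.
scenario1 = [["Col1", "Col2"], ["Col3", "Col4"]]
scenario2 = [["Col1", "Col2"], ["Col3", "Col4"], ["Col5"]]

row1 = {"Col1": "a", "Col2": "b", "Col3": "c", "Col4": "d"}
row2 = {"Col1": "a", "Col2": "b", "Col3": "c", "Col4": "d", "Col5": "e"}

print(list(add_checksums(row1, scenario1)))
# ['Col1', 'Col2', 'Col3', 'Col4', 'Checksum1', 'Checksum2']
print(list(add_checksums(row2, scenario2)))
# ['Col1', 'Col2', 'Col3', 'Col4', 'Col5', 'Checksum1', 'Checksum2', 'Checksum3']
```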

Thanks,
Rohit
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI

There are a couple of ways. The way we have been doing it is to set up for a generous maximum number of columns: for example, if you think you need 3 or 4, you might set up for 6 or 8.

If a particular input is blank, then the related checksum column on the output is blank (and no work is done beyond carrying the empty columns around for a short time). Your calling job can drop the unused columns.

So if you need 2, the first 2 input columns to your shared code hold the data to be run through the checksum and the others are empty. If you need 3, the first 3, and so on. It's a little clunky, but it's flexible and has worked well for us.
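A minimal Python sketch of the fixed-maximum idea, assuming 6 slots and MD5 as a stand-in for the checksum (neither detail comes from the post). The shared logic always returns 6 checksum values, and any slot the caller leaves blank comes back blank without being hashed.

```python
import hashlib

MAX_SLOTS = 6  # hypothetical "good maximum" number of checksum inputs


def slot_checksums(slot_inputs):
    """Return exactly MAX_SLOTS checksum values for up to MAX_SLOTS inputs.

    Each input is the already-concatenated data for one checksum. Slots the
    caller does not use (or leaves blank) yield a blank checksum, and no
    hashing is done for them.
    """
    if len(slot_inputs) > MAX_SLOTS:
        raise ValueError(f"at most {MAX_SLOTS} checksum inputs are supported")
    padded = list(slot_inputs) + [""] * (MAX_SLOTS - len(slot_inputs))
    return [hashlib.md5(s.encode("utf-8")).hexdigest() if s else "" for s in padded]


# A caller that needs 2 checksums fills the first 2 slots; the other 4 stay empty.
result = slot_checksums(["a|b", "c|d"])
print(len(result), result[2:])  # 6 ['', '', '', '']
```

One thing the sketch makes visible: a slot whose real data is an empty string is indistinguishable from an unused slot, so the caller has to know how many slots it filled before dropping columns.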
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett

...and is that valid for a "generic" job? As in one using RCP (Runtime Column Propagation), I assume.
-craig

"You can never have too many knives" -- Logan Nine Fingers
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI

I can't speak to whether it's "valid", but it works with RCP. In this setup you need the columns you want checksums on to be exposed and mapped into the inputs, but the rest can be propagated through RCP.

I'm open to a better method. I don't really like it, but my company has been doing it that way, unquestioned, for a while.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett

Good to know. I was mostly worried about the comment that each new source file "may have different number" of columns to checksum.
-craig

rohit_mca2003
Participant
Posts: 41
Joined: Wed Oct 08, 2008 9:19 am

Post by rohit_mca2003

Thanks for the replies.
We are following a similar approach: we set up a maximum number of checksum columns.
For each checksum we supply the input column(s) through a parameter. This job uses RCP.
At the end we drop the unnecessary checksum columns (again controlled by parameters for the particular instance/value file).
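The parameter-driven variant can be sketched the same way in Python. The parameter names CHKSUM1_COLS through CHKSUM4_COLS, the comma-separated format, the maximum of 4 slots, and MD5 are all hypothetical choices for illustration; in the actual job these would be job parameters or entries in a value file. The sketch fills every slot first, then drops the checksum columns for slots the parameters leave empty.

```python
import hashlib

MAX_SLOTS = 4  # hypothetical maximum number of checksum columns in the generic job


def parse_slot_params(params):
    """Turn per-slot parameters into column lists.

    `params` maps hypothetical names CHKSUM1_COLS .. CHKSUM4_COLS to
    comma-separated column names, e.g. "Col1,Col2". A missing or blank
    parameter marks that slot as unused (empty list).
    """
    slots = []
    for i in range(1, MAX_SLOTS + 1):
        raw = params.get(f"CHKSUM{i}_COLS", "")
        slots.append([c.strip() for c in raw.split(",") if c.strip()])
    return slots


def process_row(row, slots):
    out = dict(row)  # other columns pass through untouched, mirroring RCP
    # Step 1: fill every checksum slot; unused slots are carried as blanks.
    for i, cols in enumerate(slots, start=1):
        joined = "|".join(str(row[c]) for c in cols)
        out[f"Checksum{i}"] = hashlib.md5(joined.encode("utf-8")).hexdigest() if cols else ""
    # Step 2: drop the checksum columns for slots the parameters left unused.
    for i, cols in enumerate(slots, start=1):
        if not cols:
            del out[f"Checksum{i}"]
    return out


# Scenario 2 from the question: three checksums, so slot 4 is dropped.
params = {"CHKSUM1_COLS": "Col1,Col2", "CHKSUM2_COLS": "Col3,Col4", "CHKSUM3_COLS": "Col5"}
row = {"Col1": "a", "Col2": "b", "Col3": "c", "Col4": "d", "Col5": "e"}
print(list(process_row(row, parse_slot_params(params))))
```

Driving both the checksum inputs and the final drop from the same parameters keeps the two steps from drifting apart when a new source file is added.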

Thanks for all the help.
Rohit