Checksum Stage not giving consistent output

Post by sg33 »

Hi -

I have a job that computes a checksum on certain columns. The Checksum stage has the "Use all columns except those specified" property set, and some columns are listed in the "Exclude Column" list.

As part of a change request I have to pass a couple of additional columns through, but I don't want the checksum to be affected. When I add these two columns (COL1 and COL2) to the Exclude Column list, the checksum changes.

My understanding is that the checksum shouldn't be impacted since the new columns are defined to be excluded from the checksum computation.
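To make that expectation concrete, here is a minimal sketch in Python, not DataStage code. The function `row_checksum`, the column names other than COL1 and COL2, and the way the values are concatenated are all assumptions for illustration, and MD5 is used only as a stand-in hash. The Checksum stage's internal buffer layout may well differ.

```python
import hashlib

def row_checksum(row, exclude):
    """Hash every column that is not in the exclude list (hypothetical helper)."""
    # Use a fixed (sorted) column order so the result does not depend on dict order.
    kept = [name for name in sorted(row) if name not in exclude]
    payload = "|".join(f"{name}={row[name]}" for name in kept)
    return hashlib.md5(payload.encode("utf-8")).hexdigest()

# Illustrative record before the change request (column names are made up).
before = {"ID": 1, "NAME": "abc", "LOAD_TS": "2012-01-01"}
# Same record after the change request adds COL1 and COL2.
after = dict(before, COL1="x", COL2="y")

checksum_before = row_checksum(before, exclude={"LOAD_TS"})
checksum_after = row_checksum(after, exclude={"LOAD_TS", "COL1", "COL2"})
# What happens if the new columns reach the hash without being excluded.
checksum_leaky = row_checksum(after, exclude={"LOAD_TS"})

print(checksum_before == checksum_after)  # excluded columns do not affect the hash
print(checksum_before == checksum_leaky)  # unexcluded new columns do
```

In this sketch the checksum only changes when the new columns actually take part in the hash, which is why a changed value suggests that something other than the exclude list is feeding extra data into the computation.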

Any suggestions are appreciated.

Thanks
Best Regards

Post by ray.wurlod »

I agree with your expectation.

It may be that you have discovered a bug, in which case you should report it.

Or it may be that you have RCP enabled on the output link of the stage immediately upstream of the Checksum stage.

Post by sg33 »

Thanks Ray. I checked the output link and RCP is not enabled. I will test this fragment of the code with a smaller dataset over the next couple of days.

Will let you know what I find.
Best Regards

Post by qt_ky »

Also see this technote in case it may be related.

Post by sg33 »

I found something really weird during testing. The job I am working on filters records based on the checksum: after the sort, it checks whether the checksum for the record is equal to the previous value and, if not, sends the record to the output file.
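The filter described above can be sketched roughly as follows. This is a plain Python illustration, not the job's actual Transformer logic. The function name `changed_records`, the record layout and the sample values are hypothetical.

```python
def changed_records(sorted_records):
    """Yield a record only when its checksum differs from the previous record's."""
    previous = None  # no previous checksum before the first record
    for record in sorted_records:
        if record["checksum"] != previous:
            yield record
        previous = record["checksum"]

# Made-up, already-sorted input.
records = [
    {"key": 1, "checksum": "aaa"},
    {"key": 1, "checksum": "aaa"},  # same as previous, dropped
    {"key": 1, "checksum": "bbb"},
    {"key": 2, "checksum": "bbb"},  # same as previous, dropped
    {"key": 2, "checksum": "ccc"},
]

output = list(changed_records(records))
print(len(output))
```

In a filter of this shape, the output count depends on both the checksum values and the order in which records arrive after the sort, so a change in either one would change the number of records written.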

If I create a copy of the job, compile it and rerun it, the number of records going to the output differs significantly.
Original job sends 49452 records to the output.
Copy job sends 135852 records to the output.

No matter how many times I create a copy, or rerun the copy or the original job, the numbers remain the same as above.

I tried importing the job into another project and I see the same issue. I am not sure why the copy job gives a different number. Help please!
Best Regards

Post by sg33 »

My observations here were very unusual. When the original job was renamed with a suffix, compiled and rerun, the record count was exactly the same as the copy job's.

When the job was renamed back to the original name and rerun, the count was again exactly the same as the copy job's. At least the count from both jobs was now consistent.

I am not sure why it worked after renaming the job to another name and then renaming it back to the original.

Anyway, the original problem reported for the Checksum stage is no longer there, so I will mark the topic as resolved.
Best Regards