Hi All
I have history data from the client loaded into a hash file. I need to compare this hash file with the source to get the delta data.
Since the source does not have any date columns, I am combining all the columns from the source, as well as from the target, and generating CRC values for both. My logic is: if the source CRC does not match the hash file CRC, I pass those rows to the target as delta data.
Is this the right approach? Is the CRC approach good enough if I have more than a million rows?
Please help me out with this issue. I have seen that when there are more than 200k rows, CRC values get duplicated. Is that because of CRC, or have I done something wrong in my job?
Thanks and Regards,
Surendra Kumar Sharma
CRC reliability
CRC32() uses an algorithm to generate a 32-bit integer, repeatable for any given argument. Therefore, for any two different arguments, it has approximately one chance in 2^32 (one in 4,294,967,296) of generating a false positive. If that's within your comfort zone, go for it. Make sure there are no NULL values in the argument.
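To make the comparison concrete, here is a minimal sketch in Python rather than DataStage, using the standard library's zlib.crc32. The helper names (row_crc, find_delta), the delimiter and the NULL placeholder are assumptions chosen for illustration, not anything from the original job. It shows the two details that matter: NULLs are replaced before the CRC is taken, and columns are joined with a delimiter so that ("ab", "c") and ("a", "bc") do not produce the same argument.

```python
import zlib

# Assumptions for this sketch: neither string occurs in the real data.
DELIM = "\x1f"           # separator placed between column values
NULL_TOKEN = "\x00NULL"  # hypothetical stand-in for a NULL column


def row_crc(columns):
    """Return the CRC-32 of all column values joined with a delimiter."""
    parts = [NULL_TOKEN if c is None else str(c) for c in columns]
    return zlib.crc32(DELIM.join(parts).encode("utf-8"))


def find_delta(source_rows, history_crc_by_key):
    """Return the source rows that are new or whose CRC differs from history.

    source_rows: iterable of (key, columns) pairs.
    history_crc_by_key: dict mapping key -> CRC stored for that key.
    """
    delta = []
    for key, columns in source_rows:
        # A missing key yields None, which never equals a CRC, so new rows pass.
        if history_crc_by_key.get(key) != row_crc(columns):
            delta.append((key, columns))
    return delta
```

Note that the lookup is by key, so the CRC is only ever compared against the stored CRC for the same record.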
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Who cares if it generates 'duplicate' values within your data? All you care about is, if any aspect of the data changes, does the new CRC value differ from the old CRC value.
In my mind, Ray's 'false positive' is the chance that the values in a single record would change and still manage to generate the same CRC value. I guess more of a false negative in that case.
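The difference between the two kinds of "duplicate" can be put into numbers. The sketch below is plain arithmetic in Python and assumes CRC values behave like uniformly random 32-bit integers, which is only an approximation. The function names are made up for this illustration. Under that assumption, 200,000 rows are expected to contain roughly 4.7 pairs of different rows sharing a CRC, so seeing duplicates across the data set is normal, while the chance of one changed record keeping its old CRC stays at about one in 2^32.

```python
import math


def expected_colliding_pairs(n_rows, bits=32):
    """Expected number of row pairs sharing a CRC, assuming uniform values."""
    return n_rows * (n_rows - 1) / 2 / 2**bits


def prob_any_duplicate(n_rows, bits=32):
    """Approximate chance that at least two rows share a CRC (birthday bound)."""
    return 1 - math.exp(-expected_colliding_pairs(n_rows, bits))


def expected_missed_changes(n_changed_rows, bits=32):
    """Expected number of changed rows whose new CRC equals the old one."""
    return n_changed_rows / 2**bits


if __name__ == "__main__":
    print(expected_colliding_pairs(200_000))
    print(prob_any_duplicate(200_000))
    print(expected_missed_changes(1_000_000))
```

Only the last figure matters for delta detection, and for a million changed rows it is well below one missed row.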
-craig
"You can never have too many knives" -- Logan Nine Fingers
I too have a similar requirement for my current project. I am using the Merge stage ("left only" option) to find delta records. I merge the current file with the previous day's file. After all processing, the previous day's file is overwritten by the current file. I chose this approach because of the high data volume.
Hope this may help you.
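For readers unfamiliar with the "left only" idea, here is a rough Python sketch of the same comparison outside DataStage. It assumes the whole line is the comparison key and that the previous day's file fits in memory. The function name left_only is hypothetical, and this is not how the Merge stage is implemented.

```python
def left_only(current_lines, previous_lines):
    """Return lines present in the current file but not in the previous one.

    A row that is new or changed in any column has no identical line in the
    previous file, so it is kept as a delta record.
    """
    previous = set(previous_lines)
    return [line for line in current_lines if line not in previous]


if __name__ == "__main__":
    current = ["1,alice,NY", "2,bob,LA", "3,carol,SF"]
    previous = ["1,alice,NY", "2,bob,SD"]
    for line in left_only(current, previous):
        print(line)
```

Because full lines are compared, this avoids any collision risk, at the cost of holding the previous day's rows in memory or sorting both files.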