CRC32: DataStage versus Java
Posted: Tue Nov 08, 2005 5:47 pm
by chulett
Recently went through a bit of pain (with some help from my friends) to get a routine working that takes a string, chunks it up, CRC32's each chunk and then does an "MX" conversion of each CRC32 value to hex. The hex values are concatenated together to form a UID of sorts.
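For anyone comparing notes on the other side, here is a rough Java sketch of the steps just described: chunk the string, CRC32 each chunk, convert each value to hex, concatenate. The class name `ChunkedCrcKey`, the method `buildKey`, the chunk size, the ISO-8859-1 byte encoding, and the zero-padding to 8 hex digits are all assumptions for illustration; the DataStage routine and its "MX" conversion may well chunk and pad differently.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChunkedCrcKey {
    // Hypothetical helper: splits the input into fixed-size chunks, CRC32s each
    // chunk, and concatenates the checksums as 8-digit uppercase hex strings.
    static String buildKey(String input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder key = new StringBuilder();
        for (int start = 0; start < input.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.length());
            // Assumption: one byte per character (ISO-8859-1).
            byte[] bytes = input.substring(start, end)
                                .getBytes(StandardCharsets.ISO_8859_1);
            CRC32 crc = new CRC32();
            crc.update(bytes);
            // getValue() returns the checksum as a non-negative long.
            key.append(String.format("%08X", crc.getValue()));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        // "123456789" is the usual CRC test string; its CRC-32 is CBF43926.
        System.out.println(buildKey("123456789123456789", 9));
    }
}
```

Both ends would have to agree on the chunk size, the byte encoding, and the hex padding before the concatenated keys could match, independent of the CRC itself.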
In testing methodologies 'cross-company', when I take values in DataStage and compare them to the Java generated values - they don't match. The only example they gave me was this:
String: "Arizona "
DS CRC32: -1551224588
Java CRC32: 33489333
I would think they should match... any idea what the issue or resolution might be? FWIW, they are allegedly using the following Java class:
http://java.sun.com/j2se/1.4.2/docs/api ... CRC32.html
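One cheap thing to rule out when comparing the two numbers is signedness: `java.util.zip.CRC32.getValue()` returns the checksum as a non-negative `long`, while the DataStage figure above is negative, which suggests a signed 32-bit value. A small sketch that converts between the two views (the class and method names are made up for illustration):

```java
class SignedVsUnsigned {
    // Reinterpret a signed 32-bit value as an unsigned one held in a long.
    static long toUnsigned(int signedCrc) {
        return signedCrc & 0xFFFFFFFFL;
    }

    // Reinterpret an unsigned 32-bit value (held in a long) as a signed int.
    static int toSigned(long unsignedCrc) {
        return (int) unsignedCrc;
    }

    public static void main(String[] args) {
        // The DataStage value from the example above.
        System.out.println(toUnsigned(-1551224588)); // prints 2743742708
    }
}
```

For the example posted, the DataStage value becomes 2743742708 when read as unsigned, which still isn't 33489333, so signedness alone does not explain the mismatch.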
Thanks!
Posted: Wed Nov 09, 2005 1:42 am
by ArndW
Hello Craig,
they won't match (as you've seen). The CRC32 method is a complex one involving building temporary tables and doing polynomial math and it is not standardized. So chances are very high that any implementation will get a different result. But it will always get the same result for the same string on the same box.
I am willing to risk (just a small sum) that even Java implementations will give different results on different HW platforms due to the way floating point numbers are generated and how precise they are.
Posted: Wed Nov 09, 2005 1:49 am
by ArndW
I just took a look at the algorithm in C++ and it doesn't use floating point; but the initial polynomial is left up to the implementor to choose and the result changes according to the high/low byte order. You are probably seeing different results because the DataStage and Java implementations use a different 32-bit polynomial "seed".
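To make that concrete, here is a minimal bitwise CRC sketch in Java where the polynomial, the initial register value, and the final XOR are all parameters. The class `ParamCrc32` and its method are hypothetical; the parameter set 0xEDB88320 / 0xFFFFFFFF / 0xFFFFFFFF is the common zlib-style CRC-32 that `java.util.zip.CRC32` implements, and 0x82F63B78 is the Castagnoli polynomial, used here only to show a different result. Which parameters DataStage uses is not known here.

```java
import java.nio.charset.StandardCharsets;

class ParamCrc32 {
    // Bitwise, reflected (LSB-first) CRC. 'reversedPoly' is the polynomial in
    // bit-reversed form, e.g. 0xEDB88320 for the usual 0x04C11DB7.
    static long crc32(byte[] data, int reversedPoly, int init, int xorOut) {
        int crc = init;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int i = 0; i < 8; i++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ reversedPoly : crc >>> 1;
            }
        }
        // Mask to return the result as an unsigned 32-bit value in a long.
        return (crc ^ xorOut) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);
        long standard = crc32(data, 0xEDB88320, 0xFFFFFFFF, 0xFFFFFFFF);
        long other = crc32(data, 0x82F63B78, 0xFFFFFFFF, 0xFFFFFFFF);
        System.out.println(Long.toHexString(standard)); // prints cbf43926
        System.out.println(Long.toHexString(other));    // a different value
    }
}
```

Same input, different polynomial, different checksum. The initial value, the final XOR, and the bit order matter just as much, so all of them would need to line up between two implementations.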
Posted: Wed Nov 09, 2005 2:49 am
by chulett
Thanks for looking at this, Arnd. So, is the implication that a common 32-bit polynomial seed could possibly get us the same results?

Posted: Wed Nov 09, 2005 2:53 am
by ArndW
From the code I looked at it will get you a common number. So I think that your Java code will produce the same results on different implementations. I don't know which polynomial DS uses for CRC32, though.
Posted: Wed Nov 09, 2005 2:57 am
by chulett
Maybe then a trip down the Ascential Support yellow brick road is in order? Perhaps the Wizard will be able to help us?

Posted: Wed Nov 09, 2005 3:04 am
by ArndW
They can give you the magic 32-bit string; and I've seen some sample CRC32 programs on the WWW where you could plug in that value.
If your goal is to be portable, can't you just stick with Ascential's algorithm? It does seem that there are a couple of ANSI recommendations and perhaps even an ISO norm out there; but I don't think that the CRC32 is going to be changed (it won't be backwards compatible).
Posted: Wed Nov 09, 2005 4:20 am
by chulett
The goal is not to be portable, per se. This is all part of a larger process where two disparate processes need to apply this same algorithm to a matching set of data - and get the same answer. My end is the ETL / BASIC end and the other end is written in Java.
If we can't "come to terms" and - for the same series of strings - build the same series of CRC'd keys, then I'll need to start over and come up with something that can be duplicated at both ends.

Posted: Wed Nov 09, 2005 4:29 am
by ArndW
Looks like you need to bind in the Java call to DataStage. If you need performance you'll need to use the GCI [have you played around with it before {it's not inherently complex, just tricky}?] or if performance isn't important then you can just do a shell script call.
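For the shell script route, the Java side could be as small as a command-line wrapper that prints the CRC32 of its argument. This is only a sketch: the class name `CrcCli`, the single-argument interface, and the UTF-8 byte encoding are assumptions, and how the output gets picked up on the DataStage side is left out.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CrcCli {
    // CRC32 of the string's bytes, returned as an unsigned value in a long.
    static long crcOf(String s) {
        CRC32 crc = new CRC32();
        // Assumption: UTF-8; both ends must agree on the encoding.
        crc.update(s.getBytes(StandardCharsets.UTF_8));
        return crc.getValue();
    }

    public static void main(String[] args) {
        if (args.length != 1) {
            System.err.println("usage: java CrcCli <string>");
            System.exit(2);
        }
        System.out.println(crcOf(args[0]));
    }
}
```

Starting a JVM per string is slow, so this only makes sense where performance isn't important, as noted above.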
Posted: Wed Nov 09, 2005 8:39 am
by chulett
Well... bad word, bad word, bad word.
So, my only option here is to pitch my stuff and somehow switch to the same Java class 'they' are using? Great. In that case, I might as well revisit the original methodology that got me to this mess - MD5 Encryption. Farg.
Any pointers on where to read about 'the GCI' and Java? I haven't had the pleasure of making its acquaintance yet.
Posted: Wed Nov 09, 2005 8:48 am
by ArndW
Craig,
you can download the GCI manual from IBM's web pages but I think the effort isn't worth it - if you can get the IBM/Ascential polynomial then you can find some Java source (I found several references in Google) and plug it in there for use outside of DS. I think that's the minimum amount of overall effort and should get you going quickly.
Posted: Wed Nov 09, 2005 9:18 am
by chulett
Ok... cool. Let me see if I can turn up the magic polynomial.
As always, thanks for your help with all this Arnd.
Posted: Wed Nov 09, 2005 9:42 am
by ArndW
Naah, no problem there; I still enjoy playing with "stuff" like that even though I let my membership to the cryptographer's association lapse
