Recently went through a bit of pain (with some help from my friends) to get a routine working that takes a string, chunks it up, CRC32's each chunk and then does an "MX" conversion of each CRC32 value to hex. The hex values are concatenated together to form a UID of sorts.
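For reference, the scheme described above might look roughly like this on the Java side. This is only a sketch under assumptions: the class name, the chunk size, the ASCII encoding and the zero-padding of each hex value to 8 digits are all made up for illustration, and it uses the JDK's own java.util.zip.CRC32, which is not necessarily what DataStage computes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChunkedCrcUid {
    // Split the input into fixed-size chunks, CRC32 each chunk, and
    // concatenate the hex form of every checksum into one UID string.
    // ASSUMPTIONS: chunkSize, US-ASCII bytes and 8-digit uppercase padding
    // are illustrative choices, not taken from the DataStage routine.
    static String buildUid(String input, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        StringBuilder uid = new StringBuilder();
        for (int start = 0; start < input.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.length());
            byte[] bytes = input.substring(start, end).getBytes(StandardCharsets.US_ASCII);
            CRC32 crc = new CRC32();          // JDK CRC-32 (zip/Ethernet variant)
            crc.update(bytes);
            uid.append(String.format("%08X", crc.getValue()));
        }
        return uid.toString();
    }

    public static void main(String[] args) {
        // Two identical 9-character chunks, so the same hex value appears twice.
        System.out.println(buildUid("123456789123456789", 9)); // prints CBF43926CBF43926
    }
}
```

Both ends have to agree on every one of those choices (chunk boundaries, byte encoding, hex padding and case) as well as on the CRC itself, otherwise the concatenated keys will differ even when the checksums match.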
In testing methodologies 'cross-company', when I take values in DataStage and compare them to the Java generated values - they don't match. The only example they gave me was this:
they won't match (as you've seen). The CRC32 method is a complex one involving building temporary tables and doing polynomial math and it is not standardized. So chances are very high that any implementation will get a different result. But it will always get the same result for the same string on the same box.
I am willing to risk (just a small sum) that even Java implementations will give different results on different HW platforms due to the way floating point numbers are generated and how precise they are.
I just took a look at the algorithm in C++ and it doesn't use floating point; but the initial polynomial is left up to the implementor to choose and the result changes according to the high/low byte order. You are probably seeing different results because the DataStage and Java implementations use a different 32-bit polynomial "seed".
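To make the polynomial point concrete, here is a small bit-by-bit CRC-32 in Java where the polynomial, the initial value and the final XOR are all parameters. It is a sketch, not the DataStage routine: the class and method names are hypothetical, it only handles the "reflected" (low bit first) bit order, and the second polynomial is just a different well-known one (Castagnoli) picked to show that the result changes.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CrcVariants {
    // Bitwise CRC-32, processing each byte least-significant bit first.
    // reversedPoly is the polynomial in bit-reversed form; init and xorOut
    // are the starting register value and the final XOR mask.
    static long crc32Reflected(byte[] data, int reversedPoly, int init, int xorOut) {
        int crc = init;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                crc = (crc & 1) != 0 ? (crc >>> 1) ^ reversedPoly : crc >>> 1;
            }
        }
        return (crc ^ xorOut) & 0xFFFFFFFFL; // return as an unsigned 32-bit value
    }

    public static void main(String[] args) {
        byte[] data = "123456789".getBytes(StandardCharsets.US_ASCII);

        // Parameters of the common zip/Ethernet CRC-32.
        long common = crc32Reflected(data, 0xEDB88320, 0xFFFFFFFF, 0xFFFFFFFF);
        // Same algorithm, different polynomial (Castagnoli, reversed form).
        long other = crc32Reflected(data, 0x82F63B78, 0xFFFFFFFF, 0xFFFFFFFF);

        CRC32 jdk = new CRC32();
        jdk.update(data);

        System.out.println(String.format("%08X", common)); // prints CBF43926
        System.out.println(common == jdk.getValue());      // prints true
        System.out.println(common == other);               // prints false
    }
}
```

If the polynomial and the other parameters used on the DataStage side can be obtained, plugging them into something like this (and adding the non-reflected bit order if that turns out to be what it uses) would be one way to check whether the two ends can be made to agree.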
From the code I looked at it will get you a common number. So I think that your Java code will produce the same results on different implementations. I don't know which polynomial DS uses for CRC32, though.
They can give you the magic 32-bit string; and I've seen some sample CRC32 programs on the WWW where you could plug in that value.
If your goal is to be portable, can't you just stick with Ascential's algorithm? It does seem that there are a couple of ANSI recommendations and perhaps even an ISO norm out there; but I don't think that the CRC32 is going to be changed (it won't be backwards compatible).
The goal is not to be portable, per se. This is all part of a larger process where two disparate processes need to apply this same algorithm to a matching set of data - and get the same answer. My end is the ETL / BASIC end and the other end is written in Java.
If we can't "come to terms" and - for the same series of strings - build the same series of CRC'd keys, then I'll need to start over and come up with something that can be duplicated at both ends.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Looks like you need to bind in the Java call to DataStage. If you need performance you'll need to use the GCI [have you played around with it before {it's not inherently complex, just tricky}?] or if performance isn't important then you can just do a shell script call.
So, my only option here is to pitch my stuff and somehow switch to the same Java class 'they' are using? Great. In that case, I might as well revisit the original methodology that got me to this mess - MD5 Encryption. Farg.
Any pointers on where to read about 'the GCI' and Java? I haven't had the pleasure of making its acquaintance yet.
You can download the GCI manual from IBM's web pages but I think the effort isn't worth it - if you can get the IBM/Ascential polynomial then you can find some Java source (I found several references in Google) and plug it in there for use outside of DS. I think that's the minimum amount of overall effort and should get you going quickly.