CRC32: DataStage versus Java

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Recently went through a bit of pain (with some help from my friends) to get a routine working that takes a string, chunks it up, CRC32's each chunk and then does an "MX" conversion of each CRC32 value to hex. The hex values are concatenated together to form a UID of sorts.
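For comparison on the Java side, the chunk / CRC32 / hex / concatenate steps described above might look roughly like the sketch below. The chunk size, the ISO-8859-1 byte encoding, and the zero-padded uppercase hex format are all assumptions made for illustration; the routine's real settings are not given here, and `ChunkedCrcKey` and `buildKey` are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class ChunkedCrcKey {
    // Splits the input into fixed-size chunks, CRC32s each chunk, and
    // concatenates the hex form of each CRC into one key string.
    // chunkSize is an assumed parameter; the thread does not state it.
    static String buildKey(String input, int chunkSize) {
        StringBuilder key = new StringBuilder();
        for (int start = 0; start < input.length(); start += chunkSize) {
            int end = Math.min(start + chunkSize, input.length());
            String chunk = input.substring(start, end);

            CRC32 crc = new CRC32();
            // Assumed single-byte encoding; both sides must agree on this.
            crc.update(chunk.getBytes(StandardCharsets.ISO_8859_1));

            // Assumed format: 8 hex digits, zero padded, uppercase.
            key.append(String.format("%08X", crc.getValue()));
        }
        return key.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildKey("Arizona ", 4));
    }
}
```

Both ends would need to agree on every one of those choices (chunk boundaries, byte encoding, hex padding) before the keys could be expected to line up, even with an identical CRC32.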

In testing methodologies 'cross-company', when I take values in DataStage and compare them to the Java generated values - they don't match. The only example they gave me was this:

String:      "Arizona "
DS CRC32:    -1551224588
Java CRC32:  33489333

I would think they should match... any idea what the issue or resolution might be? FWIW, they are allegedly using the following Java class:

http://java.sun.com/j2se/1.4.2/docs/api ... CRC32.html
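One thing worth ruling out when comparing the two numbers: the DataStage value above is negative, which suggests a signed 32-bit reading, while `java.util.zip.CRC32.getValue()` returns a non-negative `long`. The sketch below (class and method names are hypothetical, and the byte encoding is an assumption) converts between the two views. Note that -1551224588 read as unsigned is 2743742708, which still does not equal 33489333, so signedness alone does not explain the mismatch.

```java
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

class CrcCompare {
    // CRC32 of a string using the standard Java class.
    // ISO-8859-1 is an assumed encoding, not something stated in the thread.
    static long crc32(String s) {
        CRC32 crc = new CRC32();
        crc.update(s.getBytes(StandardCharsets.ISO_8859_1));
        return crc.getValue(); // always in the range 0 .. 2^32 - 1
    }

    // Reinterpret an unsigned 32-bit CRC as a signed 32-bit integer.
    static int asSigned(long unsignedCrc) {
        return (int) unsignedCrc;
    }

    // Reinterpret a signed 32-bit integer as an unsigned value.
    static long asUnsigned(int signedCrc) {
        return signedCrc & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        long javaCrc = crc32("Arizona ");
        System.out.println("Java unsigned: " + javaCrc);
        System.out.println("Java signed:   " + asSigned(javaCrc));
        System.out.println("DS as unsigned: " + asUnsigned(-1551224588));
    }
}
```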

Thanks!
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Hello Craig,

they won't match (as you've seen). The CRC32 method is a complex one involving building temporary tables and doing polynomial math and it is not standardized. So chances are very high that any implementation will get a different result. But it will always get the same result for the same string on the same box.

I am willing to risk (just a small sum) that even Java implementations will give different results on different HW platforms due to the way floating point numbers are generated and how precise they are.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I just took a look at the algorithm in C++ and it doesn't use floating point; but the initial polynomial is left up to the implementor to choose and the result changes according to the high/low byte order. You are probably seeing different results because the DataStage and Java implementations use a different 32-bit polynomial "seed".
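To illustrate how much the polynomial matters, here is a minimal bit-by-bit sketch with the polynomial passed in as a parameter. It assumes the common reflected form with an initial value and final XOR of 0xFFFFFFFF; with the reflected polynomial 0xEDB88320 it should agree with `java.util.zip.CRC32`, while another polynomial such as 0x82F63B78 (CRC-32C) gives a completely different number for the same bytes. Which polynomial DataStage uses is not known from this thread, and `ParamCrc32` is a hypothetical name.

```java
import java.nio.charset.StandardCharsets;

class ParamCrc32 {
    // Reflected polynomial used by zip, gzip and java.util.zip.CRC32.
    static final int STANDARD_POLY = 0xEDB88320;

    // Bitwise reflected CRC-32 with a caller-supplied polynomial.
    // Assumes init = 0xFFFFFFFF and final XOR = 0xFFFFFFFF.
    static long crc32(byte[] data, int reflectedPoly) {
        int crc = 0xFFFFFFFF;
        for (byte b : data) {
            crc ^= (b & 0xFF);
            for (int bit = 0; bit < 8; bit++) {
                if ((crc & 1) != 0) {
                    crc = (crc >>> 1) ^ reflectedPoly;
                } else {
                    crc >>>= 1;
                }
            }
        }
        // Final XOR, then widen to an unsigned value.
        return (~crc) & 0xFFFFFFFFL;
    }

    public static void main(String[] args) {
        byte[] bytes = "Arizona ".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Long.toHexString(crc32(bytes, STANDARD_POLY)));
        System.out.println(Long.toHexString(crc32(bytes, 0x82F63B78)));
    }
}
```

The bitwise loop is slow compared with a table-driven version, but it keeps the role of the polynomial visible, which is the point of the comparison.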
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Thanks for looking at this, Arnd. So, is the implication that a common 32-bit polynomial seed could possibly get us the same results? :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

From the code I looked at it will get you a common number. So I think that your Java code will produce the same results on different implementations. I don't know which polynomial DS uses for CRC32, though.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Maybe then a trip down the Ascential Support yellow brick road is in order? Perhaps the Wizard will be able to help us? :P
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

They can give you the magic 32-bit string; and I've seen some sample CRC32 programs on the WWW where you could plug in that value.

If your goal is to be portable, can't you just stick with Ascential's algorithm? It does seem that there are a couple of ANSI recommendations and perhaps even an ISO norm out there; but I don't think that the CRC32 is going to be changed (it won't be backwards compatible).
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

The goal is not to be portable, per se. This is all part of a larger process where two disparate processes need to apply this same algorithm to a matching set of data - and get the same answer. My end is the ETL / BASIC end and the other end is written in Java.

If we can't "come to terms" and - for the same series of strings - build the same series of CRC'd keys, then I'll need to start over and come up with something that can be duplicated at both ends. :cry:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Looks like you need to bind in the Java call to DataStage. If you need performance you'll need to use the GCI [have you played around with it before {it's not inherently complex, just tricky}?] or if performance isn't important then you can just do a shell script call.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Well... bad word, bad word, bad word. :evil:

So, my only option here is to pitch my stuff and somehow switch to the same Java class 'they' are using? Great. In that case, I might as well revisit the original methodology that got me to this mess - MD5 Encryption. Farg.

Any pointers on where to read about 'the GCI' and Java? I haven't had the pleasure of making its acquaintance yet.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Craig,

you can download the GCI manual from IBM's web pages but I think the effort isn't worth it - if you can get the IBM/Ascential polynomial then you can find some Java source (I found several references in Google) and plug it in there for use outside of DS. I think that's the minimum amount of overall effort and should get you going quickly.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ok... cool. Let me see if I can turn up the magic polynomial. :wink:

As always, thanks for your help with all this, Arnd.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Naah, no problem there; I still enjoy playing with "stuff" like that even though I let my membership to the cryptographer's association lapse :)