Sort stage

Posted: Wed Aug 11, 2004 7:00 am
by KP
I am using the Sort stage provided in the DataStage 7.0 release and the results are not quite what I need. The sort field is defined as char and I am using the ASC option in the sort specification. However, it appears the default puts numbers before letters, and I need letters before numbers. The internal DataStage help references external map files, which I've heard mentioned before but do not understand. An example shows adding one to the ASC option, but even the internal documentation for these maps does not tell me whether that is what I need to do or which map to use. Is anyone familiar with this, or can anyone explain whether I need to use one of these maps and how to use it?

Thanks,
Ken

Posted: Wed Aug 11, 2004 7:10 am
by chulett
Hey Ken,

Standard ASCII sorting sequence puts numbers first, as they have lower values than letters. Space and some of the punctuation come first, then numbers, followed by upper case and then lower case letters.

If you need to override this, I think you have other options besides the external 'map', but I can't verify this right now. The map, from what I recall from the one time I used it, is a simple text file (that you point to in the Sort stage) that lists the characters in the order you want them sorted. One character per line, I believe. Should be documented in there somewhere... I'll see what I can find when I get in the office, unless someone else beats me to it. :wink:
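
To make the idea of a custom collating order concrete, here is a minimal Python sketch of sorting with letters ahead of digits. It only illustrates the concept: it is not the Sort stage's map file format, and the names ORDER, RANK and collate_key are made up for this example. It also assumes that any character not listed in ORDER should sort after all listed ones.

```python
import string

# Hypothetical collating order: upper case, then lower case, then digits.
ORDER = string.ascii_uppercase + string.ascii_lowercase + string.digits
RANK = {ch: i for i, ch in enumerate(ORDER)}

def collate_key(value):
    # Assumption: characters missing from ORDER sort after all listed
    # characters, in code point order among themselves.
    return [(0, RANK[ch]) if ch in RANK else (1, ord(ch)) for ch in value]

values = ["1A", "A1", "b2", "22", "B2"]

default_order = sorted(values)                  # plain code point (ASCII) order
custom_order = sorted(values, key=collate_key)  # letters before digits

print(default_order)  # ['1A', '22', 'A1', 'B2', 'b2']
print(custom_order)   # ['A1', 'B2', 'b2', '1A', '22']
```

The key function compares position in the chosen order rather than raw character code, which is the same effect a collating sequence map is meant to achieve.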

Posted: Wed Aug 11, 2004 9:34 am
by chulett
If you click on the 'Help' button in the Sort stage itself, you'll get specific online help for it. In the 'Overview' section, check out the 'Sort Criteria' topic; it explains how to set up the Collating Sequence Map you'll need.

Posted: Wed Aug 11, 2004 9:43 pm
by ray.wurlod
Look at any ASCII table. There's one in the DataStage BASIC manual (Appendix B). Numeric characters sort ahead of alphabetic characters. Upper case letters sort before lower case letters. Char(48) is "0", Char(65) is "A", Char(97) is "a".
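
If no ASCII table is handy, the same values can be checked with a few lines of Python. This is just a quick sketch using the built-in ord() and sorted() functions; the sample list is invented for illustration.

```python
# Code points for the characters mentioned above.
codes = {ch: ord(ch) for ch in ("0", "A", "a")}
print(codes)  # {'0': 48, 'A': 65, 'a': 97}

# A default string sort follows those code points:
# space, then digits, then upper case, then lower case.
mixed = ["a", "A", "0", " ", "Z", "9"]
ascii_sorted = sorted(mixed)
print(ascii_sorted)  # [' ', '0', '9', 'A', 'Z', 'a']
```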

Posted: Thu Aug 12, 2004 7:28 am
by chulett
Hey, there's an echo in here! :lol:

Posted: Thu Aug 12, 2004 9:39 pm
by ray.wurlod
It's good when the echo can provide specific examples not in the original! 8)

I believe that the Sort stage was never intended to be used seriously. It was a marketing decision; it had to be put in because INFA could do sorting, and it was never really done properly. Hence the niche for the CoSort stage and the myriad advice to use the UNIX sort command or SyncSort or CoSort.

Posted: Thu Aug 12, 2004 11:03 pm
by chulett
True, true. :)

I've seen the same thing with the Sort stage, but haven't had the time lately to run any comprehensive benchmarks versus a UNIX level sort, let's say, on this particular server. There *is* SyncSort in the house, but "we" don't have (and can't seem to get) a license for it. :evil:

In people's experience, is a UNIX sort preferable / more performant than the Sort stage when handling "large" amounts of records? Specifically, when you get up into the 10 or 20 million record sorts?