Most efficient way to check a range of character values

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I've seen plenty of inefficient methods but I'm looking for the most efficient way out there with the tools at hand. I've used match or alpha when I need to perform a simple alphabetic character check - but how about a range check? Something like when the field must be from 'A' to 'M' inclusive? Looking for the most efficient way to handle that when it needs to be done bazillions of times.

Thanks!
-craig

"You can never have too many knives" -- Logan Nine Fingers
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

What other methods have you tried? If it's a range, try converting the character to its ASCII code and checking that range.
For example, alphabetic characters run from 065 to 090 for uppercase and 097 to 122 for lowercase. That would be much faster.
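A minimal sketch of that ASCII check as a DS BASIC routine body, assuming the value to test arrives in the routine argument `Arg1` and should be exactly one character (`CharCode` is a made-up variable name; 65 and 77 are the codes for 'A' and 'M'):

```
* Sketch only: Arg1 is assumed to hold the value being tested.
If Len(Arg1) = 1 Then
   CharCode = Seq(Arg1)                        ;* ASCII code of the character
   Ans = (CharCode >= 65 And CharCode <= 77)   ;* 65 = "A", 77 = "M"
End Else
   Ans = @FALSE                                ;* empty or multi-character input
End
```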
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.

Post by chulett »

Done that, figured it's one of the better ways. Was really wondering if I was missing some sort of uber command, a +3 Sword of Universe Smiting that only a level 30 character would know how to wield. :wink:
-craig

kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

There's range checks like 1 to 10, which is really 1.00000000000000 to 10.000000000000, with an infinite number of values between because of decimals.

For this < and > are the best.

There's domain checks like 1 to 10, which is really 1,2,3,4,5,6,7,8,9,10.

For this, enforce no decimal and then the < and > are best.

There's string checks, which are more like domain checks.

For this, use a sorted dynamic array searched with LOCATE BY when the number of values is small (< 10K). For a medium number of values (< 100K), a delimited string with INDEX is probably good, but after that it's the good old hashed file.
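As a rough sketch of the small-list case, assuming a routine whose input is `Arg1` and a made-up list `ValidCodes` that is already sorted ascending and separated by field marks:

```
* Sketch only: ValidCodes and its contents are made up for illustration.
ValidCodes = "APPLE" : @FM : "BANANA" : @FM : "CHERRY"
Locate Arg1 In ValidCodes By "AL" Setting Pos Then
   Ans = @TRUE     ;* found at field position Pos
End Else
   Ans = @FALSE    ;* not found in the list
End
```

The "AL" sequence tells LOCATE the list is in ascending, left-justified order, so the list has to be kept sorted that way for the search to be reliable.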

There's string checks, which are more like range checks (A..M).

I'd probably just do a < and > because DS BASIC automatically does a lexicographic comparison on strings, which means character by character from left to right. No need to switch to CHAR values in an added step, as it's done under the covers anyway.
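A minimal sketch of the operator version for 'A' to 'M', assuming a routine with the non-numeric input in `Arg1`. An inclusive range needs >= and <= rather than bare > and <. Because the comparison runs left to right, a longer value such as "AB" passes while "MA" sorts after "M" and fails.

```
* Sketch only: Arg1 is assumed to be a non-numeric string.
Ans = (Arg1 >= "A" And Arg1 <= "M")
```

In a Transformer derivation or constraint, the same expression would use the column name, e.g. InLink.TheColumn, in place of Arg1.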


Of course, once Ray wakes up, he'll render our opinions moot. :lol:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle

Post by DSguru2B »

Oh yes. Completely forgot that using the > and < operators on characters implicitly compares their ASCII values. So there you go, my suggestion is already moot. :oops:

Craig, look at the R code (Range function). You can specify the range, the lower bounds and upper bounds and the number of ranges. That might help speed up the process.

Post by chulett »

Thanks guys. Now just anxiously awaiting his Mootness. :lol:
-craig

ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Particularly if the range is contiguous, the Index() function is likely to be fastest.

Code: Select all

Index("ABCDEFGHIJKLM", InLink.TheColumn,1) > 0
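One thing to watch with this approach: Index() searches for a substring, so a multi-character value such as "CD" also returns a position greater than zero. If the column is not guaranteed to be a single character, a length test can be added. The sketch below assumes a routine with the value in `Arg1`:

```
* Sketch only: require exactly one character, then look it up in the list.
Ans = (Len(Arg1) = 1 And Index("ABCDEFGHIJKLM", Arg1, 1) > 0)
```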


The comparison operators (> and <) are fine, and - given that the data are already non-numeric - will automatically do a left-justified comparison.

I suspect a "range conversion" would be fairly slick also.

Code: Select all

Oconv(InLink.TheColumn, "RA,M")

Particularly if multiple ranges need to be tested.

Code: Select all

Oconv(InLink.TheColumn, "RA,M;V,W")
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by DSguru2B »

ray.wurlod wrote:I suspect a "range conversion" would be fairly slick also.

Code: Select all

Oconv(InLink.TheColumn, "RA,M")

Particularly if multiple ranges need to be tested.

Code: Select all

Oconv(InLink.TheColumn, "RA,M;V,W")


I believe that's the OCONV representation of the R code that I was talking about. Right, Ray?
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

The way to test this is to ... test it!

I usually run several million loop iterations, store the value of SYSTEM(9) before the loop, and subtract it from the value after the loop. This value is milliseconds of CPU, so if you can make your baseline conversion run for about 5 or 10 minutes then the results should be quite comparable. I suspect that the C-coded ICONV/OCONV routines are going to be pretty efficient.
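A rough sketch of such a timing harness, assuming it runs as a server routine or job control code where DSLogInfo is available. `TestValue`, `Iterations`, the variable names and the "RangeTiming" label are all made up for illustration, and the iteration count would need tuning to get a long enough run:

```
* Timing sketch: compare CPU milliseconds for two candidate checks.
TestValue = "G"
Iterations = 5000000

StartCpu = System(9)                   ;* CPU milliseconds before the loop
For I = 1 To Iterations
   Result = (Index("ABCDEFGHIJKLM", TestValue, 1) > 0)
Next I
IndexCpu = System(9) - StartCpu

StartCpu = System(9)
For I = 1 To Iterations
   Result = (TestValue >= "A" And TestValue <= "M")
Next I
CompareCpu = System(9) - StartCpu

Call DSLogInfo("Index: " : IndexCpu : " ms, operators: " : CompareCpu : " ms", "RangeTiming")
```

Both figures include the loop overhead itself, so only the difference between them says anything about the checks.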

Post by chulett »

Oh, come on now Arnd... test it? :roll: When someone can just come here and ask the greatest minds on the planet? :lol:

What were you thinking?
-craig


Post by ArndW »

Craig - I hang my head in shame :oops: and won't make any more whacko, off-the-wall and unrealistic suggestions for several thousand more milliseconds...