pad string

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

RK72
Participant
Posts: 154
Joined: Wed Sep 29, 2010 4:10 pm

pad string

Post by RK72 »

Hi All,

Convert the source records into target format using transformer. source record should turn into target format even if we get new records in future.

Source records:,................. Target Format:
5 ............... 0005
75 ................ 0075
1275 ................ ...... 1275
189 ................. 0175
in this scenario i am using padstring but i got the 0 records from my right side like 5000 how to get 0 recrds leftside like 0005

Thanks
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Show us your syntax.
-craig

"You can never have too many knives" -- Logan Nine Fingers
RK72
Participant
Posts: 154
Joined: Wed Sep 29, 2010 4:10 pm

Post by RK72 »

hi..
my syntax is

If length < 4 Then PadString(DSLink3.SOURCERECORDS,0,4-length) Else DSLink3.SOURCERECORDS
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

What are the data types for your source and target fields? Strings?
-craig

"You can never have too many knives" -- Logan Nine Fingers
RK72
Participant
Posts: 154
Joined: Wed Sep 29, 2010 4:10 pm

Post by RK72 »

Hi
I am using sequentialfiles
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

What are the data types of the source and target columns, not the files!
- james wiles


All generalizations are false, including this one - Mark Twain.
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

You could do a Right("0000":<field>,4)

or maybe even a PadString("",0,4-length) : <field>?

According to the help files, padstring does append
chandra.shekhar@tcs.com
Premium Member
Premium Member
Posts: 353
Joined: Mon Jan 17, 2011 5:03 am
Location: Mumbai, India

Post by chandra.shekhar@tcs.com »

Or you may use this too. 8)

Code: Select all

 STR(0 , 4 - Len(Trim(Col1))) : Trim(Col1)
Thanx and Regards,
ETL User
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

General question to all contributors to this thread: which of the suggested approaches is most efficient (fewest CPU cycles) and which is least efficient?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I wasn't a contributor, but I believe kryt0n's "Right" solution is the cheapest in terms of CPU cycles.

"If" statements are always expensive, and length statements which need to count the string take up cycles as well.

I usually pad using kryt0n's method as the most effective. If not, I'd love to hear which one is better!

I know I've seen worse...

I saw one job done by a newbie developer that was told they needed to pad about twenty five of the fields to a length of twenty. The job wasn't performing well and when I went in I found each of the fields had code like this...


If (Len(link.column1) = 0) Then "00000000000000000000" Else If (Len(link.column1) = 1) Then "0000000000000000000":link.column1 Else if .... you get the idea....
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
priyadarshikunal
Premium Member
Premium Member
Posts: 1735
Joined: Thu Mar 01, 2007 5:44 am
Location: Troy, MI

Post by priyadarshikunal »

but why not change the data type to number(4) if OP is using sequential file only.
Priyadarshi Kunal

Genius may have its limitations, but stupidity is not thus handicapped. :wink:
Kryt0n
Participant
Posts: 584
Joined: Wed Jun 22, 2005 7:28 pm

Post by Kryt0n »

Couldn't say which is the most efficient, never delved in to the inner workings of how it actually defines any function... and it would take a hell of a lot of records to really make a significant difference to really bother figuring out. (Note: not saying I don't consider performance when developing but don't spend much time trying to save a couple of seconds)

If I were to eke out as much performance as possible, I would probably define a stage variable intialised to "0000" and use that in my right() to avoid too many string initialisations... but still wouldn't know if that was the best option
abc123
Premium Member
Premium Member
Posts: 605
Joined: Fri Aug 25, 2006 8:24 am

Post by abc123 »

Ray, this is old thread but your question is very interesting. I know that I can use APT_* variables to get some values in the log to check performance or just look at job run time but would like to hear from you.

How would you:

a) ..go around seeing how many CPU cycles it is using?
b) ..find the least/most efficient method?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I would create jobs to use each, and process a statistically significant number (millions) of rows through each, controlling for things like caching. Use the player timings and other environment variables, and/or DataStage performance analyzer, and/or Operations Console to perform the measurement.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ssnegi
Participant
Posts: 138
Joined: Thu Nov 15, 2007 4:17 am
Location: Sydney, Australia

Post by ssnegi »

Kryt0n wrote:You could do a Right("0000":<field>,4)

or maybe even a PadString("",0,4-length) : <field>?

According to the help files, padstring does append
Instead of hardcoding the 0 use Str function to generate the zeros equal to the length of field.

Code: Select all

right(Str('0',4):<field>,4)
Post Reply