Data Manipulation in a Flat file

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Data Manipulation in a Flat file

Post by narasimha »

Is there a way we could convert a file without writing a Routine
From the way it looks

Code: Select all

222,333,4444		
235, 5644,453               
423,8576,764,353	
764,431,533 
to this

Code: Select all

222,333,4444$222
222,333,4444$333
222,333,4444$444
235,5644,453$235
235,5644,453$5644
235,5644,453$453
423,8576,764,353$423
423,8576,764,353$58576
423,8576,764,353$764
423,8576,764,353$353
764,431,533$764
764,431,533$431
764,431,533$533
The $ could be any delimiter other than ","
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Re: Data Manipulation in a Flat file

Post by kwwilliams »

Have you tried using a job to do it? Run your file through a transformer and have a transformation on the final column be col3:'|':col1.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Oops What is a job? :shock:
Have you tried using a job to do it?
If you have observed the input file it does not have 3 columns all the time,
Run your file through a transformer and have a transformation on the final column be col3:'|':col1.
I guess this wont work :roll:
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
DSguru2B
Charter Member
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

what is the max number of columns you have? why i am asking is because, if i were you i would do as follows,
* have the input file going into a transformer
* have n number of links coming out of your transformer (n = num. of columns)
* inside the transformer apply your logic
eg: lets take your first row "colA = 222, colB = 333, colC=4444"
in the out put link(1st link), do this

Code: Select all

 colA = colA
              colB = colB
              colC = colC:"$":colA
in second link

Code: Select all

     colA = colA
              colB = colB
              colC = colC:"$":colB
and third link
[/code] colA = colA
colB = colB
colC = colC:"$":colC[/code]

*then have your output links collected by a link collector, into a seq. file.

thats one of the solutions, not the best i admit, but on of the ways you could go about achieving your goal
cheers.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
djm
Participant
Posts: 68
Joined: Wed Mar 02, 2005 3:42 am
Location: N.Z.

Post by djm »

An awk script is one way. e.g.

Code: Select all

awk -F, '
{
  for ( i = 1; ( i <= NF ); i++ )
  {
    print $0 "$" $i
  }
}' yourfile > new file
Testing the script

Code: Select all

cat |
awk -F, '
{
  for ( i = 1; ( i <= NF ); i++ )
  {
    print $0 "$" $i;
  }
}' << EOD
222,333,4444      
235, 5644,453
423,8576,764,353   
764,431,533
EOD
yields

Code: Select all

222,333,4444$222
222,333,4444$333
222,333,4444$4444
235, 5644,453$235
235, 5644,453$ 5644
235, 5644,453$453
423,8576,764,353$423
423,8576,764,353$8576
423,8576,764,353$764
423,8576,764,353$353
764,431,533$764
764,431,533$431
764,431,533$533
Hope this helps.

David

(previously known as "D" - thanks WebMaster)
(Previously known as D)

Be alturistic and donate your spare CPU cycles to research. http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Kool 8)
Again David it works like magic
I could make this into a script with accepts my file as a parameter and give me an output file.
But just curious to know a way to do this in wholly in DS
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

narasimha wrote:Oops What is a job? :shock:
Have you tried using a job to do it?
If you have observed the input file it does not have 3 columns all the time,
Run your file through a transformer and have a transformation on the final column be col3:'|':col1.
I guess this wont work :roll:
Sarchasm and insults will generate little future help :evil:

The amount of skill you have was unknown to me and the amount of details that you provide aid in the answer you will receive. A list of columns without stating that the number of columns is dynamic expects me to take the time to expect your data set instead of looking at the first three lines and making an assumption on what the rest of your data looks like.

Try to keep your posts polite. This is a resource that most of us do not get paid for and I really don't appreciate the tone of your post in response to my free but incorrect reply.
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Keith
I am sorry if I have offended you
I thought your quote
Have you tried using a job to do it?
was a little sarcastic

PEACE :oops:
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

Wow that was a fast response. I appreciate and accept the apology.

Written form like this is really hard to interpret. I glanced at your post and quickly replied, so the brevity of the post may have led you to believe that it was sarchastic.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Use a Sequential stage with 1 column defined to read the entire row as a single column and then use a combination of FIELD and concatenation statements to assemble the changed row to output to another Sequential file again as a single column.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
narasimha
Charter Member
Charter Member
Posts: 1236
Joined: Fri Oct 22, 2004 8:59 am
Location: Staten Island, NY

Post by narasimha »

Ken,

Can we take this approach if we have dynamic number of values coming in? If yes can you please elaborate.
Narasimha Kade

Finding answers is simple, all you need to do is come up with the correct questions.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The whole row is a single text string value, there's only one column defined. Simple design.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
I_Server_Whale
Premium Member
Premium Member
Posts: 1255
Joined: Wed Feb 02, 2005 11:54 am
Location: United States of America

Post by I_Server_Whale »

As Ken said, you should grab(or more appropriately 'read') all the columns of the source as one column.

But it helps if you know the maximum no. of columns that the source can have. Calculate (add-up) the sizes of these columns. Now this result will be the length of the ONE column that you'll be using.

thanks,
Naveen.
Anything that won't sell, I don't want to invent. Its sale is proof of utility, and utility is success.
Author: Thomas A. Edison 1847-1931, American Inventor, Entrepreneur, Founder of GE
ameyvaidya
Charter Member
Charter Member
Posts: 166
Joined: Wed Mar 16, 2005 6:52 am
Location: Mumbai, India

Post by ameyvaidya »

The "DCOUNT" function will give the Index of the Last delimited Source Field to pick using "FIELD".
Amey Vaidya<i>
I am rarely happier than when spending an entire day programming my computer to perform automatically a task that it would otherwise take me a good ten seconds to do by hand.</i>
<i>- Douglas Adams</i>
Sunshine2323
Charter Member
Charter Member
Posts: 130
Joined: Mon Sep 06, 2004 3:05 am
Location: Dubai,UAE

Post by Sunshine2323 »

Hi,

I have achieved your requirement in the following way

Job Design:

Code: Select all

InputFile----->TransformerA----->InterMediateFile------->TransformerB----->OutputFile
InputFile:
I am reading the input data in a Single Column say Field1

TransformerA
Declaring 2 stage variables in TransformerA
StageVar=Count(DSLink2.Field1,',')
StageVar1=Str(DSLink2.Field1:char(13):char(10), StageVar+1)

Output Derivation for Intermediate File:
I am writing only one field to the InterMediateFile
Field1=Trim(StageVar1,char(10),"T")

TransformerB
Declaring 3 stage variables in TransformerA
StageVar=RowProcCompareWithPreviousValue(Field(DSLink3.Field1,',',1))
Initial Value of StageVar1=1
StageVar1=If StageVar=1 then StageVar1+1 else 1
StageVar2=Field(DSLink3.Field1,',', StageVar1)

Output Derivation for Output File:
Field1=DSLink3.Field1:'$': StageVar2

Testing with Input:

Code: Select all

222,333,4444
235, 5644,453
423,8576,764,353
764,431,533


IntermediateFile:

Code: Select all

222,333,4444
222,333,4444
222,333,4444
235, 5644,453
235, 5644,453
235, 5644,453
423,8576,764,353
423,8576,764,353
423,8576,764,353
423,8576,764,353
764,431,533
764,431,533
764,431,533
Output File:

Code: Select all

222,333,4444$222
222,333,4444$333
222,333,4444$4444
235, 5644,453$235
235, 5644,453$ 5644
235, 5644,453$453
423,8576,764,353$423
423,8576,764,353$8576
423,8576,764,353$764
423,8576,764,353$353
764,431,533$764
764,431,533$431
764,431,533$533
Hope this helps :)
Warm Regards,
Amruta Bandekar

<b>If A equals success, then the formula is: A = X + Y + Z, X is work. Y is play. Z is keep your mouth shut. </b>
--Albert Einstein
Post Reply