writing input 1 row as multiple rows to the target db

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

writing input 1 row as multiple rows to the target db

Post by nsm »

Hi,

In the source file there is a free-text field of size Char(255).
I have to read each word in it and insert it into another table.
(I don't know the maximum number of words in a row.)

I thought of using stage variables to do that, but couldn't, since I can't use stage variables unless I know the maximum number of words.

I thought of writing a routine, but with a large volume of data coming in I guess that's also not a good solution.

It can be a server or a parallel job.
Please let me know.

Thanks
nsm.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Write the field to a sequential file using a space as the line terminator, then read that sequential file back as a single column, again with space as the line terminator. You will read each word as a distinct row, which you can then use to populate your table. For performance you can make this temporary file a named pipe so that no extra I/O is done.
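The space-as-line-terminator trick can be sketched outside DataStage. This is only an illustration of the logic in Python (the file name `words.txt` is hypothetical; the real job would use a Sequential File stage or named pipe):

```python
# Sketch of ArndW's approach: write each word of the free-text field
# on its own line, then read the file back as one-column rows.
text_rows = ["the quick brown fox", "jumps over"]

# Step 1: write the field out so every word lands on its own "line"
# in the intermediate file.
with open("words.txt", "w") as f:
    for row in text_rows:
        for word in row.split():
            f.write(word + "\n")

# Step 2: read the intermediate file back; each line is one word,
# which becomes one row for the target table.
with open("words.txt") as f:
    words = [line.strip() for line in f]

print(words)  # ['the', 'quick', 'brown', 'fox', 'jumps', 'over']
```

In the DataStage job the intermediate file never needs to hit disk if it is declared as a named pipe.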
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

Questions:

1. Do you have to insert each word into the table as a column or as a row?

2. If columns, then what should be done when there are more words than columns?

3. What is the delimiter between the words? Is it a space?

4. Does the file contain only this field, or does it have other columns as well? In short, I want to know the format of the file.
Assume everything I say or do is positive
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Thanks ArndW for your response, but I can't see it as I am not a paid member. If you could post the complete message here, that would be really helpful for me.

csrazdan,
The input row is space-delimited between words.
I need to write each word as a row to the target database. There is also another key column in the file.

Thanks
nsm.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I kept my first post short enough that I thought it would show in its entirety. Just write your column to a file and use spaces as your line terminators, then read that file using space as a terminator. That way, DS will read each word from your original column as a separate line.
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Thanks much Arndw.
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

nsm wrote:Thanks ArndW for your response, but I can't see it as I am not a paid member. If you could post the complete message here, that would be really helpful for me.

csrazdan,
The input row is space-delimited between words.
I need to write each word as a row to the target database. There is also another key column in the file.

Thanks
nsm.
I can think of the following procedure:

1. You know the max length of the field is 255, so in the worst case you can have 128 one-character words separated by spaces.

2. In a Sequential File stage, read this column as a single field of CHAR(255). Let us call this field InputField.

3. Next, in a Transformer, create 128 columns from InputField using the transformation:

Code: Select all

  OutputFieldName --> Field(LinkName.FieldName, "Delimiter", DelimiterNumber)
  
For example:  
     Field1 --> Field(LinkName.InputField," ",1) 
     Field2 --> Field(LinkName.InputField," ",2)
     ......
     Field128 --> Field(LinkName.InputField," ",128) 
4. Downstream, use a Pivot stage to convert Field1 through Field128 into rows. Let us call the converted column OutField.

5. Next, in a Transformer, put a constraint that OutField is not null.

6. Insert the output of the transformer into the database.
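The Field/Pivot/constraint procedure above can be sketched as plain code. Python is used here purely as illustration; in the real job steps 3-5 are a Transformer's Field() derivations, a Pivot stage, and a Transformer constraint:

```python
# Illustrative sketch of the split -> pivot -> filter procedure.
MAX_WORDS = 128  # worst case for a CHAR(255) field of 1-char words

def split_to_rows(input_field: str) -> list:
    # Step 3: derive up to 128 "columns", padding the rest with None
    # (equivalent to Field(InputField, " ", 1..128)).
    words = input_field.split(" ")
    columns = (words + [None] * MAX_WORDS)[:MAX_WORDS]
    # Step 4: "pivot" the columns into rows (here, just iterate them).
    # Step 5: keep only rows where OutField is not null/empty.
    return [w for w in columns if w]

print(split_to_rows("insert each word as a row"))
# ['insert', 'each', 'word', 'as', 'a', 'row']
```

Padding to a fixed 128 columns and then filtering out the empty ones is exactly why this approach works without knowing the actual word count in advance.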

Hope it helps........
Assume everything I say or do is positive
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

ArndW wrote:I kept my first post short enough that I thought it would show in its entirety. Just write your column to a file and use spaces as your line terminators, then read that file using space as a terminator. That way, DS will read each word from your original column as a separate line.
This is a really good one if the file had only one column.

But he has another key column in the file. How is he going to maintain the relationship with the key column using this approach?
Assume everything I say or do is positive
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

csrazdan - I answered the question he posted; there was no mention of a key column. If there were a key column and a 1-n relationship I would still use the same base method, but write out two columns with the key repeating. The derivation for the output file would be:

CONVERT(CATS(REUSE(In.Key):',',CONVERT(In.Data,' ',@VM)),@VM,' ')

The output file metadata has only 1 column, the metadata for that file when read back in would contain 2 columns, the key and the word data. The column separator is a comma.

I reconverted the @VM back to a space so it adheres to the first explanation; it would be more efficient to keep the @VM and use it as the row separator when reading the file back in.
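In plain terms, ArndW's derivation expands one (key, free text) row into one "key,word" line per word, repeating the key each time. A sketch of that expansion (Python as illustration only; the real derivation uses DataStage BASIC's CATS/REUSE/CONVERT):

```python
# Sketch of the key-repeating expansion: one input row becomes one
# "key,word" line per word, so the intermediate file reads back as
# two comma-separated columns (key, word).
def expand_row(key: str, data: str) -> list:
    return ["{},{}".format(key, word) for word in data.split()]

print(expand_row("K1", "alpha beta gamma"))
# ['K1,alpha', 'K1,beta', 'K1,gamma']
```

Each output line carries the key, so the 1-n relationship between the key and its words survives the round trip through the intermediate file.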
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Change the delimiter in the column to @VM and write to a Hashed file stage. Add an output link from the Hashed file stage and stream to a Sequential stage. In the output link metadata add a name, anything will do, to the Association column. Under the Normalize On drop down box, choose the Association name you just entered.

This functionality will now "Normalize" nested data in a column, propagating repeating values (such as your key column) and creating a unique row for each value in your delimited list column. This normalization feature is quite handy when your data has associated arrays of delimited data across multiple columns (a nested table, if you will).

This functionality has been in the Server product since the beginning, but most people have never ever used it.
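The "Normalize On" behaviour can be illustrated with two associated multivalued columns (a hypothetical example in Python; the real feature lives in the Hashed File stage's output link metadata):

```python
# Sketch of hashed-file normalization with an association: two columns
# each hold @VM-delimited lists of equal length; normalizing produces
# one row per position, with the key column repeated on every row.
VM = "\xfd"  # UniVerse value mark, used here only as a delimiter

def normalize(key, names, amounts):
    name_list = names.split(VM)
    amount_list = amounts.split(VM)
    # Associated columns are unpacked in lockstep; the key repeats.
    return [(key, n, a) for n, a in zip(name_list, amount_list)]

rows = normalize("K1", VM.join(["a", "b"]), VM.join(["1", "2"]))
print(rows)  # [('K1', 'a', '1'), ('K1', 'b', '2')]
```

This mirrors what the Association name does: all columns sharing the association are treated as one nested table and unrolled together.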
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle