writing input 1 row as multiple rows to the target db

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

writing input 1 row as multiple rows to the target db

Post by nsm »

Hi,

In the source file there is a free-text field of size Char(255).
I have to read each word in it and insert it into another table.
(I don't know the maximum number of words in a row.)

I thought of using stage variables to do that, but couldn't, since I can't use stage variables unless I know the maximum number of words.

I thought of writing a routine, but with a large volume of data coming in I guess that's also not a good solution.

It can be a server or a parallel job.
Please let me know.

Thanks
nsm.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

Write the field to a sequential file using a space as the line terminator, then read that sequential file back as a single column, again with space as the line terminator. You will read each word as a distinct row, which you can then use to populate your table. For performance you can make this temporary file a named pipe so that no extra I/O is done.
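The space-as-line-terminator trick can be sketched outside DataStage. This is only an illustration of the logic in Python (the file name `words.txt` is hypothetical; the real job would use a Sequential File stage or named pipe):

```python
# Sketch of ArndW's approach: write each word of the free-text field
# on its own line, then read the file back as one-column rows.
text_rows = ["the quick brown fox", "jumps over"]

# Step 1: write the field out so every word lands on its own "line"
# in the intermediate file.
with open("words.txt", "w") as f:
    for row in text_rows:
        for word in row.split():
            f.write(word + "\n")

# Step 2: read the intermediate file back; each line is one word,
# which becomes one row for the target table.
with open("words.txt") as f:
    words = [line.strip() for line in f]

print(words)  # ['the', 'quick', 'brown', 'fox', 'jumps', 'over']
```

In the DataStage job the intermediate file never needs to hit disk if it is declared as a named pipe.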
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

Questions:

1. Do you have to insert each word into the table as a column or as a row?

2. If columns, then what should be done when there are more words than columns?

3. What is the delimiter between the words? Is it a space?

4. Does the file contain only this field, or does it have other columns as well? In short, I want to know the format of the file.
Assume everything I say or do is positive
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Thanks ArndW for your response, but I can't see it as I am not a paid member. If you could post the complete message here, that would be really helpful for me.

csrazdan,
The input row is space-delimited between words.
I need to write each word as a row to the target database. There is also another key column in the file.

Thanks
nsm.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I kept my first post short enough that I thought it would show in its entirety. Just write your column to a file and use spaces as your line terminators, then read that file using space as a terminator. That way, DS will read each word from your original column as a separate line.
nsm
Premium Member
Posts: 139
Joined: Mon Feb 09, 2004 8:58 am

Post by nsm »

Thanks much Arndw.
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

nsm wrote:Thanks ArndW for your response, but I can't see it as I am not a paid member. If you could post the complete message here, that would be really helpful for me.

csrazdan,
The input row is space-delimited between words.
I need to write each word as a row to the target database. There is also another key column in the file.

Thanks
nsm.
I can think of the following procedure:

1. You know the max length of the field is 255, so in the worst case you can have 128 one-character words separated by spaces.

2. In a Sequential File stage, read this column as a single field of CHAR(255). Let us call this field InputField.

3. Next, in a Transformer, create 128 columns from InputField using the transformation:

Code: Select all

  OutputFieldName --> Field(LinkName.FieldName, "Delimiter", DelimiterNumber)
  
For example:  
     Field1 --> Field(LinkName.InputField," ",1) 
     Field2 --> Field(LinkName.InputField," ",2)
     ......
     Field128 --> Field(LinkName.InputField," ",128) 
4. Downstream, use a Pivot stage to convert Field1 through Field128 into rows. Let us call the converted column OutField.

5. Next, in a Transformer, put a constraint that OutField is not null.

6. Insert the output of the transformer into the database.
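The Field/Pivot/constraint procedure above can be sketched as plain code. Python is used here purely as illustration; in the real job steps 3-5 are a Transformer's Field() derivations, a Pivot stage, and a Transformer constraint:

```python
# Illustrative sketch of the split -> pivot -> filter procedure.
MAX_WORDS = 128  # worst case for a CHAR(255) field of 1-char words

def split_to_rows(input_field: str) -> list:
    # Step 3: derive up to 128 "columns", padding the rest with None
    # (equivalent to Field(InputField, " ", 1..128)).
    words = input_field.split(" ")
    columns = (words + [None] * MAX_WORDS)[:MAX_WORDS]
    # Step 4: "pivot" the columns into rows (here, just iterate them).
    # Step 5: keep only rows where OutField is not null/empty.
    return [w for w in columns if w]

print(split_to_rows("insert each word as a row"))
# ['insert', 'each', 'word', 'as', 'a', 'row']
```

Padding to a fixed 128 columns and then filtering out the empty ones is exactly why this approach works without knowing the actual word count in advance.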

Hope it helps........
Assume everything I say or do is positive
csrazdan
Participant
Posts: 127
Joined: Wed May 12, 2004 6:03 pm
Location: Chicago IL

Post by csrazdan »

ArndW wrote:I kept my first post short enough that I thought it would show in its entirety. Just write your column to a file and use spaces as your line terminators, then read that file using space as a terminator. That way, DS will read each word from your original column as a separate line.
This is a really good one if the file had only one column.

But he has another key column in the file. How is he going to maintain the relationship with the key column using this approach?
Assume everything I say or do is positive
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

csrazdan - I answered the question he posted; there was no mention of a key column. If there were a key column and a 1-n relationship I would still use the same base method, but write out two columns with the key repeating. The derivation for the output file would be:

CONVERT(CATS(REUSE(In.Key):',',CONVERT(In.Data,' ',@VM)),@VM,' ')

The output file metadata has only 1 column, the metadata for that file when read back in would contain 2 columns, the key and the word data. The column separator is a comma.

I reconverted the @VM back to a space so it adheres to the first explanation; it would be more efficient to keep the @VM and use it as the row separator when reading the file back in.
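In plain terms, ArndW's derivation expands one (key, free text) row into one "key,word" line per word, repeating the key each time. A sketch of that expansion (Python as illustration only; the real derivation uses DataStage BASIC's CATS/REUSE/CONVERT):

```python
# Sketch of the key-repeating expansion: one input row becomes one
# "key,word" line per word, so the intermediate file reads back as
# two comma-separated columns (key, word).
def expand_row(key: str, data: str) -> list:
    return ["{},{}".format(key, word) for word in data.split()]

print(expand_row("K1", "alpha beta gamma"))
# ['K1,alpha', 'K1,beta', 'K1,gamma']
```

Each output line carries the key, so the 1-n relationship between the key and its words survives the round trip through the intermediate file.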
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Change the delimiter in the column to @VM and write to a Hashed file stage. Add an output link from the Hashed file stage and stream to a Sequential stage. In the output link metadata add a name, anything will do, to the Association column. Under the Normalize On drop down box, choose the Association name you just entered.

This functionality will now "Normalize" nested data in a column, propagating repeating values (such as your key column) and creating a unique row for each value in your delimited list column. This normalization feature is quite handy when your data has associated arrays of delimited data across multiple columns (a nested table, if you will).

This functionality has been in the Server product since the beginning, but most people have never ever used it.
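The "Normalize On" behaviour can be illustrated with two associated multivalued columns (a hypothetical example in Python; the real feature lives in the Hashed File stage's output link metadata):

```python
# Sketch of hashed-file normalization with an association: two columns
# each hold @VM-delimited lists of equal length; normalizing produces
# one row per position, with the key column repeated on every row.
VM = "\xfd"  # UniVerse value mark, used here only as a delimiter

def normalize(key, names, amounts):
    name_list = names.split(VM)
    amount_list = amounts.split(VM)
    # Associated columns are unpacked in lockstep; the key repeats.
    return [(key, n, a) for n, a in zip(name_list, amount_list)]

rows = normalize("K1", VM.join(["a", "b"]), VM.join(["1", "2"]))
print(rows)  # [('K1', 'a', '1'), ('K1', 'b', '2')]
```

This mirrors what the Association name does: all columns sharing the association are treated as one nested table and unrolled together.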
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle