Create multiple Records from single record

shivaprasad · Post by **shivaprasad** » Fri Oct 21, 2005 11:06 am

Hi All Need a urgent help.
I have records like this

33979|21014589|1536|06/13/2005|V00020|2.1|400|400|050903010013534923|050903010013535322

Now i want it like this
33979|21014589|1536|06/13/2005|V00020|2.1|400|400|050903010013534923
33979|21014589|1536|06/13/2005|V00020|2.1|400|400|050903010013534924
33979|21014589|1536|06/13/2005|V00020|2.1|400|400|050903010013534925
And increment will go on till the last range reaches that means 050903010013535322

And then i will have one more records like

33979|21014589|1536|06/13/2005|V00022|4.1|100|100|050303010008325718|050303010008325817
Then again same above steps have to repeated.

Now my question is how to do duplicate rows taking last two columns in that first column as starting point and then increment the no with all other columns same as it is and till it gets the last column no.

Hope every body will get clear cut idea about the problem now.

So can any body help me.
I need it urgently
Thanks in advance
Shiv

roy · Post by **roy** » Fri Oct 21, 2005 11:17 am

Hi,
There is a nice FAQ topic covering this here it is.

(1700)
IHTH,

nag0143 · Post by **nag0143** » Fri Oct 21, 2005 11:47 am

search for looping a record. you will get nice examples.

this is tony's idea, it will work for you with some additions according to your requirement.

Another thing you might consider:

Write the data to a hash file. Add a unique key field (use @Outrownum as the value) and another field, F2. The derivation for F2 is a function (pass in Count as a parameter) (that you have to write) that returns a string of values from 1 to count, separated by @VM chars. When reading from this hash file, in the metadata for F2, set "Type" to "MV" (Association should go to F2), then, at the top, set "Normalize On" to F2. Now when you read the data from the hash file you'll get one row for each value in the F2 field. You can drop the key and F2 after reading from the hash file, if you so desire.

Quick, easy code and easy to implement.

Thanks
Nag

ray.wurlod · Post by **ray.wurlod** » Fri Oct 21, 2005 5:30 pm

We don't do "urgent". You have been posting here long enough to know that.

shivaprasad · Post by **shivaprasad** » Fri Oct 21, 2005 9:22 pm

Urgent means i need it badly sir,
hope you help me, i have monday a delivery for this.
So pls help me.
Thanks
shiv

ray.wurlod wrote:We don't do "urgent". You have been posting here long enough to know that.

shivaprasad · Post by **shivaprasad** » Fri Oct 21, 2005 9:26 pm

THis doesnt work sir, since my ranges are dynamic and based on last two fields. last but one field is starting range and last field is ending no.
now when u generate the nos with those two fields as ranges u will get all other fields dumplicated....

So hope you are clear now.
Thanks
Shiv

nag0143 wrote:search for looping a record. you will get nice examples.

this is tony's idea, it will work for you with some additions according to your requirement.

Another thing you might consider:

Write the data to a hash file. Add a unique key field (use @Outrownum as the value) and another field, F2. The derivation for F2 is a function (pass in Count as a parameter) (that you have to write) that returns a string of values from 1 to count, separated by @VM chars. When reading from this hash file, in the metadata for F2, set "Type" to "MV" (Association should go to F2), then, at the top, set "Normalize On" to F2. Now when you read the data from the hash file you'll get one row for each value in the F2 field. You can drop the key and F2 after reading from the hash file, if you so desire.

Quick, easy code and easy to implement.

Thanks
Nag

kcbland · Post by **kcbland** » Fri Oct 21, 2005 10:16 pm

First things first. Your ability to supply a result is your responsibility. If you want fast and accurate answers, consider the price you are paying. $0 US. That's pretty cheap. We don't bill for time or answers, we just ask that you don't request URGENCY, because we have lives too.

Plus, as is the case sometimes, the URGENCY comes from someone who is unqualified for the contract position they are filling which came at the expense of a better qualified person who was asking for higher compensation. The URGENCY tends to mean that we on this forum have to help the less qualified person avoid being tossed out on their rear from the customer site when the customer realizes they are ill-prepared to do the work.

Now, to your problem.

PURE DS SOLUTION:

Your last two columns indicate two integer values that are the starting and ending range. You will need to write a simple function that is passed three arguments. The first argument is the entire row of data as you need it duplicated across the repeating rows. The second argument is the beginning range, the third argument the ending range.

Now, in this function, initialize the Ans variable to blank, that is "". You will start a loop from your beginning range to your ending range. For each pass thru the loop, append to the Ans variable the argument holding the entire row, your delimiter, the loop variable, your delimiter, and the end variable. If you are not on the last pass thru the loop, also append the linefeed (and carriage-return if doing this for DOS files) character.

When this function is finished, it will have populated the Ans variable with all of the expanded rows, delimited by linefeed(+carriage return). In your DS transformer stage, call this function, AND THIS IS IMPORTANT, output the result as the only column to a Sequential file stage. Now read that Sequential file with an output link and continue processing. You will see that when the job runs, N number of rows will process to the Sequential stage, but when reading from the Sequential file N + M rows will read out.

The magic here is that the "row" written to the Sequential file is actually a big text string, but because the string contains LF+CR, it will physically become separate rows in the text file. DO NOT TRY TO CONTINUE TO PROCESS THE ROW WITHOUT FIRST WRITING TO A FILE, that's the trick.

DATABASE SOLUTION:

Bulk load your data into a relational database work table. Create another work table holding a single integer column and populate that table with values 0 to as large a range between your beginning and ending values as you will expect. Maybe be safe and use 1 million. Something like "create table numtable (x integer)".

Join the two tables together, but use sql like this:

select a,b,c,d,e,range1+numtable.x,range2 from datatable, numtable
where datatable.range1+numtable.x <= datatable.range2

This should give you a product of the two tables, returning all rows from the datatable, as well as a cartesian product result row where range1 plus that x value is less than or equal to the range 2 value.

One more aside, these are tricks of the data integration trade. If you have an urgent problem in the future, we can provide premium support that includes a phone number, or even us actually doing the work, but it comes at a price.

elavenil · Post by **elavenil** » Fri Oct 21, 2005 11:25 pm

Hi Shiv,

This can be achieved reading the input file (hope this input can be written into seq file) in the job control code and reading the records in the loop.

OpenSeq/ReadSeq/WriteSeq would help you reading/writing from/into the file. And last value to be incremented till you reach the target range.

HTWH.

Regards
Saravanan

ray.wurlod · Post by **ray.wurlod** » Sat Oct 22, 2005 7:18 pm

Your range values are too large for integer arithmetic, and the leading zeroes need to be preserved. Search the forum for the code for my OpenSequentialFile routine and implement same. Then pre-process your file with the following before-stage subroutine. In its input value field supply the pathname of the original file and the pathname of another file that DataStage will subsequently process.

Code: Select all

SUBROUTINE GenerateMultipleRecords(InputArg, ErrorCode)
$COPYRIGHT "Copyright (c) 2005, Ray Wurlod.  All rights reserved.  May be used with this copyright notice intact."
* Generates N rows per input row, depending on values in final two fields.

DEFFUN OpenSequentialFile(FilePath, OpenMode, WriteMode, Logging) Calling "DSU.OpenSequentialFile"

Equate RoutineName To "GenerateMultipleRows"
Equate FieldDelimiter To "|"

ErrorCode = 0 ; * set this to a non-zero value to halt execution

ReadCount = 0
WriteCount = 0
ErrorCount = 0

* Process InputArg to determine file pathnames

Msg = ""
TrimInputArg = Trim(InputArg)
InputFilePath = Field(TrimInputArg, " ", 1, 1)
OutputFilePath = Field(TrimInputArg, " ", 2, 1)

If Len(InputFilePath) = 0 
Then
   Msg<-1> = "No input file path supplied."
End
If Len(OutputFilePath) = 0
Then
   Msg<-1> = "No output file path supplied."
End

If Len(Msg) > 0
Then
   Call DSLogWarn(Msg, RoutineName)
   ErrorCode = 1  ; * halt execution of job on return
   RETURN
End


* Open both files, one for reading, one for (over)writing.

fvInputFile = OpenSequentialFile(InputFilePath, "R", "A", "Y")
fvOutputFile = OpenSequentialFile(OutputFilePath, "W", "O", "Y")
If FileInfo(fvInputFile,0) And FileInfo(fvOutputFile,0)
Then

   * Outer loop reads input file one line at a time.

   Loop
   While ReadSeq InputLine From fvInputFile

      ReadCount += 1

      * Parse generating values from input line.

      Delimiters = Count(InputLine, FieldDelimiter )
      ConstantFields = Field(Inputline, FieldDelimiter , 1, Delimiters - 1)
      StartValue = Field(InputLine, FieldDelimiter , Delimiters, 1)
      FinalValue = Field(InputLine, FieldDelimiter , Delimiters + 1, 1)

      * Make sure each is numeric.

      If Not(Num(StartValue) And Num(FinalValue))
      Then
          Call DSLogWarn("Non-numeric values for loop in line " : ReadCount " :.  Row skipped." : @FM : InputLine, RoutineName)
          ErrorCount += 1
          Exit   ; * exit from this iteration of outer (read) loop.
      End

      If FinalValue >= StartValue
      Then

         * Inner loop generates output lines.

         FinalFieldValue = StartValue

         Loop
         While FinalFieldValue <= FinalValue

            * Generate and write line to output file.

            Line = ConstantFields : FieldDelimiter : FinalFieldValue
            WriteSeq Line To fvOutputFile
            On Error
                Call DSLogWarn("Error writing to " : OutputFilePath : ", status = " : Status(), RoutineName) 
                ErrorCode = 2
                ErrorCount += 1
                GoTo MainExit  ; * no point continuing
            End
            Then
               WriteCount += 1
            End
            Else
                Call DSLogWarn("Unable to write to " : OutputFilePath : ", status = " : Status(), RoutineName) 
                ErrorCode = 2
                ErrorCount += 1
                GoTo MainExit  ; * no point continuing
            End

            * Use string arithmetic to preserve precision, Fmt() to preserve leading zero(es).

            FinalFieldValue = Fmt(SAdd(FinalFieldValue, "1"), "R%":Len(FinalFieldValue))

         Repeat

      End
      Else

         Call DSLogWarn("Final value less than start value in line " : ReadCount : @FM : InputLine, RoutineName)
         ErrorCount += 1

      End

   Repeat

MainExit:

   * Pass back counters through system variables
   @USER0 = ReadCount
   @USER1 = WriteCount
   @USER2 = ErrorCount

   If FileInfo(fvOutputFile, 0) Then CloseSeq fvOutputFile
   If FileInfo(fvInputFile, 0) Then CloseSeq fvInputFile

End

RETURN

trammohan · Post by **trammohan** » Sun Oct 23, 2005 10:31 am

Hi,
You can create multiple records with the help of Custom build OPD..
in the opd call Buildrecord() and 'C' control structures if and else

ram

chulett · Post by **chulett** » Sun Oct 23, 2005 11:19 am

In a Server job?

Please make sure you don't give PX answers when Server questions are being asked.

kcbland · Post by **kcbland** » Sun Oct 23, 2005 8:08 pm

shivaprasad wrote:Urgent means i need it badly sir,
hope you help me, i have monday a delivery for this.
So pls help me.
Thanks

So, did you get it working?????

Olap · Post by **Olap** » Mon Oct 24, 2005 2:09 am

kcbland wrote:
shivaprasad wrote:Urgent means i need it badly sir,
hope you help me, i have monday a delivery for this.
So pls help me.
Thanks
So, did you get it working?????

In your previous message, you have told that the records can be read in a single row using the routine and process that into a file. But I've one question on this.

1. Lets say there are 4 cols. AA|BB|1|5
Once we process the record into a single row using Char(10) - line feed, it appears like this "AA|BB|1
AA|BB2
AA|BB|3
AA|BB|4
AA|BB|5"

2. How do we read this into another file with only 3 cols as different rows.

Hope the question is clear??

ray.wurlod · Post by **ray.wurlod** » Mon Oct 24, 2005 2:25 am

Welcome aboard. :D

Re-read the thread from the top, where the original line had ten columns of which the final two delimited fields specified what should occur in the output.
Your question substitutes four for ten and can be solved in exactly the same ways. Not even any requirement to change the routine (though with such small numbers you could get by with integer arithmetic rather than needing the string math function SAdd()).

Olap. That's a Scandinavian name, no?

Please edit your Profile and let us know a little about yourself.

kcbland · Post by **kcbland** » Mon Oct 24, 2005 7:16 am

Olap wrote: 2. How do we read this into another file with only 3 cols as different rows.

The job looks like this:

Code: Select all

SEQ#1 --> XFM#1 w/routine --> SEQ#2 --> XFM#2 --> whatever

The metadata in SEQ#1 is a single column. The XFM calls the routine, passing it the entire row which is just defined as a single column. SEQ#2 is the same single column metadata on the input side. The output link actually now has all of the columns defined.

The first sequential file has to make the entire row available. The XFM#1 parses the row using FIELD(inlink.col, "|", #) functions where # is the element in the delimited col that contains the data you need to pass to the function. Since the last two columns are the ranging values, you can get them with the FIELD function.

A single hour of experimentation should be all that's required to get this to work. This is a rather simple task in DataStage.