Parse data
Moderators: chulett, rschirm, roy
Parse data
Hi All,
Need a help on parsing data
I have a sql database column with 3000 bytes.
Sample data looks like for that single row,
23:05:05: Aaaaaaaaaaaa23:05:05: Bbbbbbbbbbbb23:05:05: BBB: 983001587423:05:05: AAHH: 111123:05:05: XXXXXXXXX23:05:06: YYYYYYYY23:05:06: ZZZZZZZZZZZZ23:05:07: CCCCCCCCCCC23:05:07: SSSSSSSSSSSS23:05:07: QQQQQQQQQQQQQQ23:05:07: XXXXXXXXX: SUMITCHAW23:05:07: AAAAAAAAA23:05:07: TTTTTTTTTTTTTTT23:05:08: AAAAAAAA23:05:08: PPPPPPPPPPPPPPP23:05:08: EEEEEEEEEEE23:05:08: QQQQQQQQQQQQQQQQQQQ23:05:08: QQQQQQQQQ: SumitChaw23:05:08: NNNNNNNNNNNNNNN23:05:08: LLLLLLLLLLLLLL
Now, I want parse the data after the time interval, like,
23:05:05: Aaaaaaaaaaaa
23:05:05: Bbbbbbbbbbbb
23:05:05: BBB: 9830015874
23:05:05: AAHH: 1111
Any thoughts on this.Particularly what to use in transformer. I built another job before which has pipe delimiter, but this is giving trouble.
Appreciate your help.
Need a help on parsing data
I have a sql database column with 3000 bytes.
Sample data looks like for that single row,
23:05:05: Aaaaaaaaaaaa23:05:05: Bbbbbbbbbbbb23:05:05: BBB: 983001587423:05:05: AAHH: 111123:05:05: XXXXXXXXX23:05:06: YYYYYYYY23:05:06: ZZZZZZZZZZZZ23:05:07: CCCCCCCCCCC23:05:07: SSSSSSSSSSSS23:05:07: QQQQQQQQQQQQQQ23:05:07: XXXXXXXXX: SUMITCHAW23:05:07: AAAAAAAAA23:05:07: TTTTTTTTTTTTTTT23:05:08: AAAAAAAA23:05:08: PPPPPPPPPPPPPPP23:05:08: EEEEEEEEEEE23:05:08: QQQQQQQQQQQQQQQQQQQ23:05:08: QQQQQQQQQ: SumitChaw23:05:08: NNNNNNNNNNNNNNN23:05:08: LLLLLLLLLLLLLL
Now, I want parse the data after the time interval, like,
23:05:05: Aaaaaaaaaaaa
23:05:05: Bbbbbbbbbbbb
23:05:05: BBB: 9830015874
23:05:05: AAHH: 1111
Any thoughts on this.Particularly what to use in transformer. I built another job before which has pipe delimiter, but this is giving trouble.
Appreciate your help.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
It's probably easiest - assuming an arbitrary number of time/string pairs and arbitrary string lengths - to pre-parse this string into a text file in a before-stage subroutine (making use of MatchField() function), then have your job read from that file.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Stage Variable :
dlimit : left(DSLink.COL,2)
dcnt : DCount(DSLink.COL,dlimit) - 1
Loop Condition
Loop While : loopcnt < dcnt
Loop Variable
loopcnt : Initial Value 0 Derivation : loopcnt+1
LoopVarValue:field(DSLink.COL,dlimit,loopcnt+1)[9,len(field(DSLink.COL,dlimit,loopcnt+1))]
LoopVarTime : dlimit:':':left(field(DSLink.COL,dlimit,loopcnt+1),7)
Columns :
COL1 : LoopVarTime
COL2 : Trim(LoopVarValue)
This will work if the time hour per line is the same ie the line has 23 as the hour for all the columns in the row.
If you can put a pipe delimter then it would be much easier. replace 23 by | in the dcnt variable.
you can modify this to use 23:05 as delimter if that is constant in the line so that the possibility of it occuring in the value string diminish.
dlimit : left(DSLink.COL,2)
dcnt : DCount(DSLink.COL,dlimit) - 1
Loop Condition
Loop While : loopcnt < dcnt
Loop Variable
loopcnt : Initial Value 0 Derivation : loopcnt+1
LoopVarValue:field(DSLink.COL,dlimit,loopcnt+1)[9,len(field(DSLink.COL,dlimit,loopcnt+1))]
LoopVarTime : dlimit:':':left(field(DSLink.COL,dlimit,loopcnt+1),7)
Columns :
COL1 : LoopVarTime
COL2 : Trim(LoopVarValue)
This will work if the time hour per line is the same ie the line has 23 as the hour for all the columns in the row.
If you can put a pipe delimter then it would be much easier. replace 23 by | in the dcnt variable.
you can modify this to use 23:05 as delimter if that is constant in the line so that the possibility of it occuring in the value string diminish.
Last edited by ssnegi on Tue Apr 29, 2014 10:43 pm, edited 1 time in total.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
There is no looping capability in the Transformer stage in server jobs.
Further, you can't assume that times will always begin with 23:05. You must learn to create general solutions.
For example, the number of time/string pairs in the string is one quarter of the number of colon-delimited fields assuming no colons in the string values (think about it).
So the time/string pairs could be extracted by counting the number of colons, and creating a loop that starts from 1, has that value as its limit, and an increment of 4.
The Field() function could be used, with a third argument of the loop counter and a fourth argument of 4, to extract a time/string pair. Further Field() functions can be used to separate the time and string for transfer to the output.
Further, you can't assume that times will always begin with 23:05. You must learn to create general solutions.
For example, the number of time/string pairs in the string is one quarter of the number of colon-delimited fields assuming no colons in the string values (think about it).
So the time/string pairs could be extracted by counting the number of colons, and creating a loop that starts from 1, has that value as its limit, and an increment of 4.
The Field() function could be used, with a third argument of the loop counter and a fourth argument of 4, to extract a time/string pair. Further Field() functions can be used to separate the time and string for transfer to the output.
Code: Select all
SUBROUTINE ParseTimeString(InputValue, ErrorCode)
Equate TAB To Char(9)
DEFFUN OpenSequentialFile(FileName,OpenMode,AppendMode,LogMode) Calling "DSU.OpenSequentialFile"
ErrorCode = 0
hInFile = OpenSequentialFile(InputValue, "R", "A", "Y")
hOutFile = OpenSequentialFile(InputValue:".out", "W", "O", "Y")
WriteCount = 0
Loop
While ReadNext Line From hInFile
PairCount = Count(Line, ":")
For Counter = 1 To PairCount Step 4
TimeString = Field(Line, ":", Counter, 4)
TimePortion = Field(TimeString, ":", 1, 3)
StringPortion = Field(TimeString, ":", 4, 1)
OutLine = TimePortion : TAB : StringPortion
WriteSeq OutLine To hOutFile Then WriteCount += 1
Next Counter
Repeat
CloseSeq hOutFile
CloseSeq hInFile
RETURN
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, there in one previous job, we have pipe delimiters and two stage variables.
The data looked like,
9999,0,0,20150925125011,201409251259940,,,,,
|9915,0,0,201509254857441,20140925151088,,,,, and so on.
To parse it we used--
StgVar= Change(DSLink1.Column4, "|", "|": DSLink1.Column2:",")
StgVar1=DSLink1.Column2:",": Change(stgVar,"|",CHAR(013):CHAR(010))
We are adding the phone number as column 2 with each parsed row. Can we do anything like that instead of Subroutine for the aforesaid data too. Have to add the ID column infront of each row.
Thanks for your time and help.
3:05:05: Aaaaaaaaaaaa23:05:05: Bbbbbbbbbbbb23:05:05: BBB: 983001587423:05:05: AAHH: 111123:05:05: XXXXXXXXX23:05:06: YYYYYYYY23:05:06: ZZZZZZZZZZZZ23:05:07: CCCCCCCCCCC23:05:07: SSSSSSSSSSSS23:05:07: QQQQQQQQQQQQQQ23:05:07: XXXXXXXXX: SUMITCHAW23:05:07: AAAAAAAAA23:05:07: TTTTTTTTTTTTTTT23:05:08: AAAAAAAA23:05:08: PPPPPPPPPPPPPPP23:05:08: EEEEEEEEEEE23:05:08: QQQQQQQQQQQQQQQQQQQ23:05:08: QQQQQQQQQ: SumitChaw23:05:08: NNNNNNNNNNNNNNN23:05:08: LLLLLLLLLLLLLL
Now, I want parse the data after the time interval, like,
23:05:05: Aaaaaaaaaaaa
23:05:05: Bbbbbbbbbbbb
23:05:05: BBB: 9830015874
23:05:05: AAHH: 1111
The data looked like,
9999,0,0,20150925125011,201409251259940,,,,,
|9915,0,0,201509254857441,20140925151088,,,,, and so on.
To parse it we used--
StgVar= Change(DSLink1.Column4, "|", "|": DSLink1.Column2:",")
StgVar1=DSLink1.Column2:",": Change(stgVar,"|",CHAR(013):CHAR(010))
We are adding the phone number as column 2 with each parsed row. Can we do anything like that instead of Subroutine for the aforesaid data too. Have to add the ID column infront of each row.
Thanks for your time and help.
3:05:05: Aaaaaaaaaaaa23:05:05: Bbbbbbbbbbbb23:05:05: BBB: 983001587423:05:05: AAHH: 111123:05:05: XXXXXXXXX23:05:06: YYYYYYYY23:05:06: ZZZZZZZZZZZZ23:05:07: CCCCCCCCCCC23:05:07: SSSSSSSSSSSS23:05:07: QQQQQQQQQQQQQQ23:05:07: XXXXXXXXX: SUMITCHAW23:05:07: AAAAAAAAA23:05:07: TTTTTTTTTTTTTTT23:05:08: AAAAAAAA23:05:08: PPPPPPPPPPPPPPP23:05:08: EEEEEEEEEEE23:05:08: QQQQQQQQQQQQQQQQQQQ23:05:08: QQQQQQQQQ: SumitChaw23:05:08: NNNNNNNNNNNNNNN23:05:08: LLLLLLLLLLLLLL
Now, I want parse the data after the time interval, like,
23:05:05: Aaaaaaaaaaaa
23:05:05: Bbbbbbbbbbbb
23:05:05: BBB: 9830015874
23:05:05: AAHH: 1111
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
My earlier code assumed (incorrecly) that there was a colon between the string and the next time. The code can be changed to account for that - I leave it as an exercise for the reader, with the hint that you discard that part of the string already processed each iteration through the loop.
As for your new design, have you tried it, and does it produce the expected result? It's not clear to me what's in Column4 and what's in Column2.
As for your new design, have you tried it, and does it produce the expected result? It's not clear to me what's in Column4 and what's in Column2.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Have a question on the code. It was compiling fine but in the code it says sequential file right. But we have the column in database which will be parsed in multiple rows.
Initial stages look like,
ODBC--TRANSFORMER--SEQUENTIAL FILE.
So, if the code is used in transformer it is not about sequential file right. Do we need to change here anything else? any arguments instead of sequential file?
Thanks,
PS-
For the below data.we need to add the call id in front of each parsed row.
3:05:05: Aaaaaaaaaaaa23:05:05: Bbbbbbbbbbbb23:05:05: BBB: 983001587423:05:05: AAHH: 111123:05:05: XXXXXXXXX23:05:06: YYYYYYYY23:05:06: ZZZZZZZZZZZZ23:05:07: CCCCCCCCCCC23:05:07: SSSSSSSSSSSS23:05:07: QQQQQQQQQQQQQQ23:05:07: XXXXXXXXX: SUMITCHAW23:05:07: AAAAAAAAA23:05:07: TTTTTTTTTTTTTTT23:05:08: AAAAAAAA23:05:08: PPPPPPPPPPPPPPP23:05:08: EEEEEEEEEEE23:05:08: QQQQQQQQQQQQQQQQQQQ23:05:08: QQQQQQQQQ: SumitChaw23:05:08: NNNNNNNNNNNNNNN23:05:08: LLLLLLLLLLLLLL
Now, I want parse the data after the time interval, like, (assuming 9162221212 is the call id)
9162221212,23:05:05: Aaaaaaaaaaaa
9162221212,23:05:05: Bbbbbbbbbbbb
9162221212,23:05:05: BBB: 9830015874
9162221212,23:05:05: AAHH: 1111
Initial stages look like,
ODBC--TRANSFORMER--SEQUENTIAL FILE.
So, if the code is used in transformer it is not about sequential file right. Do we need to change here anything else? any arguments instead of sequential file?
Thanks,
PS-
For the below data.we need to add the call id in front of each parsed row.
3:05:05: Aaaaaaaaaaaa23:05:05: Bbbbbbbbbbbb23:05:05: BBB: 983001587423:05:05: AAHH: 111123:05:05: XXXXXXXXX23:05:06: YYYYYYYY23:05:06: ZZZZZZZZZZZZ23:05:07: CCCCCCCCCCC23:05:07: SSSSSSSSSSSS23:05:07: QQQQQQQQQQQQQQ23:05:07: XXXXXXXXX: SUMITCHAW23:05:07: AAAAAAAAA23:05:07: TTTTTTTTTTTTTTT23:05:08: AAAAAAAA23:05:08: PPPPPPPPPPPPPPP23:05:08: EEEEEEEEEEE23:05:08: QQQQQQQQQQQQQQQQQQQ23:05:08: QQQQQQQQQ: SumitChaw23:05:08: NNNNNNNNNNNNNNN23:05:08: LLLLLLLLLLLLLL
Now, I want parse the data after the time interval, like, (assuming 9162221212 is the call id)
9162221212,23:05:05: Aaaaaaaaaaaa
9162221212,23:05:05: Bbbbbbbbbbbb
9162221212,23:05:05: BBB: 9830015874
9162221212,23:05:05: AAHH: 1111
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Sequential File stage is a passive stage, and does not do any work (apart from formatting columns) on the data. Transformer stage is an active stage and is where the work (for example adding the call ID) must be done.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray, I think the confusion is around the "SUBROUTINE ParseTimeString" that you posted. It writes its output to a sequential file and Sam is looking to leverage it in a Transformer derivation, thinking it should pass the parsed data back to the caller.
At least that's what I think I'm reading.
At least that's what I think I'm reading.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Fair enough. Then the job could write untransformed data into the flat file that the subroutine processes, and the routine could be invokved as a before-stage subroutine in the second Transformer stage.
Code: Select all
ODBC ----> Transformer ----> SeqFile ----> Transformer ----> SeqFile
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Code: Select all
SUBROUTINE ParseTimeString(InputValue, ErrorCode)
Equate TAB To Char(9)
DEFFUN OpenSequentialFile(FileName,OpenMode,AppendMode,LogMode) Calling "DSU.OpenSequentialFile"
ErrorCode = 0
hInFile = OpenSequentialFile(InputValue, "R", "A", "Y")
hOutFile = OpenSequentialFile(InputValue:".out", "W", "O", "Y")
WriteCount = 0
Loop
While ReadNext Line From hInFile
Loop
Pair = Field(Line, ":", 4, 1)
Pair = Left(Pair, Len(Pair) - 3)
Time = Field(Pair, ":", 1, 3)
Rest = Field(Pair, ":", 4, 1)
WriteSeq Time : ": " : TAB : Rest To hOutFile Then WriteCount += 1
Line = Right(Line, Len(Line) - Len(Pair))
While Len(Line) > 0
Repeat
Repeat
CloseSeq hOutFile
CloseSeq hInFile
RETURN
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Thanks... didn't want to put words in your mouth and knew you could explain how you saw this playing out for those of us a little behind the curve. That and I didn't have any kind of time for a proper reply.ray.wurlod wrote:Fair enough.
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers