Parsing of Data

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Parsing of Data

Post by sam334 »

Hi all,
I have posted some part of this topic before.Not sure how to go with parsing of data. I have a single column Varchar(2000). I want to parse the row into multiple rows. Column looks like,

12:10:07: Dialing12:10:07: Connecting12:10:07: ABC: abc:9433769781$100.88.77.0:878712:10:07: ABCD: 000012:10:07: Agent Initializing12:10:18: On Call12:10:18: Assigned to operator12:10:18: Waiting for Supervisor12:10:18: Waiting for Manager12:11:13: Call Ended12:11:13: Call Not connected

Now, I want to parse the data like, in multiple rows. This is an example for a single time. So, time is not constant.

12:10:07: Dialing
12:10:07: Connecting
12:10:07: ABC: abc:9433769782$100.88.77.0:8787
12:10:07: ABCD: 0000
12:10:07: Agent Initializing
12:10:18: On Call
12:10:18: Assigned to operator
12:10:18: Waiting for Supervisor
12:10:18: Waiting for Manager
12:11:13: Call Ended
12:11:13: Call Not connected

The problem is the identifier.We have to probably tell datastage whenever you see any integer:integer:integer: then break it as a separate.

Any thought how we could achieve this....
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Sounds like an ideal candidate for looping in a Transformer stage. Search for the next nn:nn:nn pattern, which signals the start of the next line. Or maybe search for the next colon character and parse around that.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

This is a design issue on the source data. They should provide it to you with a clear delimiter, even just one space, showing the end of the text message for the timestamp. That way, if the text message ends with a numeric, you aren't forced to parse the data with byte-level edits.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

That's true. We wanted to have atleast one delimiter. But the source table is a stand alone table so they cant change it. I am trying to break it in SQL server using common table expression. Lets see if it works.
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

One way I found is, in server job sequential file stage there is a UNIX style (LF) Line termination. Which eventually breaking it to multiples lines.

So, from SEQ-Transform-Seq-Tranform-Seq,-- I can break one line to multiple.
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

All,
Have one more question. Can we convert a :SPACE delimiter to just pipe or any other delimiter. I am trying to use convert function in transformer, but not sure if COLONSPACE together can be converted with other delimiter.

As an example,
12:10:07: Dialing
12:10:07: Connecting

I want to replace : space after 12:10:07 by a pipeline or any other delimiter.

Thanks,
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

No, Convert supports single character substitutions. You'd need a version of EReplace for that.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

OK.Thanks Craig.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So, use a Server job or from what I've read there is a PX version somewhere in the 9.x release. There's also a pxEReplace posted here which may or may not be problematical, an exact search should turn it (and the discussion of it) up.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

Craig, I found a previous post with the code. Trying it. Thanks.
sam334
Premium Member
Premium Member
Posts: 145
Joined: Mon Aug 26, 2013 7:42 pm

Post by sam334 »

ray.wurlod wrote:Sounds like an ideal candidate for looping in a Transformer stage. Search for the next nn:nn:nn pattern, which signals the start of the next line. Or maybe search for the next colon character and parse around that.
Any help on looping structure of the data? I am kind of confused on how to break it when it comes to nn:nn:nn.

Thanks a lot.
Post Reply