DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
Marley777



Group memberships:
Premium Members

Joined: 27 Jan 2004
Posts: 311

Points: 2763

Post Posted: Tue Jan 29, 2008 2:18 pm Reply with quote    Back to top    

DataStage® Release: 7x
Job Type: Parallel
OS: Unix
Additional info: AIX 5.3
Confused

Hi, trying to think of a way to look for the occurrence of something. If I find DBA I want to split everything after DBA into a seperate record and get rid of any spaces after DBA.

eample

I have one field that contains

BOB MARLEY DBA JAMAICA ISLAND TOURS

but need to turn it into this (2 records). Spaces are also gone.

BOB MARLEY
JAMAICA ISLAND TOURS

Thanks for reading this.
DSguru2B

Premium Poster


since February 2006

Group memberships:
Premium Members, Heartland Usergroup

Joined: 09 Feb 2005
Posts: 6854
Location: Houston, TX
Points: 35666

Post Posted: Tue Jan 29, 2008 2:25 pm Reply with quote    Back to top    

What is the metadata of your input, is it one record, one column or multiple records?

_________________
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Rate this response:  
Not yet rated
Marley777



Group memberships:
Premium Members

Joined: 27 Jan 2004
Posts: 311

Points: 2763

Post Posted: Tue Jan 29, 2008 2:36 pm Reply with quote    Back to top    

Hi,

I have lots of records on one file that I'm reading into a transformer. Based on one column one record can become two records. We want to seperate names. If 2 records had DBA somewhere in the name field we would end up with 4 records.

BOB MARLEY DBA JAMAICA ISLAND TOURS
ZIGGY MARLEY DBA ZIGGY'S HERB SHOP

would become
BOB MARLEY
JAMAICA ISLAND TOURS
ZIGGY MARLEY
ZIGGY'S HERB SHOP

4 records. The name field has been split. The rest of the information on the records repeat, but the names will be different.

There can also be space that I need to get rid of
BOB MARLEY DBA <spaces> JAMAICA ISLAND TOURS
Rate this response:  
Not yet rated
DSguru2B

Premium Poster


since February 2006

Group memberships:
Premium Members, Heartland Usergroup

Joined: 09 Feb 2005
Posts: 6854
Location: Houston, TX
Points: 35666

Post Posted: Tue Jan 29, 2008 2:38 pm Reply with quote    Back to top    

Well, if I were you, I would do it in a shell script using a sed and replace DBA with leading and trailing space with a new line character.
In PX it becomes a little tricky. You need to EREPLACE the " DBA " with a new line character. For that you need to compile and link the PxEreplace() function which is not provided by Ascential (search for its source code here) and then run your stuff.
Id say do it in a server job or a shell script.

_________________
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51785
Location: Sydney, Australia
Points: 281164

Post Posted: Tue Jan 29, 2008 9:42 pm Reply with quote    Back to top    

If you must use parallel expressions in Transformer stage, use Index() to determine where "DBA" occurs, and substrings (possibly with Trim()) to extract what's either side of that. Putting the ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
thiruiitd
Participant



Joined: 09 Mar 2006
Posts: 4
Location: Bangalore
Points: 37

Post Posted: Tue Jan 29, 2008 9:50 pm Reply with quote    Back to top    

Hi,

The solution given is for datastage.

To get delimited substring use
Field (string, delimiter, instance [, number])

StageVariableName1 Trim (Field (DSLink.Name, “DBA”, 1, 1))

StageVariableName2 Trim (Field (DSLink.Name, “DBA”, 2, 1))

StageVariableName3 Trim (Field (DSLink.Name, “DBA”, 3, 1))

StageVariableName4 Trim (Field (DSLink.Name, “DBA”, 4, 1))

Output link constraints

Output link 1 StageVariableName1<>NULL - This may not required
Output link 2 StageVariableName2<>NULL
Output link 3 StageVariableName3<>NULL
Output link 4 StageVariableName4<>NULL

Mapping for output link1
Column1 DSLink.Column1
Column2 DSLink.Column2
Column3 DSLink.Column3
Column4 DSLink.Column4
Column5 DSLink.Column5
Column6 DSLink.Column6
Name StageVariableName1

Mapping for output link2
Column1 DSLink.Column1
Column2 DSLink.Column2
Column3 DSLink.Column3
Column4 DSLink.Column4
Column5 DSLink.Column5
Column6 DSLink.Column6
Name StageVariableName2

Mapping for output link3
Column1 DSLink.Column1
Column2 DSLink.Column2
Column3 DSLink.Column3
Column4 DSLink.Column4
Column5 DSLink.Column5
Column6 DSLink.Column6
Name StageVariableName3

Mapping for output link4
Column1 DSLink.Column1
Column2 DSLink.Column2
Column3 DSLink.Column3
Column4 DSLink.Column4
Column5 DSLink.Column5
Column6 DSLink.Column6
Name StageVariableName4


Use the funnel stage or link collector stage to add all the records from four output links.

I hope this is one of the solutions to this problem.
This is limited to four possible substrings.

Regards
Thirunavukkarasu

RStone wrote:
Confused

Hi, trying to think of a way to look for the occurrence of something. If I find DBA I want to split everything after DBA into a seperate record and get rid of any spaces after DBA.

eample

I have one field that contains

BOB MARLEY DBA JAMAICA ISLAND TOURS

but need to turn it into this (2 records). Spaces are also gone.

BOB MARLEY
JAMAICA ISLAND TOURS

Thanks for reading this.

_________________
Thirunavukkarasu
Bangalore
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51785
Location: Sydney, Australia
Points: 281164

Post Posted: Wed Jan 30, 2008 1:06 am Reply with quote    Back to top    

Unfortunately for Thirunavukkarasu's idea, the Field() function only uses a single-character delimiter - if more than one is supplied then only the first character in the string is used. So it wouldn ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
DSguru2B

Premium Poster


since February 2006

Group memberships:
Premium Members, Heartland Usergroup

Joined: 09 Feb 2005
Posts: 6854
Location: Houston, TX
Points: 35666

Post Posted: Wed Jan 30, 2008 2:00 pm Reply with quote    Back to top    

Good solutions. But will only work if DBA occurs only once in a row. For multiple occurances of DBA, it wont work. Or the result set needs to be reprocessed again through the same logic until no literal "DBA" is found in a row.

_________________
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
Rate this response:  
ras1
Participant



Joined: 17 Oct 2007
Posts: 1

Points: 5

Post Posted: Wed Jan 30, 2008 2:19 pm Reply with quote    Back to top    

DBA will only appear once. I'm trying the ideas you all have given me. I will let you know which one works. Thanks for reading and thanks for your help. Feel free to post any new ideas. Thanks!!!
Rate this response:  
balajisr


since October 2006

Group memberships:
Premium Members

Joined: 28 Jul 2005
Posts: 785

Points: 3770

Post Posted: Wed Jan 30, 2008 11:41 pm Reply with quote    Back to top    

Quote:
Unfortunately for Thirunavukkarasu's idea, the Field() function only uses a single-character delimiter - if more than one is supplied then only the first character in the string is used.


Delimiter for a field function can be a string and it works. I am not sure why Ray advises against using this. Confused
Rate this response:  
Not yet rated
Marley777



Group memberships:
Premium Members

Joined: 27 Jan 2004
Posts: 311

Points: 2763

Post Posted: Thu Jan 31, 2008 11:49 am Reply with quote    Back to top    

Hi Ray, I looked at your example. Have a question. How can I expand on you logic to handle the following scenario.


I can have more than just 'DBA'. I can have 'C/O' or 'T/A' etc... about ten possibilities. Whenever I encounter one of these ten values the field must be parsed into two fields. I'm hoping I can use stage variables to interrogate the incoming data to see if it needs to be parsed or not. I will have to holding fields for the parsed name.

TOM BRADY SPORTS APPAREL - stays as is, no need to parse because there is no 'DBA' or 'C/O' etc...in the name.

CARSON PALMER INC C/O MIKE VERNON - name should split into two fields then move to output file as separate name records.
CARSON PALMER INC
MIKE VERNON

BOB MARLEY DBA JAMAICA ISLAND TOURS - name should split
BOB MARLEY
JAMAICA ISLAND TOURS

ZIGGY MARLEY T/A ZIGGY'S HERB SHOP - name should split
ZIGGY MARLEY
ZIGGY'S HERB SHOP
Rate this response:  
Not yet rated
Marley777



Group memberships:
Premium Members

Joined: 27 Jan 2004
Posts: 311

Points: 2763

Post Posted: Thu Jan 31, 2008 11:51 am Reply with quote    Back to top    

Hi Ray, I looked at your example. Have a question. How can I expand on you logic to handle the following scenario.


I can have more than just 'DBA'. I can have 'C/O' or 'T/A' etc... about ten possibilities. Whenever I encounter one of these ten values the field must be parsed into two fields. I'm hoping I can use stage variables to interrogate the incoming data to see if it needs to be parsed or not. I will have to holding fields for the parsed name.

TOM BRADY SPORTS APPAREL - stays as is, no need to parse because there is no 'DBA' or 'C/O' etc...in the name.

CARSON PALMER INC C/O MIKE VERNON - name should split into two fields then move to output file as separate name records. I need to also drop C/O.
CARSON PALMER INC
MIKE VERNON

BOB MARLEY DBA JAMAICA ISLAND TOURS - name should split. Also drop DBA.
BOB MARLEY
JAMAICA ISLAND TOURS

ZIGGY MARLEY T/A ZIGGY'S HERB SHOP - name should split. Also drop T/A.
ZIGGY MARLEY
ZIGGY'S HERB SHOP

Thanks for your help.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51785
Location: Sydney, Australia
Points: 281164

Post Posted: Thu Jan 31, 2008 4:32 pm Reply with quote    Back to top    

Assuming that only one of these will occur in any one row, change the derivation of sv0 to Code: sv0 --> Index(InLink.TheString, "C/O", 1) + Index(InLink.TheString, &qu ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
Marley777



Group memberships:
Premium Members

Joined: 27 Jan 2004
Posts: 311

Points: 2763

Post Posted: Fri Feb 22, 2008 1:10 pm Reply with quote    Back to top    

Hi, thanks for your help so far. Everything seems to be working. However there is a new piece to the puzzel. I have been given a spreadsheet that contains hundreds of values (BANK, AGENCY, AGENCY, ACADEMY). I have to know if the name I'm parsing cotains any of these values.

I know I can use
Index(clientData.FULL_NAME, "C/O DBA", 1)

But would have to code this hundreds of times.

my question is- How can I get the spread sheet containing hundreds of values into DS and how can I check the name filed on every record to see if the name field contains anything that is listed in the spread sheet?

Thnaks for your help.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51785
Location: Sydney, Australia
Points: 281164

Post Posted: Fri Feb 22, 2008 5:30 pm Reply with quote    Back to top    

You could load all the searchable strings into a key-only UV table (= hashed file) and use user-defined SQL Code: SELECT key FROM table WHERE EVAL "Index(?, key, 1)" > 0; ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours