DSXchange

Posted: **Tue Jan 29, 2008 2:18 pm**

Hi, trying to think of a way to look for the occurrence of something. If I find DBA I want to split everything after DBA into a seperate record and get rid of any spaces after DBA.

eample

I have one field that contains

BOB MARLEY DBA JAMAICA ISLAND TOURS

but need to turn it into this (2 records). Spaces are also gone.

BOB MARLEY
JAMAICA ISLAND TOURS

Thanks for reading this.

Posted: **Tue Jan 29, 2008 2:25 pm**

What is the metadata of your input, is it one record, one column or multiple records?

Posted: **Tue Jan 29, 2008 2:36 pm**

Hi,

I have lots of records on one file that I'm reading into a transformer. Based on one column one record can become two records. We want to seperate names. If 2 records had DBA somewhere in the name field we would end up with 4 records.

BOB MARLEY DBA JAMAICA ISLAND TOURS
ZIGGY MARLEY DBA ZIGGY'S HERB SHOP

would become
BOB MARLEY
JAMAICA ISLAND TOURS
ZIGGY MARLEY
ZIGGY'S HERB SHOP

4 records. The name field has been split. The rest of the information on the records repeat, but the names will be different.

There can also be space that I need to get rid of
BOB MARLEY DBA <spaces> JAMAICA ISLAND TOURS

Posted: **Tue Jan 29, 2008 2:38 pm**

Well, if I were you, I would do it in a shell script using a sed and replace DBA with leading and trailing space with a new line character.
In PX it becomes a little tricky. You need to EREPLACE the " DBA " with a new line character. For that you need to compile and link the PxEreplace() function which is not provided by Ascential (search for its source code here) and then run your stuff.
Id say do it in a server job or a shell script.

Posted: **Tue Jan 29, 2008 9:42 pm**

If you must use parallel expressions in Transformer stage, use Index() to determine where "DBA" occurs, and substrings (possibly with Trim()) to extract what's either side of that.

Putting the Index() function in a stage variable is best, because you need that result more than once, but don't need to calculate it more than once.

Posted: **Wed Jan 30, 2008 1:06 am**

Unfortunately for Thirunavukkarasu's idea, the Field() function only uses a single-character delimiter - if more than one is supplied then only the first character in the string is used. So it wouldn't work for
LEONARD COHEN DBA BIRD ON THE WIRE

Try

Code: Select all

sv0 --> Index(InLink.TheString, "DBA", 1)
sv1 --> InLink.TheString[1, sv0 - 1]
sv2 --> InLink.TheString[sv0 + 3, Len(InLink.TheString) - sv0 - 2]

Trim(sv1)
Trim(sv2)

Posted: **Wed Jan 30, 2008 2:00 pm**

Good solutions. But will only work if DBA occurs only once in a row. For multiple occurances of DBA, it wont work. Or the result set needs to be reprocessed again through the same logic until no literal "DBA" is found in a row.

Posted: **Wed Jan 30, 2008 2:19 pm**

DBA will only appear once. I'm trying the ideas you all have given me. I will let you know which one works. Thanks for reading and thanks for your help. Feel free to post any new ideas. Thanks!!!

Posted: **Wed Jan 30, 2008 11:41 pm**

Unfortunately for Thirunavukkarasu's idea, the Field() function only uses a single-character delimiter - if more than one is supplied then only the first character in the string is used.

Delimiter for a field function can be a string and it works. I am not sure why Ray advises against using this.

Posted: **Thu Jan 31, 2008 11:49 am**

Hi Ray, I looked at your example. Have a question. How can I expand on you logic to handle the following scenario.

I can have more than just 'DBA'. I can have 'C/O' or 'T/A' etc... about ten possibilities. Whenever I encounter one of these ten values the field must be parsed into two fields. I'm hoping I can use stage variables to interrogate the incoming data to see if it needs to be parsed or not. I will have to holding fields for the parsed name.

TOM BRADY SPORTS APPAREL - stays as is, no need to parse because there is no 'DBA' or 'C/O' etc...in the name.

CARSON PALMER INC C/O MIKE VERNON - name should split into two fields then move to output file as separate name records.
CARSON PALMER INC
MIKE VERNON

BOB MARLEY DBA JAMAICA ISLAND TOURS - name should split
BOB MARLEY
JAMAICA ISLAND TOURS

ZIGGY MARLEY T/A ZIGGY'S HERB SHOP - name should split
ZIGGY MARLEY
ZIGGY'S HERB SHOP

Posted: **Thu Jan 31, 2008 11:51 am**

Hi Ray, I looked at your example. Have a question. How can I expand on you logic to handle the following scenario.

I can have more than just 'DBA'. I can have 'C/O' or 'T/A' etc... about ten possibilities. Whenever I encounter one of these ten values the field must be parsed into two fields. I'm hoping I can use stage variables to interrogate the incoming data to see if it needs to be parsed or not. I will have to holding fields for the parsed name.

TOM BRADY SPORTS APPAREL - stays as is, no need to parse because there is no 'DBA' or 'C/O' etc...in the name.

CARSON PALMER INC C/O MIKE VERNON - name should split into two fields then move to output file as separate name records. I need to also drop C/O.
CARSON PALMER INC
MIKE VERNON

BOB MARLEY DBA JAMAICA ISLAND TOURS - name should split. Also drop DBA.
BOB MARLEY
JAMAICA ISLAND TOURS

ZIGGY MARLEY T/A ZIGGY'S HERB SHOP - name should split. Also drop T/A.
ZIGGY MARLEY
ZIGGY'S HERB SHOP

Thanks for your help.

Posted: **Thu Jan 31, 2008 4:32 pm**

Assuming that only one of these will occur in any one row, change the derivation of sv0 to

Code: Select all

sv0 --> Index(InLink.TheString, "C/O", 1) + Index(InLink.TheString, "DBA", 1) + Index(InLink.TheString, "T/A", 1)

et cetera as required.
The Index() function returns 0 if the substring is not found.
The other two stage variable derivation expressions ought not to need to be changed.

Posted: **Fri Feb 22, 2008 1:10 pm**

Hi, thanks for your help so far. Everything seems to be working. However there is a new piece to the puzzel. I have been given a spreadsheet that contains hundreds of values (BANK, AGENCY, AGENCY, ACADEMY). I have to know if the name I'm parsing cotains any of these values.

I know I can use
Index(clientData.FULL_NAME, "C/O DBA", 1)

But would have to code this hundreds of times.

my question is- How can I get the spread sheet containing hundreds of values into DS and how can I check the name filed on every record to see if the name field contains anything that is listed in the spread sheet?

Thnaks for your help.

Posted: **Fri Feb 22, 2008 5:30 pm**

You could load all the searchable strings into a key-only UV table (= hashed file) and use user-defined SQL

Code: Select all

SELECT key FROM table WHERE EVAL "Index(?, key, 1)" > 0;

It's ugly, it's slow, but it will work. I can't at the moment think of a better way.
To populate a UV table you need a server job, to use one in a lookup you need either to use a server job, or a server container in a parallel job. Read Chapter 2 of Parallel Job Developer's Guide for restrictions on using server containers in parallel jobs.

Posted: **Tue Mar 18, 2008 8:50 pm**

What was the solution to this ???

I had a similar problem and we used QualityStage to resolve it.

DSXchange

split string field based on the occurrence of a string

split string field based on the occurrence of a string

'DBA'

name parsing

name parsing

Re: split string field based on the occurrence of a string