Splitting CSV fields with Quotes

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Splitting CSV fields with Quotes

Post by jzparad »

Is there a nice simple way of splitting a CSV record that contains quoted strings with embedded delimiters?

Example:

Code: Select all

,,,"Pint, US liquid",PT,,,,

Thanks in advance!
Jim Paradies
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Yes. Get "them" to deliver you a legal CSV file! Nothing could be easier.

If it's an export from Excel, go for tab-delimited.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Thanks Ray.

I was hoping that since a lot of the plug-ins provide a way of defining a quote character that maybe there was some obscure function that offered the same functionality.
Jim Paradies
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

It's almost an impossible problem to solve without writing a routine. You read the entire row as a single string and process it through the routine that has a "WithinQuotes" flag that tells it whether to ignore the comma inside the quotes, and maybe change the real delimiters to something else, such as a tab.

Code: Select all

FUNCTION FixLine(TheLine, QuoteCharacter)
* Changes a comma-delimited line with embedded commas in quotes to a tab-delimited line

Equate Tab To Char(9)
Equate Comma To ","

String = TheLine
LenString = Len(TheLine)
WithinQuotes = @FALSE

Ans = ""

For Pos = 1 To LenString
   Ch = String[Pos,1]
   If Ch = QuoteCharacter
   Then
      WithinQuotes = Not(WithinQuotes)
   End
   Else
      If Ch = Comma
      Then
         If WithinQuotes
         Then
            Ans := Comma
         End
         Else
            Ans := Tab
         End
      End
      Else
         Ans := Ch
      End
   End
Next Pos

If WithinQuotes
Then
   Call DSTransformError("Mismatched Quotes", "FixLine")
End

RETURN(Ans)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
rleishman
Premium Member
Premium Member
Posts: 252
Joined: Mon Sep 19, 2005 10:28 pm
Location: Melbourne, Australia
Contact:

Post by rleishman »

It's the weekend, so I'm probably on delay. I know I'll regret this in the morning, but isn't

Code: Select all

,,,"Pint, US liquid",PT,,,,
a legal CSV record? The SEQ file stage would handle this easily delimiter set to (,) and Quote Character set to (").

I thought the optional quotes were there to handle embedded delimiters (commas).

Further, embedded quotes are nicely handled by doubling them:

Code: Select all

,,,"Pint, ""US"" liquid",PT,,,,
Ross Leishman
jzparad
Charter Member
Charter Member
Posts: 151
Joined: Thu Apr 01, 2004 9:37 pm

Post by jzparad »

Yes, it would - if you used the SEQ file stage. What I wanted was a function that could be used in BASIC (which I should have stated in the original question).
Jim Paradies
Post Reply