Need help with reading delimited text file

mavrick21
Premium Member
Posts: 335
Joined: Sun Apr 23, 2006 11:25 pm

Post by mavrick21 »

Hello,

I have about 50 source text files whose fields are delimited by the six-character string ||@@##

I tried this approach and it didn't work.

1) Replaced ||@@## with the single byte hex 07 using
sed 's/||@@##/\x7/g' source.txt > source.out

2) Put &H7 in Other Delimiter field in Import Metadata for sequential file.

3) Clicked on Preview but all fields showed up as just one field.
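
Step 1 above can be scripted for the whole batch of files. The following is only a sketch: the function name `to_bel` and the temporary sample file are illustrative and not from the original post. It builds the 0x07 byte with printf instead of relying on sed's `\x7` escape, which is a GNU sed extension and may not be available in other sed implementations.

```shell
#!/bin/sh
# Hypothetical helper: write a file to stdout with every ||@@##
# replaced by one byte, octal 007 (hex 07).
to_bel() {
  bel=$(printf '\007')          # one literal 0x07 byte
  sed "s/||@@##/${bel}/g" "$1"
}

# Demo on a two-line sample shaped like the data in this thread.
tmp=$(mktemp -d)
printf 'Col1||@@##Col2||@@##Col3\nVal1||@@##Val2||@@##Val3\n' > "$tmp/src.txt"
to_bel "$tmp/src.txt" > "$tmp/src.out"
od -c "$tmp/src.out"
```

For the full set of files, a loop such as `for f in *.txt; do to_bel "$f" > "${f%.txt}.out"; done` would apply the same conversion, assuming the sources all end in `.txt`.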

Please advise.

Thanks
-Mav
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Try "007" instead.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Or &H07
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.

Post by mavrick21 »

I tried both and neither worked. I also tried a few other hex (non-printable) values. In Import Meta Data (Sequential) I also tried toggling the NLS map values: None, UTF 8, ASCII and ISO-8859-1. Maybe I'm doing something wrong.

Here's a sample file (src.txt) for which I'm trying to import the metadata.

Code:

$ cat > src.txt
Col1||@@##Col2||@@##Col3
Val1||@@##Val2||@@##Val3


$ od -cx src.txt
0000000   C   o   l   1   |   |   @   @   #   #   C   o   l   2   |   |
           6f43    316c    7c7c    4040    2323    6f43    326c    7c7c
0000020   @   @   #   #   C   o   l   3  \n   V   a   l   1   |   |   @
           4040    2323    6f43    336c    560a    6c61    7c31    407c
0000040   @   #   #   V   a   l   2   |   |   @   @   #   #   V   a   l
           2340    5623    6c61    7c32    407c    2340    5623    6c61
0000060   3  \n
           0a33
0000062


$ sed 's/||@@##/\x7/g' src.txt > tgt1.txt
$ sed 's/||@@##/\x07/g' src.txt > tgt2.txt
$ sed 's/||@@##/\o7/g' src.txt > tgt3.txt

$ cat tgt1.txt
Col1Col2Col3
Val1Val2Val3

$ od -cx tgt1.txt
0000000   C   o   l   1  \a   C   o   l   2  \a   C   o   l   3  \n   V
           6f43    316c    4307    6c6f    0732    6f43    336c    560a
0000020   a   l   1  \a   V   a   l   2  \a   V   a   l   3  \n
           6c61    0731    6156    326c    5607    6c61    0a33
0000036

$ cat tgt2.txt
Col1Col2Col3
Val1Val2Val3

$ od -cx tgt2.txt
0000000   C   o   l   1  \a   C   o   l   2  \a   C   o   l   3  \n   V
           6f43    316c    4307    6c6f    0732    6f43    336c    560a
0000020   a   l   1  \a   V   a   l   2  \a   V   a   l   3  \n
           6c61    0731    6156    326c    5607    6c61    0a33
0000036

$ cat tgt3.txt
Col1Col2Col3
Val1Val2Val3

$ od -cx tgt3.txt
0000000   C   o   l   1  \a   C   o   l   2  \a   C   o   l   3  \n   V
           6f43    316c    4307    6c6f    0732    6f43    336c    560a
0000020   a   l   1  \a   V   a   l   2  \a   V   a   l   3  \n
           6c61    0731    6156    326c    5607    6c61    0a33
0000036

Is there another approach I can try?

Thanks
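
One way to rule out the converted file itself is to count the 0x07-separated fields on every line outside DataStage. This is a rough sketch, assuming a POSIX awk; the temporary file is a stand-in for one of the tgt files above. If every line reports the same count, the data is splitting as intended and the problem more likely lies in the import settings.

```shell
#!/bin/sh
# Stand-in for tgt1.txt: two lines, three fields each, 0x07 between fields.
tmp=$(mktemp -d)
printf 'Col1\007Col2\007Col3\nVal1\007Val2\007Val3\n' > "$tmp/tgt.txt"

# Use the literal 0x07 byte as awk's field separator and report NF per line.
bel=$(printf '\007')
awk -F "$bel" '{ print NR ": " NF " fields" }' "$tmp/tgt.txt"
# prints:
# 1: 3 fields
# 2: 3 fields
```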

Post by ray.wurlod »

It's interesting that your target files from the sed commands have \a as their delimiter, not \007.

Post by mavrick21 »

Using the 'a' format instead of 'c' in the od command shows the delimiter by name:

Code:

$ od -ax tgt3.txt
0000000   C   o   l   1 bel   C   o   l   2 bel   C   o   l   3  nl   V
           6f43    316c    4307    6c6f    0732    6f43    336c    560a
0000020   a   l   1 bel   V   a   l   2 bel   V   a   l   3  nl
           6c61    0731    6156    326c    5607    6c61    0a33

Post by ray.wurlod »

Ah, of course. BEL = "alert". I missed that nuance. So, back to the original question: when importing the metadata from the (target) sequential file, specify the field delimiter character as 007. It must have three digits when in decimal format.
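
The BEL point can be checked from the shell: `\a` and octal `\007` are two spellings of the same byte, hex 07 (decimal 7), which od displays as `\a` with `-c` and as `bel` with `-a`. A small sketch using only POSIX printf and od:

```shell
#!/bin/sh
# Emit one byte per escape and show it as two hex digits.
for esc in '\a' '\007'; do
  printf "$esc" | od -An -tx1 | tr -d ' \n'
  echo
done
# prints 07 on each of the two lines
```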

Post by mavrick21 »

When I type in &H07 (or &H7 or &H007) and click on Preview, the value automatically changes to 007.

When I type in 007 and click on preview it stays 007.

Still doesn't work.

Post by ray.wurlod »

I just tried it here, and 007 does work when the delimiter is BEL. Is this during Import > Table Definition > Sequential File?

Post by mavrick21 »

Still doesn't work for me.

http://imgur.com/bDhY7TQ

When I click on Preview, everything shows up as a single field. I thought Preview might be buggy, so I moved on to the next tab, Define, but still had no success.

I'm working on DataStage 8.5 Server Edition installed on RHEL 6.4 (64-bit).

Here are the NLS settings on the RHEL box:

Code:

$ echo $NLS_LANG
American_America.WE8ISO8859P1

$ locale
LANG=en_US.UTF-8
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"

Any other suggestions?
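
As a side check on the NLS map question: the 0x07 byte and the other characters in the sample are 7-bit ASCII, which is encoded identically in ASCII, ISO-8859-1 and UTF-8, so for a file like this the choice among those three maps should not change the bytes. The sketch below counts bytes outside the 7-bit range; the temporary file is a stand-in for a converted file and a POSIX tr is assumed. A count of 0 means the file is pure ASCII.

```shell
#!/bin/sh
# Stand-in for a converted file with 0x07 between fields.
tmp=$(mktemp -d)
printf 'Col1\007Col2\007Col3\n' > "$tmp/tgt.txt"

# Delete every byte in the range 0-127; whatever remains is non-ASCII.
LC_ALL=C tr -d '\000-\177' < "$tmp/tgt.txt" | wc -c | tr -d ' '
# prints 0
```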