Page 1 of 1

How to read the space delimited file??

Posted: Mon Apr 17, 2017 11:34 pm
by SURA
Hi there


I need some suggestions to read this file using datastage. This is a log file data which need to be loaded into a table.

These all are the columns i am after:

DATE
TIME
Event
Username
Application
Created_By
IP
Details
ClientId
Log_Type

see the below sample records from two different files, though it contains same event type. I thought to use the space delimiter, but it is not working. I am thinking to use to string match and then find a way to load the data by using field function.

Code: Select all

2017-04-03 12:56:43  [http-bio-aaaa-exec-10] INFO a.b.c.d.AppUserManagementController -  Event=User_Created Username=ABCDl8 Application=XYZ Created_By=Admin IP=0:0:0:0:0:0:0:1 Details="Roles_Added=ZZZZZZZZ Office_Added=OrganizationProfile{orgId:12323112, orgName:ASDASDDS}" ClientId=useradmin Log_Type=audit

2017-04-05 11:37:29  [WebContainer : 0] INFO a.b.c.d.AppUserManagementController -  Event=User_Created Username=12345 Application=ABCD Created_By=Admin IP=0:0:0:0:0:0:0:1 Details="Roles_Added=SDFDS SDS SS Office_Added=OrganizationProfile{orgId:45455465, orgName:ASDASFDFDSF}" ClientId=SDFSFD Log_Type=audit
Is this is the best way to load the data \ any other better way?

Please throw some light.

Notes: In the same file, i may end up getting 33 different event. In such case i may need to find 33 different ways to fetch the relevant data.

Posted: Tue Apr 18, 2017 12:45 am
by ray.wurlod
Have you tried using a Sequential File stage with space (or 0x20) specified as the field delimiter character? If so, what was the result?

Re: How to read the space delimited file??

Posted: Tue Apr 18, 2017 6:50 am
by chulett
SURA wrote:I am thinking to use to string match and then find a way to load the data by using field function.
Or that string match followed by a substring. Once you find the position of an identifying string, like "Event=", seems to me you could find the next space after that position and then pull out the event name. Lather, rinse, repeat.

Posted: Tue Apr 18, 2017 7:43 am
by FranklinE
Your samples indicate bad design for the extract source. There's no two ways about that.

I often harp on such things. I "grew up" in a Cobol environment, where delimiters are avoided and fixed-length fields make reading and writing an exact science. For this, I would point to how some columns are tagged and others are not. This is worse than inconsistent. It's lazy.

Unless you have variable length columns on your data, I suggest fixed-length column definitions. It's the only way to be sure. The ideal "solution" is to push this back on whoever is responsible for the extract and make them create a format that is consistent. You really shouldn't have to customize your read at the column level.

Posted: Tue Apr 18, 2017 9:34 am
by chulett
I've seen this before mining Apache logs and as noted, being a log file I doubt there's any wiggle room here for improvement. Can't hurt to ask, though. :wink:

There used to be a Perl-based module in DataStage specifically for handling data of this nature. That might be another option to consider - Perl.

Posted: Tue Apr 18, 2017 5:47 pm
by SURA
ray.wurlod wrote:Have you tried using a Sequential File stage with space (or 0x20) specified as the field delimiter character? If so, what was the result? ...
Ray

Yes i tried that option . The issue I have here is, a single file which contains 33 different events. Each events have different number of columns.

Re: How to read the space delimited file??

Posted: Tue Apr 18, 2017 5:48 pm
by SURA
chulett wrote:
SURA wrote:I am thinking to use to string match and then find a way to load the data by using field function.
Or that string match followed by a substring. Once you find the position of an ...
Yes that exactly i did after i post this query.

Posted: Tue Apr 18, 2017 5:57 pm
by SURA
FranklinE wrote:The ideal "solution" is to push this back on whoever is responsible for .
You are 100% right.

The issue are...

1. When they gave the sample file, it wasn't that bad. Though the work is messy , still it was manageable.

2. When the scope was extended, then few new files came, then we started to get the trouble. (At the time of writing this tread, i saw further changes!!!)

3. Not involved at the right time.

So at this stage, it like catching a tigers tail!!

As i pointed earlier , i used grep the details based on the STRING and finishing this task.

To me personally i wont prefer this way, but .....

Posted: Tue Apr 18, 2017 7:12 pm
by chulett
I don't see any point where you mentioned grep or that you "finished the task"... should we mark this as a "workaround"?

Posted: Tue Apr 18, 2017 9:15 pm
by SURA
Yes , i am marking this thread as Workaround.

Thanks to all for your time and help.