Need help to generate schema files dynamically for txt files

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

JohnFrancisK
Premium Member
Premium Member
Posts: 6
Joined: Sun May 31, 2015 12:04 am

Need help to generate schema files dynamically for txt files

Post by JohnFrancisK »

Hi, I have a requirement like there will be set of input .txt files with same set of columns but columns order will be different in each input .txt file. I need to create one reusable job to convert the columns order into one particular column order for all the input .txt files.


For Ex:

I/P:
1st Input .txt file Columns order is A B C D E F G H
2nd Input .txt file Columns Order is A D C H G B F E
3rd Input .txt file Columns Order is B G E H A C F D
....
Nth Input .txt file Columns Order is E H D A C F B G (in some order)

O/P:


1st Input .txt file Columns order is A B C D E F G H
2nd Input .txt file Columns Order is A B C D E F G H
3rd Input .txt file Columns Order is A B C D E F G H
....
Nth Input .txt file Columns Order is A B C D E F G H

Can someone pls help me on this requirement?

Thanks in advance!
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... I'm wondering how dynamic this all is. How will you know which files have what column layout? It is simply numbered as you've shown (the first is always X, the second always Y, etc. whatever first and second mean at any given time) or is there a naming standard that tells you? Do you need to discover it individually for each file? Something completely different? :?

Let's start there.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do the files have column headings? (If yes, that will make the task a lot easier).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Do the files have column headings? (If yes, that will make the task a lot easier).

That said, I don't believe there's going to be an easy "one job" solution. You will almost certainly be up for doing some coding, either a routine or a Build stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JohnFrancisK
Premium Member
Premium Member
Posts: 6
Joined: Sun May 31, 2015 12:04 am

Post by JohnFrancisK »

Thanks for your reply...

Yes. All the input files have column headings.


Please let me know how to approach/solution on this......
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

My solution would use two jobs - one to re-order the columns into the standard format, and the other to process that file.

The first job would read the file as a single VarChar column, including reading the heading row as data (do not check the "first line is column headings check box), and store the column headings and positions from row 0 into stage variables. Its output would use the positional information and the Field() function to direct columns appropriately onto the output.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
abhilashnair
Participant
Posts: 284
Joined: Fri Oct 13, 2006 4:31 am

Post by abhilashnair »

Read the file with a single column and then have a column import stage. In column import select option for schema file and have a parameter for the same.

In this way you will have a single multi instance job but you will have 'n' schema file for n files. When you invoke the job just pass the parameter for the specific schema file. You can re-use the job
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

That solution would involve n! schema files to accommodate every possible schema file. 10! = 3,628,800.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
tonystark622
Premium Member
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Another solution to try would be to use the TEXT file ODBC driver and the ODBC stage. As long as you have column headers that are always the same, even if the columns aren't in the same order it should work just by doing "select a,b,c,d,e,f,g,h from text file name". I'm not sure of the exact syntax of the select for text files, but some google research should give you what you need.

I hope this helps,
Tony
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That's an interesting though... treat them like a database table. 8)
-craig

"You can never have too many knives" -- Logan Nine Fingers
JohnFrancisK
Premium Member
Premium Member
Posts: 6
Joined: Sun May 31, 2015 12:04 am

Post by JohnFrancisK »

Hi Abhilash, I tried your solution. If the input file column header order and the schema file column order are same then only its processing the input file and loading the data into output file.

Could you please help me on the details how i can reorder the columns with this approach?

Thanks in advance!

John
JohnFrancisK
Premium Member
Premium Member
Posts: 6
Joined: Sun May 31, 2015 12:04 am

Post by JohnFrancisK »

Hi Ray,

I tried your solution. i was able to read the input file as Single Varchar column and able to store the column heading and positions into stage variables.

After that i stuck to proceed further (on how to iterate through all the rows and direct column appropriately onto to the output using Field() function)....

could you please give some more insight on this?

Thanks in advance!

John
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Basically it's a whole heap of If..Then..Else expressions that test the field number associated with the specific column heading and, where appropriate, use the Field() function to extract that particular substring out of the single VarChar input column.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
JohnFrancisK
Premium Member
Premium Member
Posts: 6
Joined: Sun May 31, 2015 12:04 am

Post by JohnFrancisK »

Sure Ray. I will try the same.

Thanks,
John :)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

John, a couple of things.

Use the Reply to topic link rather than quoting everything from the post you are replying to using Reply with quote unless it makes sense to do so. Also it doesn't make a whole lot of sense to use BBCode tags like for quoting, bold, italics and other things and then check the option to "Disable BBCode in this post"... pretty much defeats the purpose. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply