Need help to generate schema files dynamically for txt files

JohnFrancisK · Post by **JohnFrancisK** » Sun May 31, 2015 1:07 am

Hi, I have a requirement like there will be set of input .txt files with same set of columns but columns order will be different in each input .txt file. I need to create one reusable job to convert the columns order into one particular column order for all the input .txt files.

For Ex:

I/P:
1st Input .txt file Columns order is A B C D E F G H
2nd Input .txt file Columns Order is A D C H G B F E
3rd Input .txt file Columns Order is B G E H A C F D
....
Nth Input .txt file Columns Order is E H D A C F B G (in some order)

O/P:

1st Input .txt file Columns order is A B C D E F G H
2nd Input .txt file Columns Order is A B C D E F G H
3rd Input .txt file Columns Order is A B C D E F G H
....
Nth Input .txt file Columns Order is A B C D E F G H

Can someone pls help me on this requirement?

Thanks in advance!

chulett · Post by **chulett** » Sun May 31, 2015 4:01 pm

So... I'm wondering how dynamic this all is. How will you know which files have what column layout? It is simply numbered as you've shown (the first is always X, the second always Y, etc. whatever first and second mean at any given time) or is there a naming standard that tells you? Do you need to discover it individually for each file? Something completely different?

Let's start there.

ray.wurlod · Post by **ray.wurlod** » Sun May 31, 2015 4:52 pm

Do the files have column headings? (If yes, that will make the task a lot easier).

ray.wurlod · Post by **ray.wurlod** » Sun May 31, 2015 4:54 pm

Do the files have column headings? (If yes, that will make the task a lot easier).

That said, I don't believe there's going to be an easy "one job" solution. You will almost certainly be up for doing some coding, either a routine or a Build stage.

JohnFrancisK · Post by **JohnFrancisK** » Sun May 31, 2015 6:19 pm

Thanks for your reply...

Yes. All the input files have column headings.

Please let me know how to approach/solution on this......

ray.wurlod · Post by **ray.wurlod** » Sun May 31, 2015 9:51 pm

My solution would use two jobs - one to re-order the columns into the standard format, and the other to process that file.

The first job would read the file as a single VarChar column, including reading the heading row as data (do not check the "first line is column headings check box), and store the column headings and positions from row 0 into stage variables. Its output would use the positional information and the Field() function to direct columns appropriately onto the output.

abhilashnair · Post by **abhilashnair** » Mon Jun 01, 2015 5:01 am

Read the file with a single column and then have a column import stage. In column import select option for schema file and have a parameter for the same.

In this way you will have a single multi instance job but you will have 'n' schema file for n files. When you invoke the job just pass the parameter for the specific schema file. You can re-use the job

ray.wurlod · Post by **ray.wurlod** » Mon Jun 01, 2015 3:16 pm

That solution would involve n! schema files to accommodate every possible schema file. 10! = 3,628,800.

tonystark622 · Post by **tonystark622** » Tue Jun 02, 2015 9:46 am

Another solution to try would be to use the TEXT file ODBC driver and the ODBC stage. As long as you have column headers that are always the same, even if the columns aren't in the same order it should work just by doing "select a,b,c,d,e,f,g,h from text file name". I'm not sure of the exact syntax of the select for text files, but some google research should give you what you need.

I hope this helps,
Tony

chulett · Post by **chulett** » Tue Jun 02, 2015 9:55 am

That's an interesting though... treat them like a database table.

JohnFrancisK · Post by **JohnFrancisK** » Tue Jun 16, 2015 3:33 pm

Hi Abhilash, I tried your solution. If the input file column header order and the schema file column order are same then only its processing the input file and loading the data into output file.

Could you please help me on the details how i can reorder the columns with this approach?

Thanks in advance!

John

JohnFrancisK · Post by **JohnFrancisK** » Tue Jun 16, 2015 3:56 pm

Hi Ray,

I tried your solution. i was able to read the input file as Single Varchar column and able to store the column heading and positions into stage variables.

After that i stuck to proceed further (on how to iterate through all the rows and direct column appropriately onto to the output using Field() function)....

could you please give some more insight on this?

Thanks in advance!

John

ray.wurlod · Post by **ray.wurlod** » Tue Jun 16, 2015 4:37 pm

Basically it's a whole heap of If..Then..Else expressions that test the field number associated with the specific column heading and, where appropriate, use the Field() function to extract that particular substring out of the single VarChar input column.

JohnFrancisK · Post by **JohnFrancisK** » Tue Jun 16, 2015 4:40 pm

Sure Ray. I will try the same.

Thanks,
John :)

chulett · Post by **chulett** » Tue Jun 16, 2015 8:52 pm

John, a couple of things.

Use the Reply to topic link rather than quoting everything from the post you are replying to using Reply with quote unless it makes sense to do so. Also it doesn't make a whole lot of sense to use BBCode tags like for quoting, bold, italics and other things and then check the option to "Disable BBCode in this post"... pretty much defeats the purpose.