Import tar.gz files without staging (in memory)

Posted: Wed Jun 24, 2009 6:54 pm
by nash
We need to parse and transform a .tar.gz file in memory. So I used an External Source stage and tried the following command to gunzip, untar, and read a specific file inside the tar file.
tar -xzf xyz.tar.gz xx.tsv

It's not importing any records from it. The format is fine: I unzipped the file manually and then imported it, and that works. So I used the same format with this command and it should work, but it isn't. Am I missing something? Any suggestions, please? I need a solution ASAP. Thanks in advance.

More detail:

Need to untar (tar -xzf <tar_file> <specific file in tar>) and read & transform on the fly

1: Using seq file to read.
2: Using the filter option in the seq file stage with the cmd (tar -xzf xyz.tar.gz xx.tsv)
3: Not sure what to give in the actual filename property

When I execute the job it's not aborting, but it says 0 rows imported/rejected.

Posted: Wed Jun 24, 2009 10:01 pm
by chulett
That command will only gunzip and untar, as in land the file on disk; it won't "read" it. You'll need to add the "-O" option to "extract files to standard out" and then the stage should be able to read that stream as if it was reading the file from disk.

Normally, you would put the actual filename to be acted upon by the filter command in the filename box and the stage would combine the two. With the filename buried in the middle like that, you'll need to put the entire command in the Filter and put something innocuous like "/dev/null" as the Filename.
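
For illustration only, the difference between landing the file and streaming it can be checked at a shell prompt before touching the job. This is a minimal sketch: the scratch directory, the sample rows and the extra member other.txt are made up for the demo, and only xyz.tar.gz and xx.tsv come from the question above.

```shell
# Build a small demo archive (xyz.tar.gz containing xx.tsv) in a scratch directory.
# The sample rows and other.txt are invented for this sketch.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'id\tname\n1\talpha\n2\tbeta\n' > xx.tsv
printf 'unrelated\n' > other.txt
tar -czf xyz.tar.gz xx.tsv other.txt
rm xx.tsv other.txt

# Without -O the member is written to disk, so nothing arrives on stdout.
landed=$(tar -xzf xyz.tar.gz xx.tsv | wc -l | tr -d ' ')
rm -f xx.tsv

# With -O the member's contents go to stdout and no file is landed.
rows=$(tar -xzOf xyz.tar.gz xx.tsv | wc -l | tr -d ' ')

echo "lines on stdout without -O: $landed"
echo "lines on stdout with -O:    $rows"
```

In the stage, the second tar command would be what goes in the Filter, with something like /dev/null as the Filename, as described above.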

Posted: Wed Jun 24, 2009 10:02 pm
by chulett
I copied my reply from your 'hijack' post and then deleted my reply from there. With my reply gone, can you please go back and delete yours as well? Thanks.

Edited to add:
Please stop cherry-picking other people's posts to hijack. Stick with this one. Now you've got another extraneous post to delete before someone replies to it. :?

Posted: Thu Jun 25, 2009 9:16 am
by nash
Hey... I did delete my hijack post... I was badly looking for a solution to this problem and I posted it wherever it seemed related...

Anyway... thanks for your reply... but the problem is I don't have the premium membership and so couldn't view your post... Can anything be done without me getting the membership?

Posted: Thu Jun 25, 2009 10:01 am
by chulett
You should be able to see enough to get you much further with this, the rest is gravy. Did you try changing it to untar to standard out? If so, what happened?

Posted: Thu Jun 25, 2009 10:17 am
by nash
Thank you so much Craig. It worked. I used the -Oxzf option and it did the magic. I should have read the tar manual completely. Anyway, thanks.

I was also wondering: does it matter what goes in the filename field of the Seq file stage, or will it work with any name? In my case I gave the name of the file that will be extracted from the archive.

More questions coming... I need to apply my transformations and then write the result to an XML file, gzipping it at the same time (same as the read... everything should be done on the fly). So I guess I can use the seq file stage to do the same. I'll try, and if I run into any issues I will post the problem.
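
At the shell level, the write side would presumably be the mirror image of the read: pipe the output stream through gzip instead of landing the plain file first. A rough sketch, where the output path and the XML content are made up, and whether the seq file stage accepts this as a filter on the target side is an assumption not confirmed in this thread:

```shell
# Hypothetical output location for the demo.
workdir=$(mktemp -d)
out="$workdir/out.xml.gz"

# printf stands in for the job's output stream; gzip -c compresses stdin to stdout,
# so only the compressed file is ever written to disk.
printf '<rows>\n<row id="1"/>\n</rows>\n' | gzip -c > "$out"

# Read it back the same way, without landing an uncompressed copy.
roundtrip=$(gzip -dc "$out")
printf '%s\n' "$roundtrip"
```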

Posted: Thu Jun 25, 2009 10:30 am
by chulett
No problem, glad you got it working. If you have other issues, please start new threads and don't just dog pile on this one. :wink:

Posted: Thu Jun 25, 2009 10:44 am
by nash
I already marked this one as resolved... which means I will open a new thread when I have more issues... :wink: Thanks Craig.