Import tar.gz files without staging (in memory)

nash
Premium Member
Posts: 16
Joined: Thu May 03, 2007 10:26 am
Location: Seattle

Post by nash »

We need to parse and transform a .tar.gz file in memory. I used an External Source stage with the following command, intending to gunzip, untar, and read one specific file inside the archive:
tar -xzf xyz.tar.gz xx.tsv

It's not importing any records. The format is fine: when I extracted the file manually and imported it with the same format, it worked, so the same format should work with this command, but it doesn't. Am I missing something? Any suggestions? I need a solution as soon as possible. Thanks in advance.

More detail:

I need to untar (tar -xzf <tar_file> <specific file in tar>), then read and transform on the fly.

1: Using a Sequential File stage to read.
2: Using the Filter option in the Sequential File stage with the command (tar -xzf xyz.tar.gz xx.tsv).
3: Not sure what to put in the actual File name property.

When I execute the job it doesn't abort, but it reports 0 rows imported/rejected.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

That command only gunzips and untars, as in it lands the file on disk; it won't "read" it. You'll need to add the "-O" option to extract files to standard out, and then the stage should be able to read that stream as if it were reading the file from disk.

Normally, you would put the name of the file the filter command acts on in the Filename box and the stage would combine the two. With the filename buried in the middle of the command like that, you'll need to put the entire command in the Filter and put something innocuous like "/dev/null" as the Filename.
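
For anyone who wants to see the difference outside DataStage, here is a minimal shell sketch of the two commands. The archive and member names (xyz.tar.gz, xx.tsv) come from the question; the sample rows and the temporary directory are invented for illustration, and it assumes a tar that accepts -O (GNU tar and bsdtar both do).

```shell
#!/bin/sh
set -eu

# Build a throwaway archive so the example is self-contained.
# xyz.tar.gz and xx.tsv are the names used in the thread; the rows are made up.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"
printf 'id\tname\n1\talpha\n2\tbeta\n' > xx.tsv
tar -czf xyz.tar.gz xx.tsv
rm xx.tsv

# Original command: lands xx.tsv on disk and writes nothing to stdout,
# so anything reading the command's output sees zero rows.
landed_lines=$(tar -xzf xyz.tar.gz xx.tsv | wc -l | tr -d ' ')
rm -f xx.tsv

# With -O the member goes to stdout and nothing is written to disk.
streamed_lines=$(tar -xzOf xyz.tar.gz xx.tsv | wc -l | tr -d ' ')

echo "stdout lines without -O: $landed_lines"
echo "stdout lines with -O:    $streamed_lines"
```

Only the second form gives a downstream reader anything to consume, which matches the "0 rows imported" symptom described in the question.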
-craig

"You can never have too many knives" -- Logan Nine Fingers

Post by chulett »

I copied my reply from your 'hijack' post and then deleted my reply from there. With my reply gone, can you please go back and delete yours as well? Thanks.

Edited to add:
Please stop cherry-picking other people's posts to hijack. Stick with this one. Now you've got another extraneous post to delete before someone replies to it. :?

Post by nash »

Hey... I did delete my hijack post. I was desperately looking for a solution to this problem and posted it wherever it seemed related.

Anyway, thanks for your reply, but I don't have the premium membership, so I couldn't view your post. Can anything be done without me getting the membership?

Post by chulett »

You should be able to see enough to get you much further with this; the rest is gravy. Did you try changing it to untar to standard out? If so, what happened?

Post by nash »

Thank you so much, Craig. It worked. I used the -Oxzf options and that did the magic. I should have read the tar manual completely. Anyway, thanks.

I was also wondering: does it matter what name goes in the Filename field of the Sequential File stage, or will it work with any name? In my case I gave the name of the file that is extracted from the archive.

More questions coming... I need to apply my transformations, then write the result to an XML file and gzip it at the same time (same as the read: everything should be done on the fly). I guess I can use the Sequential File stage for that as well. I'll try it, and if I run into any issues I'll post the problem.
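
At the shell level, the write side mirrors the read side: pipe the output stream through gzip so that only the compressed file ever reaches disk. The sketch below is an illustration only; emit_xml, out.xml.gz, and the XML content are hypothetical stand-ins, and whether the Sequential File stage's Filter on a target behaves the same way is an assumption that would need checking against the stage documentation.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical stand-in for the job's transformed output stream.
emit_xml() {
  printf '<rows>\n<row id="1"/>\n<row id="2"/>\n</rows>\n'
}

# Compress while writing: no uncompressed out.xml is ever created.
emit_xml | gzip -c > out.xml.gz

# Round-trip check: decompress to stdout and count the lines recovered.
restored_lines=$(gzip -dc out.xml.gz | wc -l | tr -d ' ')
echo "lines recovered from out.xml.gz: $restored_lines"
```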

Post by chulett »

No problem, glad you got it working. If you have other issues, please start new threads and don't just dog pile on this one. :wink:

Post by nash »

I already marked this one as resolved... which means I will open a new thread when I have more issues... :wink: Thanks, Craig.