Concatenating datasets

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Karine
Participant
Posts: 37
Joined: Sun Feb 18, 2007 3:33 am

Concatenating datasets

Post by Karine »

I have a requirement to concatenate multiple data sets into a single data set for downstream processing. Can the datasets be 'cat' together in Unix, or do I have to do it in DataStage? There can be any number of datasets, and I would like to concatenate them based on a file pattern. If it has to be done in DataStage, which stage would be most appropriate?

TIA.
swades
Premium Member
Posts: 323
Joined: Mon Dec 04, 2006 11:52 pm

Post by swades »

The Funnel stage may be appropriate. You can use a Continuous, Sort, or Sequence funnel.
Karine
Participant
Posts: 37
Joined: Sun Feb 18, 2007 3:33 am

Post by Karine »

Thank you for the prompt response.
Using the Funnel stage would require me to know the number of input datasets beforehand, but I can have any number. My requirement is to concatenate them, whether there is 1 or 1000.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Just specify Append as the Update Policy property value in the Data Set stage.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Karine
Participant
Posts: 37
Joined: Sun Feb 18, 2007 3:33 am

Post by Karine »

Ray,
I don't see the Update Policy property in my Data Set stage. Can you explain where it can be found?

Maybe I should explain better what I'm trying to achieve. I'm working on a design where I receive files from upstream. There could be any number of these files (1, 10, or 100 per day), and the file names follow a similar pattern. My quandary is whether I should ask for them as sequential files or as datasets. If they are sequential files, I can 'cat' them into a single file and continue my processing in DataStage. Obviously I would prefer datasets for performance reasons, but I don't know how to concatenate data sets, either inside or outside DataStage. Please help.

Karine
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Yes you do; it's immediately under the File property. It's a mandatory property, so it will be there. Its only possible values are Overwrite and Append.
ravi468
Participant
Posts: 10
Joined: Sun Feb 04, 2007 9:15 pm

Post by ravi468 »

I had the same situation.

We had sequential files, and this worked fine for them. I don't know how it works for datasets.
The file names need to be similar, though.

For example, test1.A and test1.B are the two files.

In the file name property on the Properties tab, enter `ls test1*`
and select File Pattern as the read method.

This is a Bourne shell command that lets you cat the files.
Try it with data sets.

The output of the command is then the data from test1.A and the data from test1.B.
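
For plain sequential files, much the same result can be had in Unix before the job runs. Here is a minimal sketch, assuming the files share the test1.* pattern from the example; the temporary directory, the sample rows, and the combined.txt name are made up for illustration.

```shell
#!/bin/sh
# Hypothetical demo: directory, file contents, and output name are invented.
set -eu

workdir=$(mktemp -d)

# Two small sequential files that share a name pattern.
printf 'a1\na2\n' > "$workdir/test1.A"
printf 'b1\n' > "$workdir/test1.B"

# The shell expands the pattern to every matching file, in sorted order,
# and cat writes their contents one after another into a single file.
# The output name does not match test1.*, so it is never read as input.
cat "$workdir"/test1.* > "$workdir/combined.txt"

cat "$workdir/combined.txt"
```

The pattern picks up however many files match, so the count does not need to be known in advance. This covers sequential files only; it says nothing about whether data sets can be handled the same way.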

Hope this helps.