Split file to respect limit of records

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

inadeau
Participant
Posts: 11
Joined: Mon Jul 19, 2004 7:37 am

Post by inadeau »

Hi,

Is there a way with PX to split a file into as many files as needed, given that each file is limited to a maximum of 150,000 rows?

Thanks,

Isabelle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Row-based rules are very difficult to implement, particularly in a parallel environment. If you're not concerned about which row goes into which file, then it's probably easier to do.

It may be easier to use UNIX utilities after PX has produced one file per node to achieve what you require.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
bcarlson
Premium Member
Posts: 772
Joined: Fri Oct 01, 2004 3:06 pm
Location: Minnesota

Post by bcarlson »

As Ray said, you can use the Unix split command on the output. That is probably your best bet, since you can specify 150,000 records per file and not have to know anything about the input file itself.
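For anyone without the split command to hand, the same row-based split can be sketched in a few lines of Python. This is only an illustration of the idea, not anything DataStage provides: the function name `split_by_rows`, the `part_` prefix and the four-digit numbering are all made up for the example, and it assumes one record per line.

```python
import itertools
from pathlib import Path


def split_by_rows(source, max_rows=150_000, prefix="part_"):
    """Copy `source` into numbered files holding at most `max_rows` lines each.

    The output files are written next to the source file. The naming scheme
    (prefix + 4-digit counter + original suffix) is an arbitrary choice.
    """
    source = Path(source)
    outputs = []
    # newline="" keeps the original line endings untouched.
    with source.open("r", newline="") as infile:
        for index in itertools.count(1):
            # Read the next block of up to max_rows lines.
            chunk = list(itertools.islice(infile, max_rows))
            if not chunk:
                break  # input exhausted
            target = source.with_name(f"{prefix}{index:04d}{source.suffix}")
            with target.open("w", newline="") as outfile:
                outfile.writelines(chunk)
            outputs.append(target)
    return outputs
```

Calling `split_by_rows("output.txt")` on a 400,000-row file would give three files, the last one holding the remaining 100,000 rows. As with the split command, nothing needs to be known about the input beyond the row limit.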

If you need to do it in PX, you can try exporting to a file set. While you cannot specify the number of records, you can specify a maximum file size. If your output happens to be fixed-length (which would be very convenient), you can calculate the maximum size of the file (1 MB or greater) as the record length multiplied by 150,000. Not very dynamic, but it may be worth a try if Unix utilities are not available.
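The arithmetic for the fixed-length case can be sketched as follows. It assumes the size limit is entered in whole megabytes of 1,048,576 bytes, which the thread does not confirm, so check the unit in your own version. The function name and the 100-byte record length are made up for the example.

```python
BYTES_PER_MB = 1024 * 1024  # assumption: the size limit is counted in binary megabytes


def file_size_limit(record_bytes, max_rows=150_000):
    """Work out a file size limit that keeps fixed-length files within max_rows.

    Returns (exact_bytes, whole_mb, rows_per_file):
      exact_bytes   - size of a file holding exactly max_rows records
      whole_mb      - that size rounded DOWN to whole megabytes, so the row
                      limit is never exceeded
      rows_per_file - records that actually fit in whole_mb megabytes
    """
    exact_bytes = record_bytes * max_rows
    whole_mb = exact_bytes // BYTES_PER_MB
    rows_per_file = (whole_mb * BYTES_PER_MB) // record_bytes
    return exact_bytes, whole_mb, rows_per_file


# Hypothetical 100-byte record, including the record delimiter.
print(file_size_limit(100))  # (15000000, 14, 146800)
```

Rounding down means each file carries somewhat fewer than 150,000 rows, so you may end up with one more file than strictly necessary. Records shorter than about 7 bytes would round down to 0 MB, below the 1 MB minimum mentioned above.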

Another option is to add a field that contains a file number, then use the Switch stage to create one stream per file number, and export from each. Not very dynamic, but it works. If you know how many records you start with, you can calculate how many target files to create (the record count divided by 150,000, rounded up). Use the Generator stage to add a numeric field that cycles between 1 and the total number of files.
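The file-number calculation is easy to check outside DataStage. The sketch below is plain Python rather than stage configuration: the function names are made up, and the 400,000-row input is just an example. It shows that cycling a counter from 1 to the number of files keeps every file within the limit.

```python
def file_count(total_rows, max_rows=150_000):
    """Number of target files needed: total_rows / max_rows, rounded up."""
    return max(1, (total_rows + max_rows - 1) // max_rows)


def cycling_file_number(row_index, files):
    """File number (1..files) for a zero-based row index, cycling round-robin."""
    return row_index % files + 1


total_rows = 400_000  # example input size
files = file_count(total_rows)

# Count how many rows land in each file number.
rows_per_file = {number: 0 for number in range(1, files + 1)}
for row_index in range(total_rows):
    rows_per_file[cycling_file_number(row_index, files)] += 1

print(files)          # 3
print(rows_per_file)  # {1: 133334, 2: 133333, 3: 133333}
```

Because the rows are dealt out evenly, no file receives more than the total divided by the number of files (rounded up), which never exceeds 150,000. The trade-off is that consecutive rows end up in different files, so this only suits the case where it does not matter which row goes into which file.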
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Even though you're on Windows, Isabelle, there's no reason not to use UNIX utilities. By installing software such as MKS Toolkit you get robust, tested, Windows-based UNIX utilities. (There are also free and shareware versions out there; a search on Google should find them for you.)