Split Data into different files

Post questions here related to DataStage Enterprise/PX Edition, covering areas such as parallel job design, parallel datasets, BuildOps, wrappers, etc.

kashif007
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Split Data into different files

Post by kashif007 »

Hi All

I am trying to work out a new way to produce flat files. The current design of the job extracts around 150 million records from an Oracle database and then loads them into a sequential file for the business to use for reporting. I tried to load the complete 150 million records into the sequential file, but I get the following error.

INVOICE_XRF_ACTL_ORD_Load,0: Output file full

I think a sequential file has only a 2 GB size limit. I want to know if there is a better way to split the file by size rather than by row count (a Transformer stage constraint). I want to create another flat file in the output directory as soon as the current output file reaches the size limit. For example, the job runs and produces approximately 10 million records in file1; as soon as it reaches the size limit, another sequential file2 should be created, and similarly file3, file4, file5 and so on until all the data has been written.

Please advise.
Thanks
Regards
Kashif Khan
leomauer
Premium Member
Posts: 100
Joined: Mon Nov 03, 2003 1:33 pm

Post by leomauer »

I do not think you can do it in an elegant way. Of course you can try to count bytes written, but even then DataStage must open the number of output files predefined by the design of the job, unless you are ready to use custom stages.
What you can do, though, is define multiple output file names in the Sequential File stage. All of those files will exist, but DataStage will try to equalize their sizes according to the partitioning algorithm. It is not exactly what you want, but it may be a good solution for this problem.
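For reference, if the 2 GB ceiling cannot be lifted, one size-based workaround outside DataStage is the Unix split utility. A minimal sketch, assuming GNU split is available; the extract file name, the invoice_xrf. prefix and extract_command below are only placeholders:

# Cut an existing extract into pieces of at most 1900 MB of whole lines
# (produces invoice_xrf.aa, invoice_xrf.ab, ... in the current directory)
$> split -C 1900M invoice_xrf_actl_ord.dat invoice_xrf.

# Or pipe the rows straight into split so no single oversized file is ever written
$> extract_command | split -C 1900M - invoice_xrf.

Using -C (line-bytes) rather than -b keeps each record intact instead of cutting a row in half at the size boundary.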
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That limit is a default set by your sys admin, and it can be increased. Check with your Unix admin.
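If the ceiling really is the per-process file size limit rather than free disk space, it can be checked from the account that runs the jobs. A minimal sketch using standard ulimit calls; raising the hard limit still needs the admin:

# Show the maximum file size the current shell and its children may write
# (reported in blocks; 'unlimited' means no cap)
$> ulimit -f

# List all resource limits for this account
$> ulimit -a

# Raise the soft limit for the current session, if the hard limit allows it
$> ulimit -f unlimited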
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kashif007
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

The problem was actually a disk space limitation. The target directory did not have enough free space for the file being written: the output file was 15 GB and the directory had only 10 GB of space, so the mismatch between the file size and the available space caused the abort. I used the following command to determine the root cause of the problem.

$> df -h directoryname

I ran the job in a different directory with more free space (70 GB) and the job works fine.
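As a usage note, the same check can be scripted in front of the load so the job fails fast instead of aborting mid-write. A small sketch; the path and the 20 GB threshold are assumptions, not values from the job:

# Abort up front if the target directory has less than ~20 GB free
# (df -kP prints POSIX output in 1 KB blocks; column 4 is 'Available')
TARGET_DIR=/data/extracts          # placeholder path
NEEDED_KB=20971520                 # 20 GB expressed in KB
AVAIL_KB=`df -kP "$TARGET_DIR" | awk 'NR==2 {print $4}'`
if [ "$AVAIL_KB" -lt "$NEEDED_KB" ]; then
    echo "Not enough space in $TARGET_DIR ($AVAIL_KB KB free)" >&2
    exit 1
fi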

Thanks
Regards
Kashif Khan