Split Data into different files

Post questions here related to DataStage Enterprise/PX Edition, covering areas such as parallel job design, parallel datasets, BuildOps, wrappers, etc.

kashif007
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Split Data into different files

Post by kashif007 »

Hi All

I am trying to work out a new way to produce flat files. The current design of the job extracts around 150 million records from an Oracle database and then loads them into a sequential file for the business to use for reporting. I tried to load the complete 150 million records into the sequential file, but I get the following error.

INVOICE_XRF_ACTL_ORD_Load,0: Output file full

I think a sequential file has only a 2 GB size limit. I want to know if there is a better way to split the file by size rather than by row count (a Transformer stage constraint). I want to create another flat file in the output directory as soon as the current output file reaches the size limit. For example, the job runs and produces approximately 10 million records in file1; as soon as it reaches the size limit, another sequential file2 should be created, and similarly file3, file4, file5 and so on until all the data has been written.

Please advise.
Thanks
Regards
Kashif Khan
leomauer
Premium Member
Posts: 100
Joined: Mon Nov 03, 2003 1:33 pm

Post by leomauer »

I do not think you can do it in an elegant way. Of course you can try to count bytes written, but even then DataStage must open the number of output files predefined by the design of the job, unless you are ready to use custom stages.
What you can do, though, is define multiple output file names in the Sequential File stage. All of those files will exist, but DataStage will try to equalize their sizes according to the partitioning algorithm. It is not exactly what you want, but it may be a good solution for this problem.
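For reference, if the 2 GB ceiling cannot be lifted, one size-based workaround outside DataStage is the Unix split utility. A minimal sketch, assuming GNU split is available; the extract file name, the invoice_xrf. prefix and extract_command below are only placeholders:

# Cut an existing extract into pieces of at most 1900 MB of whole lines
# (produces invoice_xrf.aa, invoice_xrf.ab, ... in the current directory)
$> split -C 1900M invoice_xrf_actl_ord.dat invoice_xrf.

# Or pipe the rows straight into split so no single oversized file is ever written
$> extract_command | split -C 1900M - invoice_xrf.

Using -C (line-bytes) rather than -b keeps each record intact instead of cutting a row in half at the size boundary.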
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

That limit is a default set by your sys admin, and it can be increased. Check with your Unix admin.
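If the ceiling really is the per-process file size limit rather than free disk space, it can be checked from the account that runs the jobs. A minimal sketch using standard ulimit calls; raising the hard limit still needs the admin:

# Show the maximum file size the current shell and its children may write
# (reported in blocks; 'unlimited' means no cap)
$> ulimit -f

# List all resource limits for this account
$> ulimit -a

# Raise the soft limit for the current session, if the hard limit allows it
$> ulimit -f unlimited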
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
kashif007
Premium Member
Posts: 216
Joined: Wed Jun 07, 2006 5:48 pm
Location: teaneck

Post by kashif007 »

The problem was actually a disk space limitation. The target directory did not have enough free space for the file being written: the output file was 15 GB and the directory had only 10 GB of space, so the mismatch between the file size and the available space caused the abort. I used the following command to determine the root cause of the problem.

$> df -h directoryname

I ran the job in a different directory with more free space (70 GB) and the job works fine.
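As a usage note, the same check can be scripted in front of the load so the job fails fast instead of aborting mid-write. A small sketch; the path and the 20 GB threshold are assumptions, not values from the job:

# Abort up front if the target directory has less than ~20 GB free
# (df -kP prints POSIX output in 1 KB blocks; column 4 is 'Available')
TARGET_DIR=/data/extracts          # placeholder path
NEEDED_KB=20971520                 # 20 GB expressed in KB
AVAIL_KB=`df -kP "$TARGET_DIR" | awk 'NR==2 {print $4}'`
if [ "$AVAIL_KB" -lt "$NEEDED_KB" ]; then
    echo "Not enough space in $TARGET_DIR ($AVAIL_KB KB free)" >&2
    exit 1
fi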

Thanks
Regards
Kashif Khan