How to Split the Huge Data?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pkll
Participant
Posts: 73
Joined: Thu Oct 25, 2012 9:45 pm

How to Split the Huge Data?

Post by pkll »

Hi,

I have a 4GB source file, which I split into two 2GB files like this:

my filename is CALL_HIST_DETAILS.txt

c:/NRD> du -sh CALL_HIST_DETAILS.txt   (shows the file size: 4GB)
c:/NRD> wc -l CALL_HIST_DETAILS.txt    (total line count is 9868002)

C:/NRD> head -4934001 CALL_HIST_DETAILS.txt > test.txt

C:/NRD> tail -4934001 CALL_HIST_DETAILS.txt > test1.txt

When I use the tail output (test1.txt), the job works fine and is able to read the data. But when I use the head output (test.txt), the job fails: it is unable to read the data and shows an error.

Let me know why the head output (test.txt) is not working. Is this the correct process to split the data?
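[Editor's note: a minimal way to sanity-check a head/tail split like the one above, shown on a small stand-in file since the real 4GB CALL_HIST_DETAILS.txt is not available here. It confirms the two halves have the expected line counts and, concatenated, are byte-identical to the original.]

```shell
# Build a small sample file standing in for CALL_HIST_DETAILS.txt
seq 1 10 > sample.txt

# Split it the same way the post does: first half with head, second with tail
head -5 sample.txt > part1.txt
tail -5 sample.txt > part2.txt

# Both halves should count 5 lines, and together reproduce the original
wc -l part1.txt part2.txt sample.txt
cat part1.txt part2.txt | cksum
cksum < sample.txt

rm sample.txt part1.txt part2.txt
```

If the two cksum values match, the split itself did not corrupt anything, which points the investigation at how the file is being read rather than how it was split.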
crystal_pup
Participant
Posts: 62
Joined: Thu Feb 08, 2007 6:01 am
Location: Pune

Post by crystal_pup »

Can you paste the exact error you are getting when reading the head output (test.txt)?
pkll
Participant
Posts: 73
Joined: Thu Oct 25, 2012 9:45 pm

Post by pkll »

Hi crystal,
I am getting the error below:

Sequential_File_0,0: Error reading on import.
Sequential_File_0,0: Consumed more than 100,000 bytes looking for record delimiter; aborting
Sequential_File_0,0: Import error at record 0.
Sequential_File_0,0: The runLocally() of the operator failed.

But the tail output is working fine...

Is there any alternative process for splitting the data?
anbu
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Use Split command to split the file

Post by anbu »

Code:

split -l 4934001 CALL_HIST_DETAILS.txt test
You are the creator of your destiny - Swami Vivekananda
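[Editor's note: a sketch of anbu's suggestion on a small stand-in file, assuming GNU/POSIX split. The -l option sets lines per output chunk; output files take the given prefix plus a generated suffix (testaa, testab, ...).]

```shell
# Small stand-in for the real 4GB CALL_HIST_DETAILS.txt
seq 1 10 > CALL_HIST_DETAILS.txt

# Split into 5-line chunks; produces testaa and testab
split -l 5 CALL_HIST_DETAILS.txt test

# Verify each chunk holds half the lines
wc -l testaa testab

rm CALL_HIST_DETAILS.txt testaa testab
```

Unlike the manual head/tail approach, split handles any number of chunks in one pass and cannot accidentally drop or duplicate the middle line.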
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
You can try with sed command

Code:

END=`wc -l CALL_HIST_DETAILS.txt | awk '{print $1}'`
sed -n '1,4934001p' CALL_HIST_DETAILS.txt > test.txt
sed -n '4934002,'$END'p' CALL_HIST_DETAILS.txt > test1.txt
anbu
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

A few changes to prasson_ibm's code:

Code:

END=`wc -l < CALL_HIST_DETAILS.txt`
sed -n '1,4934001{p;4934001q;}' CALL_HIST_DETAILS.txt > test.txt
sed -n '4934002,$p' CALL_HIST_DETAILS.txt > test1.txt
You are the creator of your destiny - Swami Vivekananda
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

Try defining the sequential file correctly. Either you have not defined the correct delimiter for parsing the columns, or you have not defined the data blocks. Fix the definition and DataStage will read the data correctly.

Regards

Ray D
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So this is your unzipped file?
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

I'm guessing here; just a guess, OK?
Sequential_File_0,0: Consumed more than 100,000 bytes looking for record delimiter; aborting
Sequential_File_0,0: Import error at record 0
What is the first record in your head file (test.txt)? Does it have a column header?
Does it have more than 100,000 characters? What record delimiter have you specified in the format of the Sequential File stage? And do you have a newline character within the first record?

Remove the first line and run
Thanks,
Prasanna
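[Editor's note: a quick way to check the guesses above is to dump the first bytes of the head output and see what the line endings actually are; a "consumed more than 100,000 bytes looking for record delimiter" error usually means the delimiter in the data does not match the stage definition. Shown here on a tiny sample with Windows-style \r\n endings, standing in for the real test.txt.]

```shell
# Create a small sample with DOS line endings (stand-in for test.txt)
printf 'COL1,COL2\r\nval1,val2\r\n' > test.txt

# Dump the first bytes as characters; look for \r \n at each record end,
# or for no delimiter at all in an oversized first record
head -c 64 test.txt | od -c

rm test.txt
```

If od shows \r \n but the stage expects a bare newline (or vice versa), adjusting the record delimiter in the Sequential File stage format should fix the import.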
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Also, in the Director, check the monitor for that run: how many records were read from the Sequential File stage? It should be zero, correct?
Thanks,
Prasanna