How to Split the Huge Data?

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pkll
Participant
Posts: 73
Joined: Thu Oct 25, 2012 9:45 pm

How to Split the Huge Data?

Post by pkll »

Hi,

I have a 4GB source file, which I split into two 2GB files like this:

my filename is CALL_HIST_DETAILS.txt

c:/NRD> du -sh CALL_HIST_DETAILS.txt   (shows the file size: 4GB)
c:/NRD> wc -l CALL_HIST_DETAILS.txt    (total line count is 9868002)

C:/NRD> head -4934001 CALL_HIST_DETAILS.txt > test.txt

C:/NRD> tail -4934001 CALL_HIST_DETAILS.txt > test1.txt

When I use the tail output (test1.txt), the job works fine and is able to read the data. But when I use the head output (test.txt), the job fails: it is unable to read the data and shows an error.

Let me know why the head output (test.txt) is not working. Is this the correct process to split the data?
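[Editor's note: a minimal way to sanity-check a head/tail split like the one above, shown on a small stand-in file since the real 4GB CALL_HIST_DETAILS.txt is not available here. It confirms the two halves have the expected line counts and, concatenated, are byte-identical to the original.]

```shell
# Build a small sample file standing in for CALL_HIST_DETAILS.txt
seq 1 10 > sample.txt

# Split it the same way the post does: first half with head, second with tail
head -5 sample.txt > part1.txt
tail -5 sample.txt > part2.txt

# Both halves should count 5 lines, and together reproduce the original
wc -l part1.txt part2.txt sample.txt
cat part1.txt part2.txt | cksum
cksum < sample.txt

rm sample.txt part1.txt part2.txt
```

If the two cksum values match, the split itself did not corrupt anything, which points the investigation at how the file is being read rather than how it was split.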
crystal_pup
Participant
Posts: 62
Joined: Thu Feb 08, 2007 6:01 am
Location: Pune

Post by crystal_pup »

Can you paste the exact error you are getting when reading the head output (test.txt)?
pkll
Participant
Posts: 73
Joined: Thu Oct 25, 2012 9:45 pm

Post by pkll »

Hi crystal,
I am getting the error below:

Sequential_File_0,0: Error reading on import.
Sequential_File_0,0: Consumed more than 100,000 bytes looking for record delimiter; aborting
Sequential_File_0,0: Import error at record 0.
Sequential_File_0,0: The runLocally() of the operator failed.

But the tail output is working fine...

Is there any alternative process for splitting the data?
anbu
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Use Split command to split the file

Post by anbu »

Code:

split -l 4934001 CALL_HIST_DETAILS.txt test
You are the creator of your destiny - Swami Vivekananda
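[Editor's note: a sketch of anbu's suggestion on a small stand-in file, assuming GNU/POSIX split. The -l option sets lines per output chunk; output files take the given prefix plus a generated suffix (testaa, testab, ...).]

```shell
# Small stand-in for the real 4GB CALL_HIST_DETAILS.txt
seq 1 10 > CALL_HIST_DETAILS.txt

# Split into 5-line chunks; produces testaa and testab
split -l 5 CALL_HIST_DETAILS.txt test

# Verify each chunk holds half the lines
wc -l testaa testab

rm CALL_HIST_DETAILS.txt testaa testab
```

Unlike the manual head/tail approach, split handles any number of chunks in one pass and cannot accidentally drop or duplicate the middle line.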
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi,
You can try with sed command

Code:

END=`wc -l CALL_HIST_DETAILS.txt | awk '{print $1}'`
sed -n '1,4934001p' CALL_HIST_DETAILS.txt > test.txt
sed -n '4934002,'$END'p' CALL_HIST_DETAILS.txt > test1.txt
anbu
Premium Member
Posts: 596
Joined: Sat Feb 18, 2006 2:25 am
Location: india

Post by anbu »

A few changes to prasson_ibm's code:

Code:

END=`wc -l < CALL_HIST_DETAILS.txt`
sed -n '1,4934001{p;4934001q;}' CALL_HIST_DETAILS.txt > test.txt
sed -n '4934002,$p' CALL_HIST_DETAILS.txt > test1.txt
You are the creator of your destiny - Swami Vivekananda
daignault
Premium Member
Posts: 165
Joined: Tue Mar 30, 2004 2:44 pm
Contact:

Post by daignault »

Try defining the sequential file correctly. Either you have not defined the correct delimiter for parsing the columns, or you have not defined the data blocks. Fix the definition and DataStage will read the data correctly.

Regards

Ray D
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So this is your unzipped file?
-craig

"You can never have too many knives" -- Logan Nine Fingers
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

I'm guessing here; just a guess, OK?
Sequential_File_0,0: Consumed more than 100,000 bytes looking for record delimiter; aborting
Sequential_File_0,0: Import error at record 0
What is the first record in your head file (test.txt)? Does it have a column header?
Does it have more than 100,000 characters? What record delimiter have you specified in the format of the Sequential File stage? And do you have a newline character within the first record?

Remove the first line and run
Thanks,
Prasanna
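[Editor's note: a quick way to check the guesses above is to dump the first bytes of the head output and see what the line endings actually are; a "consumed more than 100,000 bytes looking for record delimiter" error usually means the delimiter in the data does not match the stage definition. Shown here on a tiny sample with Windows-style \r\n endings, standing in for the real test.txt.]

```shell
# Create a small sample with DOS line endings (stand-in for test.txt)
printf 'COL1,COL2\r\nval1,val2\r\n' > test.txt

# Dump the first bytes as characters; look for \r \n at each record end,
# or for no delimiter at all in an oversized first record
head -c 64 test.txt | od -c

rm test.txt
```

If od shows \r \n but the stage expects a bare newline (or vice versa), adjusting the record delimiter in the Sequential File stage format should fix the import.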
prasannakumarkk
Participant
Posts: 117
Joined: Wed Feb 06, 2013 9:24 am
Location: Chennai,TN, India

Post by prasannakumarkk »

Also, in the Director, check the monitor for that run: how many records were read from the Sequential File stage? It should be zero, correct?
Thanks,
Prasanna