Sequential file outputs twice the number of rows it has!

ady
Premium Member
Posts: 189
Joined: Thu Oct 12, 2006 12:08 am

Sequential file outputs twice the number of rows it has!

Post by ady »

Hi,

I have a server job which writes 1249149 rows to a sequential file. When the same file is used as input in another job, that job reads 2410220 rows.

I don't understand why this is happening. I tried the file in a parallel job as well, and it also gives out 2410220 rows.


Please help
meena
Participant
Posts: 430
Joined: Tue Sep 13, 2005 12:17 pm

Post by meena »

Hi,
What exactly are you doing in the job? After loading the data into the sequential file, are you able to view the data? Also check the "Update action" setting.
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Are you appending to the file?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ady
Premium Member
Posts: 189
Joined: Thu Oct 12, 2006 12:08 am

Post by ady »

I am overwriting the file.

I am able to view the data properly. My job design is:

Seq file > Transformer > Seq file

The transformer has a constraint which does not allow blank rows to pass. That's it!
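For anyone reading along, a constraint like that is equivalent to dropping blank lines from a flat file. A rough shell analogue, where input.txt and output.txt are made-up names for this sketch:

    # Drop completely blank (empty or all-whitespace) rows,
    # as the Transformer constraint is described as doing.
    grep -v '^[[:space:]]*$' input.txt > output.txt
    wc -l input.txt output.txt   # compare source and target row counts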
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Something is off. What is your source? Regardless, check the row count of your source and compare it with the target. Something fishy is going on with the row counts you see in the Designer.
Are you sure no other process is going on, like an after-job subroutine that doubles the file or something? Is anything happening inside your job, in stage variables or in a routine, that could be duplicating records?
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
ady
Premium Member
Posts: 189
Joined: Thu Oct 12, 2006 12:08 am

Post by ady »

The data comes from a script. The script gives out the record count in the last row of the data, which is rejected; I extract that record count and write it to a different sequential file.

The job does have an after-job subroutine!

The process compares the "count from the script" with the "count of rows processed on the output link" and, if they are the same, moves the output file to a different location; otherwise it deletes the file.
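In outline, that check behaves like the following shell sketch (a sketch of the logic described above, not the actual routine; SCRIPT_COUNT, LINK_COUNT, and all paths are made-up placeholders):

    # Compare the script's trailer count with the output link count.
    if [ "$SCRIPT_COUNT" -eq "$LINK_COUNT" ]; then
        mv /data/out/result.txt /data/archive/result.txt   # counts match: keep the file
    else
        rm -f /data/out/result.txt                         # counts differ: discard it
    fi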
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Is there a before/after stage/job subroutine that copies the file into place before the job moves a second set of records to it?

What are the link row counts reported? You chose not to reveal this vital piece of information.

In the parallel job it might occur, for example, if you used Entire partitioning on two nodes.

In a server job, something external is happening. Are you overwriting or appending?

Have you checked that your script is returning the correct row count? What does wc -l filename report?

Are there any newline characters in your data? If so, have you handled that situation in the column metadata?
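To make those checks concrete, here are a few commands one could run against the landed file (result.txt is a placeholder name):

    wc -l result.txt           # physical line count vs. the 1249149 the job reports
    tail -1 result.txt         # inspect the trailer row carrying the script's count
    od -c result.txt | head    # look for stray \r or embedded newlines in the data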
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ady
Premium Member
Posts: 189
Joined: Thu Oct 12, 2006 12:08 am

Post by ady »

The routine moves the output file to a different location when the job succeeds. I have checked whether the script is appending to the previous version of the file, but it's not; it's moving the file and replacing the old one, as it's supposed to do.

There are no newline characters in the file. I have checked, and the script gets the correct row count.

As you said, something external is happening! ... Hmm.
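One last check worth running here: if records really are being duplicated somewhere, comparing the total line count with the unique line count will show it (result.txt is again a placeholder name):

    total=$(wc -l < result.txt)
    unique=$(sort result.txt | uniq | wc -l)
    echo "total=$total unique=$unique"   # unique well below total means duplicated rows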