Ignore columns from a sequential file
Moderators: chulett, rschirm, roy
Hi,
I am trying to read records from a sequential file, process them, and write them to a table.
The problem is that the sequential file is a global company file generated by one group for use by multiple divisions. I only need the first 70 columns of about 90. When I run the job I get a warning:
"Import consumed only 1968 bytes of the record's 2073 bytes (no further warnings will be generated from this partition)"
I could add 20 extra columns to make this warning go away, but the number of columns in this file could change whenever another division requests additional columns.
How can I read a sequential file and take just the first 70 columns without getting warnings?
The file is:
Delimiter = comma
Null field value = ''
Quote = double
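Before defining the column list, it can help to confirm how many fields the file actually contains. A minimal sketch (the filename is hypothetical; note the naive count will be inflated by any commas inside double-quoted fields):

```shell
# Print the number of comma-separated fields on the first line.
# Naive split: a comma inside a double-quoted field also counts.
head -1 global_company.csv | awk -F',' '{print NF}'
```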
Another option could be to use the External Source stage with a Source Program:
Code:
cut -d',' -f1-70 <INFILE>
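One caveat with the cut approach, since this file uses double quotes: cut has no notion of quoting, so a comma inside a quoted field is treated as a delimiter. A quick illustration:

```shell
# cut splits on every comma, including commas inside double-quoted fields,
# so this is only safe when no field contains an embedded comma.
echo 'id,"Smith, John",dept' | cut -d',' -f1-2
# prints: id,"Smith
```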
Another option: use a Unix command to replace the delimiting commas with another delimiter that does not occur in the data (while ignoring commas within quotes), then use the replacement delimiter to extract the first seventy columns.
See an example below of using nawk (on Solaris) to replace the delimiting commas with a pipe (|) and extract the first 70 columns:
Code:
nawk -F\" 'BEGIN{OFS=FS;} {for(i=1;i<=NF;i=i+2){gsub(/,/,"|",$i);} print $0;}' <infile> | awk -F"|" '{ for(i=1; i<=70; i++) printf("%s|", $i); printf("\n") }'
It is possible to merge the nawk and awk commands given above.
I believe that a much simpler and more elegant solution could be implemented using Perl.
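As a sketch of merging the two commands into a single awk invocation (the input filename is a placeholder): split each line on double quotes so the odd-numbered chunks are outside the quotes, turn their commas into pipes, then re-split on the pipe and print at most the first 70 fields.

```shell
# Quote-aware extraction of the first 70 comma-separated fields in one awk.
awk -F'"' 'BEGIN { OFS = FS }
{
    # Odd-numbered chunks are outside double quotes: replace their commas.
    for (i = 1; i <= NF; i += 2) gsub(/,/, "|", $i)
    # Re-split the rebuilt record on the pipe and keep at most 70 fields.
    n = split($0, a, "|")
    if (n > 70) n = 70
    for (i = 1; i <= n; i++) printf "%s|", a[i]
    printf "\n"
}' infile
```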
And I believe that a much simpler and more elegant solution can be implemented using the Server version of the Sequential File stage:
"Suppress row truncation warnings. If the sequential file being read contains more columns than you have defined, you will normally receive warnings about overlong rows when the job is run. If you want to suppress these messages (for example, you might only be interested in the first three columns and happy to ignore the rest), select this check box."
-craig
"You can never have too many knives" -- Logan Nine Fingers