
Datastage Parallel schema file - indicate # of rows to skip

Posted: Thu Sep 14, 2017 10:37 am
by gaelynalmeida
Hi,

We have a job that reads a flat file, and loads it to a target using RCP. The sequential file stage reads a schema file to determine the layout.

Some files may have headers, some may not. I do not see an option to set this at run time in the Sequential File stage. The "First line is column names" setting is a drop-down with only True or False values, and I cannot override it with a parameter.

Question: Can I specify, somehow, the number of rows to skip? Can this be done in the schema file? I have not found this in any documentation so far.

Thank you
G. Almeida

Posted: Thu Sep 14, 2017 11:28 am
by chulett
While Informatica has a "number of records to skip" option for flat files, I don't recall DataStage having anything other than the "First line is column names" True/False option. Perhaps a question for your official support provider?

Posted: Tue Sep 19, 2017 5:53 am
by Thomas.B
You could use the "Filter" option of the Sequential File stage to skip the first rows of a file, for example:


sed -e '1,3d'
This skips the first 3 rows.
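A minimal sketch of header-skipping filter commands, assuming (as the Filter option is generally described) that the file is piped through the command on stdin and the stage reads the command's stdout. SKIP_ROWS is a hypothetical name for the number of leading rows to drop; whether the Filter property accepts a job parameter reference for that count is an assumption worth checking in your version.

```shell
#!/bin/sh
# Sketch only: SKIP_ROWS and the sample INPUT are hypothetical.
SKIP_ROWS=3
INPUT='h1
h2
h3
data1
data2'

# Same idea as the sed filter above: delete lines 1..SKIP_ROWS, keep the rest.
via_sed=$(printf '%s\n' "$INPUT" | sed -e "1,${SKIP_ROWS}d")

# tail -n +K starts output at line K, so K = SKIP_ROWS + 1.
# With SKIP_ROWS=0 this prints every line, whereas sed "1,0d" would not
# mean "delete nothing".
via_tail=$(printf '%s\n' "$INPUT" | tail -n +"$((SKIP_ROWS + 1))")

printf '%s\n' "$via_sed"
printf '%s\n' "$via_tail"
```

The tail form is shown only because it degrades more predictably when the skip count is zero; for a fixed count, the sed filter from the post is equivalent.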

Posted: Tue Sep 19, 2017 4:21 pm
by gaelynalmeida
Thank you, Craig. Yes, we will reach out to IBM to see what they say. In the meantime, we are using a filter.

Posted: Tue Sep 19, 2017 4:23 pm
by gaelynalmeida
Thank you for your response, Thomas. Yes, we are currently using an awk filter as a workaround; sed would work just as well.

The trouble is that awk drops some of our records because of non-printable characters, which we would much rather handle further upstream. This is why I was looking for some native functionality.

Perhaps sed will not drop these records; we'll test this out.
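One way to run that test is to compare record counts before and after the filter on a file whose records contain non-printable bytes. This is a minimal sketch; the sample data, SKIP, and the file names are hypothetical, and a matching count shows only that no records were dropped or merged, not that every byte survived unchanged.

```shell
#!/bin/sh
# Sketch: check whether header-skipping filters keep every data record.
# SKIP and the sample contents are hypothetical.
SKIP=1
SAMPLE=$(mktemp)
trap 'rm -f "$SAMPLE"' EXIT

# One header line plus three data records; two contain control bytes
# (octal 001 and 177).
printf 'ID,NAME\n1,plain\n2,ctl\001byte\n3,del\177char\n' > "$SAMPLE"

in_lines=$(wc -l < "$SAMPLE")
sed_lines=$(sed -e "1,${SKIP}d" "$SAMPLE" | wc -l)
awk_lines=$(awk -v n="$SKIP" 'NR > n' "$SAMPLE" | wc -l)

# Each filter should report in_lines - SKIP records.
echo "input=$((in_lines)) sed=$((sed_lines)) awk=$((awk_lines)) expected=$((in_lines - SKIP))"
```

Running the same comparison against a copy of a real problem file is the more telling check, since the characters that tripped up awk may differ from the ones used here.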

Posted: Thu Sep 21, 2017 7:34 am
by chulett
Still applicable in an RCP scenario, though?

Posted: Fri Sep 22, 2017 1:46 am
by Thomas.B
Yes. You just need to disable RCP in the Sequential File and Transformer stages and enable it in the Column Generator; the output will then match the schema file.

Posted: Thu Oct 05, 2017 9:28 am
by gaelynalmeida
Thank you for all the excellent answers. We are far enough along our development path that it is hard to turn back and add another job to the flow.

For now, I think the filter is our best option.

But the other options are good to know for future reference.

Posted: Thu Oct 05, 2017 11:13 pm
by ray.wurlod
I would have thought that it is possible, since the data browser has that feature, as does the Sample stage. Why not create a job with a Sample stage that skips some rows and inspect the generated osh?