Datastage Parallel schema file - indicate # of rows to skip

gaelynalmeida
Premium Member
Posts: 12
Joined: Fri Jul 28, 2017 1:01 pm

Datastage Parallel schema file - indicate # of rows to skip

Post by gaelynalmeida »

Hi,

We have a job that reads a flat file, and loads it to a target using RCP. The sequential file stage reads a schema file to determine the layout.

Some files may have headers, some may not, and I do not see an option to set this at run time in the Sequential File stage. The "First line is column names" setting is a drop-down with only True or False values, and I cannot override it with a job parameter.

Question: Can I specify, somehow, the number of rows to skip? Can this be done in the schema file? I have not found this in any documentation so far.
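For readers unfamiliar with the format: a parallel schema file describes only the record layout, i.e. the delimiters plus field names and types (the fields below are made up for illustration). Something like:

```
record
{record_delim='\n', delim=',', final_delim=end, quote=double}
(
  cust_id: int32;
  cust_name: string[max=50];
  amount: decimal[10,2];
)
```

As far as I can see, nothing among the record properties addresses header rows to skip, which is why a run-time switch would be needed somewhere else.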

Thank you
G. Almeida
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

While Informatica has a "number of records to skip" for flat files, I don't recall DataStage having anything other than the "First line is column headers" true/false option. Perhaps a question for your official support provider?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Thomas.B
Participant
Posts: 63
Joined: Thu Apr 09, 2015 6:40 am
Location: France - Nantes

Post by Thomas.B »

You could use the "Filter" option of the Sequential File stage to skip the first rows of a file. For example:

Code:

sed -e '1,3d'
will skip the first 3 rows.
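In the stage, the filter is just the `sed` command itself; as a quick sanity check outside DataStage (the file name and the `#pSkipRows#` job parameter below are assumptions, not anything from this thread):

```shell
# Build a small sample file: 3 header lines followed by 2 data rows.
printf 'hdr1\nhdr2\nhdr3\nrow1,a\nrow2,b\n' > sample.txt

# Same command as the stage Filter: delete lines 1 through 3.
sed -e '1,3d' sample.txt   # prints only the two data rows

# Inside the stage, the skip count could be driven by a (hypothetical)
# job parameter, e.g.:  sed -e "1,#pSkipRows#d"
```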
BI Consultant
DSXConsult
gaelynalmeida
Premium Member
Posts: 12
Joined: Fri Jul 28, 2017 1:01 pm

Post by gaelynalmeida »

Thank you, Craig ... yes, we will reach out to IBM to see what they say. In the meantime, we are using a filter.
gaelynalmeida
Premium Member
Posts: 12
Joined: Fri Jul 28, 2017 1:01 pm

Post by gaelynalmeida »

Thank you for your response, Thomas. Yes, we are using an awk filter at this time as a work-around. sed would work just as well.

The trouble is that awk drops some of our records because of non-printable characters, which we would much rather handle further upstream. This is why I was looking for some native functionality.

Perhaps sed will not drop these records - we'll test this out.
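If awk is mangling records, a purely byte-oriented filter may be safer: `tail -n +N` starts output at line N and passes every byte through untouched, non-printables included. A minimal sketch, assuming a single header row (the sample data is made up):

```shell
# One header line followed by data containing non-ASCII bytes;
# tail -n +2 starts output at line 2, i.e. skips 1 header row,
# without interpreting the record contents.
printf 'header\nrow1,caf\xc3\xa9\nrow2,b\n' | tail -n +2
```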
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Still applicable in an RCP scenario, though?
-craig

"You can never have too many knives" -- Logan Nine Fingers
Thomas.B
Participant
Posts: 63
Joined: Thu Apr 09, 2015 6:40 am
Location: France - Nantes

Post by Thomas.B »

Yes. You just need to disable RCP in the Sequential File and Transformer stages, activate it in the Column Generator, and the output will then reflect the schema file.
BI Consultant
DSXConsult
gaelynalmeida
Premium Member
Posts: 12
Joined: Fri Jul 28, 2017 1:01 pm

Post by gaelynalmeida »

Thank you for all the excellent answers. We are pretty far down our development path, so it is hard to turn back and add another job to the flow.

For now, I think the filter is our best option.

But the other options are good to know for future reference.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I would have thought that it is possible, since the data browser has that feature, as does the Sample stage. Why not create a job with a Sample stage that skips some rows and inspect the generated osh?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.