Can one remove an end-of-wave from a data stream?

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

I have a job consisting of many consecutive stages which processes a very large number of records coming from a database (hundreds of millions of rows). The data is coming into the job correctly sorted and partitioned and the pipeline is streamlined with no stages that would break the pipeline.

I need to add some processing (a lookup, with rejected records that get re-inserted into the data stream using a sorted funnel stage) but if one of the links to the sorted funnel stage has no records the pipeline gets broken as all records get buffered at this point.

I can work around this problem by inserting an end-of-wave marker every n records to allow processing to proceed in "chunks".

Unfortunately, towards the end of processing I have a Transformer stage which uses stage variables to store cumulative values over many records, and these get reset when the stage sees an end-of-wave marker. Some copies of this job also contain a re-sort, where an end-of-wave marker would result in incorrect sorting. Both are unwanted side effects of end-of-wave processing and lead to incorrect results.
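The effect described above can be modelled outside DataStage. The following is a minimal Python sketch, not DataStage code: `EOW`, `with_waves`, `running_totals` and `strip_waves` are hypothetical names invented for illustration, and the reset-on-marker behaviour is assumed only from the description in this post.

```python
# Toy model of end-of-wave markers in a record stream (not DataStage code).
# All names here are hypothetical and exist only for this illustration.

EOW = object()  # sentinel standing in for an end-of-wave marker


def with_waves(records, n):
    """Yield the records, emitting an EOW marker after every n records."""
    for count, record in enumerate(records, start=1):
        yield record
        if count % n == 0:
            yield EOW


def running_totals(stream):
    """Cumulative sum that is reset whenever an EOW marker is seen,
    mimicking stage variables being reset at the end of a wave."""
    total = 0
    totals = []
    for item in stream:
        if item is EOW:
            total = 0  # the unwanted reset described in the post
            continue
        total += item
        totals.append(total)
    return totals


def strip_waves(stream):
    """The hypothetical 'marker remover' the post is asking for."""
    return (item for item in stream if item is not EOW)


values = [1, 2, 3, 4, 5]
reset_totals = running_totals(with_waves(values, 2))
wanted_totals = running_totals(strip_waves(with_waves(values, 2)))
print(reset_totals)   # totals restart after every second record
print(wanted_totals)  # totals accumulate over the whole stream
```

With a wave size of 2, the first call restarts the total after every second record, while the second call accumulates across all five records, which is the behaviour the job needs downstream.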

What I am looking for is a way to remove the end-of-wave markers before the data reaches these parts of the job. While there is a stage that allows us to insert end-of-wave markers into the data, there is none that allows one to remove them.

Before throwing this over the fence at IBM support I thought I might ask around here to see if someone might have a solution.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I for one have no idea but will be following this with bated breath. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
JRodriguez
Premium Member
Posts: 425
Joined: Sat Nov 19, 2005 9:26 am
Location: New York City
Contact:

Post by JRodriguez »

Hi ArndW,

I believe it is not possible to remove the end-of-wave marker once it has been inserted into the stream, but in your scenario you could use the LastRow() function to detect the end-of-wave signal in the Transformer and then act on it.

Regards
Julio Rodriguez
ETL Developer by choice

"Sure we have lots of reasons for being rude - But no excuses"
ArndW

Post by ArndW »

JRodriguez - thanks for the hint about LastRow(). Unfortunately, the multiple-row handling necessary in that stage can require records that would have been grouped into another wave. I think I'll open a call with IBM to see if the powers that be might have a solution.
ArndW

Post by ArndW »

I've submitted the question to IBM and will see what they have to say.

I did some tests with writing to a named pipe and then reading from the same named pipe in the parallel job, and that does eliminate the end-of-wave markers. For other reasons we can't implement this workaround: it happens in a shared container which is called in parallel, and the FIFO file names aren't known ahead of time and thus cannot be created with "mkfifo" beforehand. So I'm hoping that IBM might have a suggestion.
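The idea behind the named-pipe test can be sketched in Python on a Unix-like system. This is only a model, not DataStage code: it assumes, as the tests above suggest, that wave markers are not written to the pipe, so the reader sees plain records only. The `EOW` sentinel, `write_records` and the file name `records.fifo` are hypothetical.

```python
# Toy model of passing records through a named pipe (Unix only).
# Assumption: in-band wave markers have no byte representation in the pipe,
# so only record data reaches the reader. Names are hypothetical.
import os
import tempfile
import threading

EOW = object()  # sentinel standing in for an end-of-wave marker


def write_records(path, stream):
    """Write one record per line to the FIFO, dropping EOW markers."""
    with open(path, "w") as fifo:  # blocks until a reader opens the FIFO
        for item in stream:
            if item is EOW:
                continue  # markers are not written to the pipe
            fifo.write(f"{item}\n")


# Create the FIFO at run time under a unique temporary directory.
tmp_dir = tempfile.mkdtemp()
fifo_path = os.path.join(tmp_dir, "records.fifo")
os.mkfifo(fifo_path)

stream = ["a", "b", EOW, "c", EOW]
writer = threading.Thread(target=write_records, args=(fifo_path, stream))
writer.start()

with open(fifo_path) as fifo:  # reader side: sees records only
    received = [line.rstrip("\n") for line in fifo]

writer.join()
os.remove(fifo_path)
os.rmdir(tmp_dir)
print(received)
```

The sketch creates its FIFO at run time under a unique temporary directory. Whether something equivalent can be done from inside a shared container that is called in parallel is exactly the open problem described above.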
ArndW

Post by ArndW »

IBM has responded that there is no way to remove end-of-wave markers at present. I've submitted an RFE to add a corresponding counterpart to the Wave-Generator stage, but if they opt to include such a stage in the future it will most likely take quite a while before it is added to the product.
ArndW

Post by ArndW »

There is now a "Feature request" for this:

https://www.ibm.com/developerworks/rfe/ ... R_ID=65319