I have a job consisting of many consecutive stages that processes a very large number of records (hundreds of millions of rows) coming from a database. The data arrives correctly sorted and partitioned, and the pipeline is streamlined, with no stages that would break it.
I need to add some processing (a lookup whose rejected records are re-inserted into the data stream using a sorted Funnel stage), but if one of the links into the sorted funnel has no records, the pipeline breaks, because all records are buffered at that point.
I can work around this by inserting an end-of-wave marker every n records, which lets processing proceed in "chunks".
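To make the "every n records" idea concrete, here is a toy Python model of the workaround. It is only an illustration: the `EndOfWave` sentinel and the `insert_waves` function are hypothetical and say nothing about how DataStage represents waves internally.

```python
from typing import Iterable, Iterator, Union


class EndOfWave:
    """Hypothetical stand-in for an end-of-wave marker (toy model only)."""

    def __repr__(self) -> str:
        return "EOW"


EOW = EndOfWave()


def insert_waves(records: Iterable[int], n: int) -> Iterator[Union[int, EndOfWave]]:
    """Yield each record unchanged, plus a marker after every n records."""
    if n <= 0:
        raise ValueError("n must be positive")
    for count, rec in enumerate(records, start=1):
        yield rec
        if count % n == 0:
            yield EOW


print(list(insert_waves(range(1, 6), 2)))  # [1, 2, EOW, 3, 4, EOW, 5]
```

Each marker closes a "chunk", which is what lets downstream stages release what they have buffered instead of waiting for the whole stream.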
Unfortunately, towards the end of processing I have a Transformer stage that uses stage variables to store cumulative values over many records, and this stage gets reset when it sees an end-of-wave marker. Also, some copies of this job contain a re-sort, where an end-of-wave marker would result in incorrect sorting. Both are unwanted side effects of end-of-wave processing and lead to incorrect results.
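The two side effects can be reproduced in a toy Python model. The `EndOfWave` sentinel, `running_totals` (standing in for a stage variable that accumulates across rows) and `sort_per_wave` (standing in for a sort that restarts at each wave) are all hypothetical illustrations, not DataStage code.

```python
from typing import List, Union


class EndOfWave:
    """Hypothetical stand-in for an end-of-wave marker (toy model only)."""

    def __repr__(self) -> str:
        return "EOW"


EOW = EndOfWave()
Item = Union[int, EndOfWave]


def running_totals(items: List[Item], reset_on_wave: bool) -> List[int]:
    """Cumulative sum over records, optionally discarding state at each marker."""
    total = 0
    out = []
    for item in items:
        if item is EOW:
            if reset_on_wave:
                total = 0  # accumulated state is lost at the wave boundary
            continue
        total += item
        out.append(total)
    return out


def sort_per_wave(items: List[Item]) -> List[int]:
    """Sort each wave independently instead of the stream as a whole."""
    out: List[int] = []
    wave: List[int] = []
    for item in items:
        if item is EOW:
            out.extend(sorted(wave))
            wave = []
        else:
            wave.append(item)
    out.extend(sorted(wave))
    return out


stream = [1, 2, EOW, 3, 4]
print(running_totals(stream, reset_on_wave=True))   # [1, 3, 3, 7]
print(running_totals(stream, reset_on_wave=False))  # [1, 3, 6, 10]
print(sort_per_wave([5, 1, EOW, 4, 2]))             # [1, 5, 2, 4], not [1, 2, 4, 5]
```

In both cases the result is correct within each wave but wrong for the stream as a whole, which is why the markers need to be gone before these stages.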
What I am looking for is a way to remove the end-of-wave markers before the data reaches these parts of the job. While there is a stage that inserts markers into the data, there is none that removes them.
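In the same kind of toy model, the missing stage would simply be a filter that drops markers and passes records through in their original order. The `EndOfWave` sentinel and `strip_waves` function below are hypothetical and only illustrate the requested behaviour; they are not a DataStage feature.

```python
from typing import Iterable, Iterator, Union


class EndOfWave:
    """Hypothetical stand-in for an end-of-wave marker (toy model only)."""

    def __repr__(self) -> str:
        return "EOW"


EOW = EndOfWave()


def strip_waves(items: Iterable[Union[int, EndOfWave]]) -> Iterator[int]:
    """Yield records unchanged and in order, dropping every marker."""
    return (item for item in items if item is not EOW)


print(list(strip_waves([1, 2, EOW, 3, 4, EOW, 5])))  # [1, 2, 3, 4, 5]
```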
Before throwing this over the fence at IBM support I thought I might ask around here to see if someone might have a solution.
Can one remove an end-of-wave from a data stream?
Hi ArnD,
I believe it is not possible to remove an end-of-wave marker once it has been inserted into the stream, but in your scenario you could use the LastRow() function in the Transformer to detect the end-of-wave signal and then act on it.
Regards
Julio Rodriguez
ETL Developer by choice
"Sure we have lots of reasons for being rude - But no excuses"
JDRodriguez, thanks for the hint about LastRow(), but unfortunately the multi-row handling in that stage can require records that would have been grouped into another wave. I think I'll open a call with IBM to see whether the powers that be have a solution.
I've submitted the question to IBM and will see what they have to say.
I did some tests writing to a named pipe and then reading from the same named pipe in the parallel job, and that does eliminate the end-of-wave markers. For other reasons, however, we can't implement this workaround: this happens in a shared container that is called in parallel, so the FIFO file names aren't known ahead of time and cannot be created with "mkfifo" beforehand. I'm hoping IBM might have a suggestion.
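For readers unfamiliar with the named-pipe trick, here is a minimal, POSIX-only Python sketch of the mechanism itself (it does not model DataStage). The FIFO is created at run time inside a unique temporary directory, so its name need not be known in advance; one thread writes lines and the main thread reads them back as a plain byte stream. The function name `fifo_round_trip` and the file name `stream.fifo` are hypothetical.

```python
import os
import tempfile
import threading
from typing import List


def fifo_round_trip(lines: List[str]) -> List[str]:
    """Write lines into a run-time-created named pipe and read them back."""
    with tempfile.TemporaryDirectory() as tmpdir:
        # Unique directory per call, so concurrent callers never collide.
        path = os.path.join(tmpdir, "stream.fifo")
        os.mkfifo(path)

        def writer() -> None:
            # Opening a FIFO for writing blocks until a reader opens it.
            with open(path, "w") as f:
                for line in lines:
                    f.write(line + "\n")
            # Closing the write end delivers EOF to the reader.

        t = threading.Thread(target=writer)
        t.start()
        with open(path) as f:  # blocks until the writer opens its end
            received = [line.rstrip("\n") for line in f]
        t.join()
        return received


print(fifo_round_trip(["a", "b", "c"]))  # ['a', 'b', 'c']
```

The reader only ever sees the bytes the writer produced, so nothing other than the data itself crosses the pipe.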
IBM has responded that there is currently no way to remove end-of-wave markers. I've submitted an RFE to add a counterpart to the Wave Generator stage, but even if they do include such a stage, it will most likely be quite a while before it is added to the product.