DSXchange: DataStage and IBM Websphere Data Integration Forum
ArndW
Posted: Thu Jan 15, 2015 8:54 am

DataStage® Release: 9x
Job Type: Parallel
OS: Unix
We've got a challenging job-design problem for which we haven't found a solution that doesn't break the pipeline. I've narrowed the basic problem down to a simple example, pictured below. ...

rschirm
Posted: Mon Jan 19, 2015 1:58 pm

Hello Arnd,

Hope things are going well with you.

How about, instead of rejecting unmatched rows, telling it to continue, feeding all rows through the trx, and evaluating there whether the row had a match?

Rows would stay in the same order, so they would not have to be re-sorted or sort-merged.

Rick
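
The suggestion above can be sketched roughly in Python. This is only an illustration of the idea, not DataStage code: the names `lookup_continue`, `transform`, `matched` and `ref_value` are made up for the example, and the reference data is a plain dictionary. Every row passes through the lookup in arrival order, and the downstream step decides what to do based on a match flag instead of the stream being split into a reject link.

```python
from typing import Dict, Iterable, Iterator


def lookup_continue(rows: Iterable[dict], reference: Dict[int, str]) -> Iterator[dict]:
    """Pass every row through in arrival order, flagging whether it matched."""
    for row in rows:
        key = row["key"]
        found = key in reference
        # Unmatched rows are kept (not rejected); ref_value is simply None.
        yield {**row, "ref_value": reference.get(key), "matched": found}


def transform(row: dict) -> dict:
    """Hypothetical downstream step that branches on the match flag."""
    path = "matched" if row["matched"] else "unmatched"
    return {**row, "path": path}


rows = [{"key": k} for k in (3, 1, 4, 2)]
reference = {1: "a", 2: "b"}

result = [transform(r) for r in lookup_continue(rows, reference)]
for r in result:
    print(r["key"], r["path"])
```

Because nothing is diverted to a second stream, the output order is the input order and no re-sort or merge is needed afterwards.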

ArndW
Posted: Tue Jan 20, 2015 3:26 pm

Unfortunately, that single transform I have in the reject is a placeholder for several stages, including one where we use a sequence to generate a new unique surrogate key. Thus we have to split the data streams somewhere, which means that we need to re-join them elsewhere.

If I could just remove end-of-wave markers within a job, then I could add them every n rows at the source and only partially interrupt the pipelining by breaking the data into smaller chunks. And the "Example Transform" in the job contains aggregate-type functions which need to examine all the data rather than just the chunks defined by waves.
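
To illustrate why wave boundaries matter for aggregate-type logic, here is a small Python sketch. It is not how DataStage implements waves; the helper `waves` and the sample values are assumptions made for the example. It cuts a stream into chunks of n rows and shows that an aggregate computed per chunk is not the same as one computed over all the data.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def waves(rows: Iterable[int], n: int) -> Iterator[List[int]]:
    """Cut a stream into chunks of at most n rows (a stand-in for waves)."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return
        yield chunk


values = [5, 1, 9, 2, 7, 3]

# An aggregate that only sees one chunk at a time...
per_wave_max = [max(chunk) for chunk in waves(values, 2)]
# ...versus the same aggregate over the whole data set.
overall_max = max(values)

print(per_wave_max, overall_max)
```

A stage that resets its aggregate state at each chunk boundary sees only the per-chunk results, which is why the chunk boundaries would have to be removed again before a stage that needs the whole data set.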

I'm still hoping to get a response from IBM support; so far I've only gotten one response which didn't even address the problem as it would seem the support person didn't quite understand what the issue was.

I should add that when I wrote that the job runs "very slowly", I meant that it would take over 24 hours of processing on a 16-CPU AIX box rather than 2 hours. And this problem affects not one job but over 100 of them... There's some serious data being processed, and without pipelining the performance becomes too slow to go into production.

All times are GMT - 6 Hours