DSXchange: DataStage and IBM Websphere Data Integration Forum
dsdoubt
Participant



Joined: 15 Jul 2006
Posts: 106

Points: 880

Posted: Fri Feb 29, 2008 12:52 pm

DataStage® Release: 7x
Job Type: Parallel
OS: Unix
Hi,
My current project setup is as follows.
Each file has only a few rows, but a large number of files need to be processed.
So we have a set of Perl programs that preprocess the files, fetch them from different servers and drop them on the DataStage box. As soon as the files are dropped, DataStage starts processing and works through all of them. Once this is done, another segment of DataStage jobs is started. Only after all of this does the initial Perl program bring the next set of files to DataStage.
This takes a lot of time.
Since Perl calls the DS jobs, it has to maintain the list of file names and pass them as parameters. So we can't run the first stage while the second stage is running.
Moreover, the startup time for each job run is around 5 seconds every time, whereas the actual processing takes just 2-3 seconds.
So is there a way to make DataStage always wait on (listen to) a port or directory, and run a DS job as soon as a file arrives? Would it work if we used the named pipe option?
I believe there is some functionality for this in Version 8, right?
kcbland

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 15 Jan 2003
Posts: 5209
Location: Lutz, FL
Points: 39192

Posted: Fri Feb 29, 2008 1:14 pm

Using a named pipe is great, but you'll have to deal with timeout situations. Folks sometimes periodically send a "heartbeat" row down the pipe to keep it alive. I think this is too much of a gimmick, but that's my opinion.
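A minimal sketch of the heartbeat idea on a Unix FIFO, assuming `os.mkfifo` is available and using a hypothetical marker row `__HEARTBEAT__`. One thread plays the upstream writer and the main thread plays the reading job. It also shows the behaviour that makes keep-alives matter: the reader only sees end-of-file once the last writer closes its end of the pipe.

```python
import os
import tempfile
import threading
import time
from typing import List

HEARTBEAT = "__HEARTBEAT__"  # hypothetical keep-alive marker row


def writer(fifo_path: str, rows: List[str], pause_s: float = 0.05) -> None:
    """Upstream side: send data rows, each followed by a heartbeat row.

    A real feed would send a heartbeat only after a quiet period; here one
    is sent after every row to keep the demo short.
    """
    with open(fifo_path, "w") as pipe:  # blocks until a reader opens the FIFO
        for row in rows:
            pipe.write(row + "\n")
            pipe.flush()
            time.sleep(pause_s)           # a quiet period with no data
            pipe.write(HEARTBEAT + "\n")  # keep-alive row
            pipe.flush()
    # Leaving the with-block closes the only writer, so the reader gets EOF.


def reader(fifo_path: str) -> List[str]:
    """Job side: read lines until every writer has closed the FIFO."""
    with open(fifo_path) as pipe:
        return [line.rstrip("\n") for line in pipe]


def demo(rows: List[str]) -> List[str]:
    tmpdir = tempfile.mkdtemp()
    fifo_path = os.path.join(tmpdir, "feed.fifo")
    os.mkfifo(fifo_path)  # Unix only
    try:
        t = threading.Thread(target=writer, args=(fifo_path, rows))
        t.start()
        received = reader(fifo_path)
        t.join()
    finally:
        os.remove(fifo_path)
        os.rmdir(tmpdir)
    return received
```

Calling `demo(["row1", "row2"])` returns the two data rows, each followed by a heartbeat row, which the consuming job then has to filter out.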

I suppose concatenating the files is not an option? That would give you a larger block of processing and lend more credence to the micro-batch approach.
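As a rough illustration of concatenation, the sketch below groups the small files into batches and appends each batch into one file, so that one job run processes many files. The batch size and the "keep only the first header" rule are assumptions for the example, not something stated in the thread.

```python
from typing import Iterator, List


def batches(paths: List[str], size: int) -> Iterator[List[str]]:
    """Split the list of incoming files into groups of at most `size`."""
    for start in range(0, len(paths), size):
        yield paths[start:start + size]


def concatenate_batch(paths: List[str], out_path: str, skip_header: bool = False) -> int:
    """Append several small files into one batch file; return lines written.

    With skip_header=True, only the first file's header line is kept
    (assumes every file starts with the same single header line).
    """
    written = 0
    with open(out_path, "w") as out:
        for i, path in enumerate(paths):
            with open(path) as f:
                for j, line in enumerate(f):
                    if skip_header and i > 0 and j == 0:
                        continue  # drop repeated headers
                    if not line.endswith("\n"):
                        line += "\n"  # guard against a missing final newline
                    out.write(line)
                    written += 1
    return written
```

Each concatenated file would then be passed to a single job run instead of one run per small file.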

I personally think a staging database handles these situations much better. You can append rows to the table while you're reading rows out. Rows can be updated with a status indicating whether they've been "inducted", "processed" or "rejected". Your micro-batches get larger because they can span multiple files (which are now just rows within the table). You gain a significant amount of functionality (retry, audit, elasticity in staging).

_________________
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
dsdoubt
Participant



Joined: 15 Jul 2006
Posts: 106

Points: 880

Posted: Fri Feb 29, 2008 1:26 pm

Thanks for the reply.
Is the heartbeat you're referring to just a dummy row? So we would reject that row with some conditional check, right?
If we have a named pipe and make the job listen to it, will the job always be running, even when no data is available in the pipe?
I ask because the data from the previous stage won't be accumulated all at once; it becomes available over a span of time.
For example, files arrive from morning to evening at intervals. Each file is processed by its job within seconds, and after that the job sits idle.
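On the first question: yes, a heartbeat is typically just a dummy row with an agreed-upon value, and the consuming job drops it with a conditional check (in DataStage that would usually be a Transformer constraint or a Filter stage). The Python sketch below only shows the shape of that check; the marker value `__HEARTBEAT__` is a hypothetical choice. On the second question, standard Unix FIFO semantics apply: a reader blocks and waits while a writer holds the pipe open but sends nothing, and it sees end-of-file once every writer has closed it.

```python
from typing import Iterable, List, Tuple

HEARTBEAT = "__HEARTBEAT__"  # hypothetical marker agreed between writer and job


def is_heartbeat(row: str) -> bool:
    """The conditional check: True for keep-alive rows that carry no data."""
    return row.strip() == HEARTBEAT


def split_rows(rows: Iterable[str]) -> Tuple[List[str], int]:
    """Return (data rows, number of heartbeat rows dropped)."""
    data: List[str] = []
    dropped = 0
    for row in rows:
        if is_heartbeat(row):
            dropped += 1
        else:
            data.append(row)
    return data, dropped
```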
kcbland

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Server to Parallel Transition Group

Joined: 15 Jan 2003
Posts: 5209
Location: Lutz, FL
Points: 39192

Posted: Fri Feb 29, 2008 1:51 pm

You get the idea... :D

ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54594
Location: Sydney, Australia
Points: 296048

Posted: Fri Feb 29, 2008 3:48 pm

You might consider using server jobs, or parallel jobs with a low degree of parallelism, to keep the startup time as short as possible. Another possibility is an "always running" job using WISD t ...
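To put numbers on why startup time dominates here, take the figures from the original question: about 5 s of startup per job run and 2-3 s of processing. Assuming (for illustration only) that processing grows linearly at 2.5 s per file, the total for N files run k files per invocation is ceil(N/k) × 5 + N × 2.5 seconds. A quick sketch of that arithmetic:

```python
import math


def total_seconds(n_files: int, files_per_run: int,
                  startup_s: float = 5.0, processing_s_per_file: float = 2.5) -> float:
    """Startup paid once per run, plus processing paid once per file (assumed linear)."""
    runs = math.ceil(n_files / files_per_run)
    return runs * startup_s + n_files * processing_s_per_file


def startup_fraction(n_files: int, files_per_run: int,
                     startup_s: float = 5.0, processing_s_per_file: float = 2.5) -> float:
    """Share of the total time spent just starting jobs."""
    runs = math.ceil(n_files / files_per_run)
    total = total_seconds(n_files, files_per_run, startup_s, processing_s_per_file)
    return runs * startup_s / total


# 100 files, one per run:  100*5 + 100*2.5 = 750 s, two thirds of it startup.
# 100 files, ten per run:   10*5 + 100*2.5 = 300 s.
```

Shorter startup (a cheaper job type or lower degree of parallelism) shrinks the first term, while batching or an always-running job shrinks the number of runs.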

_________________
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsdoubt
Participant



Joined: 15 Jul 2006
Posts: 106

Points: 880

Posted: Fri Feb 29, 2008 11:09 pm

Is that part of Version 8?
Is there any documentation available on the functionality and performance boost we would get if we upgrade to V8?
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 54594
Location: Sydney, Australia
Points: 296048

Posted: Sat Mar 01, 2008 2:46 am

What makes you think that you'll get a performance boost? Indeed, how do you define "performance" in such a vague context? You certainly get a functionality boost - quite a few new toys and a co ...

All times are GMT - 6 Hours