single lookup or multiple lookup

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

single lookup or multiple lookup

Post by dsscholar »

Hi all,

i have a job in which i use six lookups with each oracle enterprise stage to do the join. But the record count is very less in each stage. So one lookup stage itself will do fine. but any performance problem if i use 6 lookups where 1 is sufficient. I know for big reference data in all 6 tables we have to go for separate lookups but here i used 6 separate ones. All auto partitioned. Any performance difference will be there? Should i change to one lookup in case of less data in all the six tables or nothing wrong if i proceed with 6 lookups.

Thanks in advance.
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Re: single lookup or multiple lookup

Post by dsscholar »

Hi ray,chulett,jwiles

Please help regarding this query.

Thanks in advance
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just for the record, I have a pet peeve - when someone calls out specific people when they post. There are upwards of 30,000 people registered here and any one of them can help.

Also, have some goram patience. You posted at 1AM my time and then at 4AM bumped it again. Sheesh.
-craig

"You can never have too many knives" -- Logan Nine Fingers
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

I accept your point. But if you had answered the query along with this means, it would have been better.


Thanks in advance
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Try it both ways.
Choose a job you love, and you will never have to work a day in your life. - Confucius
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

5:14pm and 8:10pm in my time zone.
Do you require that we be on line 24x7? How much would you be prepared to pay for that?!!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

It's very difficult to answer performance questions such as this. There are many factors that can influence job performance, both within and outside of DataStage and Information Server. We don't know your system and we don't know your data and we don't know your process. How can we give you a 100% correct answer for your situation?

Which design will work better for you? I don't know. You probably won't see much difference for the scenario you described. What is important is that either design is equally valid and will work. Eric gave the best suggestion: try it both ways That way you will know for certain in your environment. Then, if there are noticeable differences, explore further...where's the difference: startup? processing? Next thing you know, you're doing performance tuning...

I have to agree with Craig's comment: Don't call us out by name to solicit answers...give everyone here a chance. If it's "urgent-gotta-have-an-answer-within-3-hours-at-4am", call your official support organization.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

I understand your point. Apologises for that. I dint say its urgent. I just asked help me in this query. I wont use names anymore. But sometimes questions goes unnoticed as so many posts are there in the queue. So i just send a reminder in common.

Regarding the query

I will check it out. Please tell the following understanding is correct.

Adding five more lookup stages will need extra resources or not? Certainly there will be right? But considering the pipelining concept if we have separate lookups it will process the records faster. I run in 4 nodes. For all the six reference tables,there are totally 50 records with 7 columns (integer), which in future may reach max of 100 records. My question is, addition of lookups result in what, regarding resources and players and memory. Please explain this, then i will conclude according to my scenerio.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

If some of the tables are not so big and have relations (PK-FK) , then use join in query level and the rest using Datastage.

DS User
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

source & references are different dbms.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Better do it is DS level and you wont find much impact if those tables are indexed.

Do lookup and write it in a Dataset file. In the next job load it will give better result.

DS User
jwiles
Premium Member
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Sometimes questions will go unanswered...that's the nature of this type of forum. We're all volunteers and answer if we have time and answers that might be helpful.

Yes, adding additional stages requires more resources, as does a single stage with multiple reference tables (and hence lookups). For multiple lookup stages, the engine may combine them if they're next to each other and operator combination is not disabled, at which point it's effectively the same as a single stage. Each running process (player) requires system resources (memory/communication links/buffers/etc), hence the operator combination options. Can't be effectively taught in a single thread. If you haven't, I suggest you read the IBM Redbooks on DataStage Performance Tuning and Dataflow Design. There are several links to them in the forum.

You'll have an amount of memory used for each table, however in your scenario the total number of reference records is too small for this to be a problem. If you're using individual lookup stages, and they're not combined, you may see some benefit from pipeline parallelism. Again, test to see how your system responds with your overall job design. You may find that a downstream process in your job negates any efficiency upstream (that is the job will run no faster than it's slowest part).

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

jwiles wrote:Sometimes questions will go unanswered...that's the nature of this type of forum. We're all volunteers and answer if we have time and answers that might be helpful.
Exactly. Unanswered, perhaps. Unnoticed? Not a chance.
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Sometimes we're even expected to do the work we're paid to do!
:roll:
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
dsscholar
Premium Member
Premium Member
Posts: 195
Joined: Thu Oct 19, 2006 2:45 pm

Post by dsscholar »

Thanks jwiles.

Hi all,

Where can i enable or disable operator combination option?
Post Reply