In-memory lookup

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
ivannavi
Premium Member
Premium Member
Posts: 120
Joined: Mon Mar 07, 2005 9:49 am
Location: Croatia

In-memory lookup

Post by ivannavi »

I hardly ever do server jobs. But here it is. To make it easier I will throw some statements/conclusions of mine and ask for someone to confirm whether the following are true or false:

1) Using Oracle OCI stage that reads from a very small reference table as a source for the reference link to a transformer:
Even if I edit the OCI query not to use bind variables forwarded from the transformer (primary link), it will do roundtrips to the database issuing the same sql over and over for each row coming from the primary link.(true/false)?

2) The same thing described in 1) goes for ODBC and DRS stage. (true/false)?

3) This does not happen in parallel jobs with normal (not sparse) lookup mode (I have never had performance issues). (true/false)?

4) The only way to do what is called a normal lookup (as opposed to sparse lookup in parallel jobs) is to use hashed files for reference input. (true/false)?
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

When is the interview? We don't answer interview questions.
Mamu Kim
ivannavi
Premium Member
Premium Member
Posts: 120
Joined: Mon Mar 07, 2005 9:49 am
Location: Croatia

Post by ivannavi »

There really is not any interview. I'm on the same site I've been for the last four years. No need to be rude. I believe none of the employers ever heard of something called DataStage, let alone asking something about it in a job interview. I hoped I could ask a question here about features I have never used before. And I did my homework before I asked. Thus the multiple choice questions that I put much effort into. 99$ to get an insult from a DS guru?
kduke
Charter Member
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX
Contact:

Post by kduke »

Sorry if I came off as rude but the goal of this forum is to educate.

1) Using Oracle OCI stage that reads from a very small reference table as a source for the reference link to a transformer:
Even if I edit the OCI query not to use bind variables forwarded from the transformer (primary link), it will do roundtrips to the database issuing the same sql over and over for each row coming from the primary link.(true/false)?

TRUE in server. Lookups in PX are in memory. To do a in memory lookup use a hashed file.

2) The same thing described in 1) goes for ODBC and DRS stage. (true/false)?

TRUE. Same as above.

3) This does not happen in parallel jobs with normal (not sparse) lookup mode (I have never had performance issues). (true/false)?

TRUE does not hit database. Sparse is same as server. Hits the database if not found in previous lookup.

4) The only way to do what is called a normal lookup (as opposed to sparse lookup in parallel jobs) is to use hashed files for reference input. (true/false)?

FALSE. No such thing as a hashed file in PX. A PX lookup is in memory.
Mamu Kim
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Just to help out a little bit, based on my (admittedly) limited PX knowledge:

1) Right, true. As noted, the Server equivalent of a 'in memory lookup' would be a cached hashed file reference lookup.

2) True, any DB stage when used as a lookup will make 'round trips' to the database across the network with each row... even when you don't have a Key expression.

3) True, but as I understand it there is an initial trip to the database to get all of the records needed by the lookup which are stored locally in a 'lookup file set'. Not sure how much of it after that is 'in memory' however.

4) Depends on what a 'normal' lookup is. If that means pull all of the reference data into a local structure, then I guess it would be true. False if you consider a local UV lookup a 'normal' lookup. I'm assuming you are equating a DB lookup in Server to a sparse lookup in PX.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Right, but I was specifically wondering what was being considered a 'normal' lookup in Server as opposed to PX.

As to your doubts, I've never seen any evidence that suggests that's how a database lookup works in Server. AFAIK it will always ping the database, even if the key value is the same as the last one. The assumption is the last set of values are cached in memory and I don't believe that to be the case for Server. PX perhaps, as it is more savy in ways of that nature, but not Server.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I've been told by various dedicated experts of the rdbms databases that we support (db2, oracle, etc.) that the rdbms' themselves have algorithms that are smart enough to do a degree of caching, but in Server there is nothing in particular that keeps the Stage from going back to the database each time.....and indeed, there are times when this is absolutely critical, because you want to ensure, as in a realtime Job, that you will get any "just committed" rows.

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

To all of that I must add that DataStage uses "prepared SQL" so the original assertion, that
it will do roundtrips to the database issuing the same sql over and over for each row coming from the primary link
is actually false. The SQL Statement is sent when the stage starts, and the database server runs it through its optimizer and whatever else it needs to do - it may even build an in-memory table containing the query results (all keys) at that point. In a reference input, DataStage sends the key value (or an array of them) and the database server responds with the corresponding row (or an array of them). If no row matches, either the database server or (if not) the DataStage stage code generates a row of nulls.
Last edited by ray.wurlod on Mon Aug 17, 2009 4:52 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ivannavi
Premium Member
Premium Member
Posts: 120
Joined: Mon Mar 07, 2005 9:49 am
Location: Croatia

Post by ivannavi »

I appreciate all your replies and hereby pronounce this topic resolved.
Post Reply