LOOK UP PERFORMANCE TUNING WITH A BIG SQL SERVER TABLE

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

LOOK UP PERFORMANCE TUNING WITH A BIG SQL SERVER TABLE

Post by sql.alp »

Hi guys,

Thank you in advance for your responses!

I am fairly new to DataStage, and I am having an issue with a job that checks existing data through a three-table join (A, B, C) in SQL Server. Only one of the tables is large (A), and there is a very small table in Netezza (nz). I need to insert/update a field in A from nz: every value in nz has to be updated or inserted into A, and to do that I need to check whether each nz.date value already exists in A.date. I thought a lookup would be a good choice for that. I'm sure you all know what kind of misery I had matching date values between Netezza and SQL Server.

My problem is this: because of A's size, plus the joins, the date conversion and SQL Server performance :), the lookup query takes so long that SQL Server times out, even though nz is a very small table (at most 300 rows, usually fewer). In the end I get an error saying "socket unplugged to ODBC server".

So I need to find a way to make my lookup more efficient! I am open to any ideas, and thank you all!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Are the joins supported by indexes? Is the lookup field indexed?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

Post by sql.alp »

Thanks for the quick response.

They all have primary keys. The reason for the three-way join is that nz has to reach A's primary key via C and then B. I know it is set up inefficiently, but I have no control over the A, B and C tables, which are in SQL Server. In the end I have to update A with all the values from nz. The lookup fills one field from nz and three fields from A, and I have to match on the date field that exists in both tables. If the nz.Date value is not in A.Date, then I have to insert nz.Date and the other field, along with the matching values from A (such as the week-of-the-year field on A).

Btw, I have tried a flat file destination instead of the ODBC destination and the job finishes fine, though not quickly (about 5 minutes to update/insert the 324 rows from nz).

My best bets would be:
1) use another method, such as a join or two lookups instead of one; or
2) (a little more detailed work) capture the minimum value of nz.date, put it into a variable, and use that variable in the WHERE clause of the lookup SQL query, roughly as sketched below. That would cut a huge chunk of unnecessary data out of A; after all, nz is basically an update table for A, and its dates are usually very close to the current date.
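For illustration only (the table and column names are placeholders, and MIN_NZ_DATE is a hypothetical job parameter that would be set to the smallest nz.date before this job runs), the parameterised reference SQL might look roughly like this:

Code: Select all

SELECT a.date_col,
       a.week_of_year
FROM   A a                             -- plus the same B and C joins as in the current lookup
WHERE  a.date_col >= '#MIN_NZ_DATE#'   -- job parameter trims the old rows out of A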

Can I do the second option? Or is there a better approach than a lookup that matches the data faster?
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

:idea:

Why not try it?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Have you considered using a hashed file to cache the lookup data rather than reading directly from the database? They are the heart and soul of Server jobs, after all. The 'trick' is to load into them only what you need for each job run, which typically involves joining the incoming keys to that data set.
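For example, as a rough sketch only with made-up names (and assuming the incoming nz dates had first been landed in a small work table, here called NZ_DATES_STG, on the SQL Server side), the hashed file build query could be joined to just those keys:

Code: Select all

SELECT a.date_col,
       a.week_of_year,
       a.other_col
FROM   A a
       JOIN NZ_DATES_STG s
         ON s.date_col = a.date_col   -- keep only the dates this run can actually match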

Now, this may not be worth it if your runs are as small as you noted. Will they be larger at some point?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

Post by sql.alp »

My ETL background is SSIS and SQL Server based (believe me, I went through quite a transition to become an IBM developer :) ), and I used to stage data to a file rather than looking directly into the DB. Besides, the direct lookup goes back and forth to the database for every single row.
As I mentioned earlier I'm new to DS and hadn't heard anything about hashed files; I guess my colleague does not use them much, so he just showed me the lookup the way he uses it.

I believe what you are saying is the missing link!

Correct me if I'm mistaken:
1. I run the join query first and load the result into the hashed file (I'll call it H1).
2. H1 then becomes the reference link on the Transformer stage lookup (since it is the bigger side), and nz is the stream link. A sketch of the kind of build query I have in mind is below.
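Something like this is what I mean for the H1 build SQL (the table and column names here are only placeholders for the real ones):

Code: Select all

SELECT a.date_col,          -- the key the lookup will match on
       a.week_of_year,
       a.other_col_1,
       a.other_col_2
FROM   A a
       JOIN B b ON b.a_key = a.a_key
       JOIN C c ON c.b_key = b.b_key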

Q1: Any corrections or other suggestions on this strategy?

Q2: Is it also better to use another hashed file (H2) just before the load into A?

Q3: I have to take into account that A is a table of a few hundred thousand rows. Will a hashed file be able to handle all that data, or should I still eliminate some rows the way I mentioned in my earlier post?

thank you all!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

You are NOT mistaken. :wink:

Q2: Not quite sure... what would you use another hashed file for?

Q3: A hashed file will easily handle that load. Best to set the initial size (Minimum Modulus) to something large so it doesn't immediately start to extend and fragment. If it takes 'too long' to build or becomes 'too large' (2.2GB max) then look into constraining the source SQL for the hashed build.
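As a purely illustrative sizing exercise (the figures are guesses, not from this job): a few hundred thousand rows, say 500,000 at roughly 100 bytes each, is about 50 MB of data, which with the default 2 KB group size works out to a Minimum Modulus in the region of 25,000, still nowhere near the 2.2 GB limit.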
-craig

"You can never have too many knives" -- Logan Nine Fingers
sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

Post by sql.alp »

Q2: Is it more efficient to use another hashed file that captures the output of the lookup and then feeds the final ODBC insert/update stage?

Q4: What would be the best Parameter array size and Rows per transaction values on the upsert stage for SQL Server?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

A2: No.

A4: Too many variables for a generic answer, and most of them depend on your data and hardware. You might want to discuss that with your SQL Server DBA and see if they can help. Do you understand what the two settings control?
-craig

"You can never have too many knives" -- Logan Nine Fingers
sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

Post by sql.alp »

Q4: Well, it was set to 0 by default, and the 324-row upsert to SQL Server was getting the timeout error. Then I set array size = 1 and rows per transaction = 1, and it executed, but it still took about 5 minutes!

Data type = smallint
Total rows = 300-500 max (usually fewer)
Table to upsert = only a PK index, no triggers!
Source table = Netezza
Destination table = SQL Server 2012
ETL = DataStage 9.1, job type = server job
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Array size = how many records are sent over the network to the database at a time.
Transaction size = controls commit in the target database based on row count.

Setting both to 1 would guarantee the slowest load possible. :wink:

When you pick new numbers, make sure the transaction size is an even multiple of the array size. I'm assuming you meant the transaction size was zero; that tells it to do only a single commit at 'end of data'. Zero is an invalid array size.
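For example (illustrative numbers only, not a recommendation): with an array size of 50 and a transaction size of 0, your 324 rows would cross the network in about seven batches and be committed once at end of data, instead of 324 individual round trips, each followed by its own commit.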
-craig

"You can never have too many knives" -- Logan Nine Fingers
sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

Post by sql.alp »

Wow, what a difference! It executes much faster now and I don't even have to add a WHERE clause to eliminate the unnecessary rows. Even selecting all the columns, it is five times faster! Unbelievable :D

Thank you all, guys! Your expertise on DS solved another issue! Without you, I guess IBM's limited online resources would leave developers devastated!
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I'll have to disagree with your comment about IBM support - in my experience they are a highly skilled and highly motivated group, whose unenviable task it is to solve even those problems that we are unable to.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sql.alp
Premium Member
Posts: 7
Joined: Tue Jun 10, 2014 5:20 pm

Post by sql.alp »

I'm sure they are expert and experienced, and I gather things are better within the IBM developer community itself. My comment was based on a comparison with Microsoft's online resources. Everyone says, and I have personally experienced, that it is very easy to find an answer to any question you have about Microsoft products. In a way I assume that makes it better for IBM developers, because they will have (or at least I hopefully will soon :D ) exceptional skills that are not as easy to pick up as they are for a Microsoft developer!
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'll say one thing, we can certainly be... quicker... at times. Been there, done that, got the t-shirt kind of stuff. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers