accessing data without importing table definition in DS

Post questions here relating to DataStage Server Edition for such areas as Server job design, DS BASIC, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

accessing data without importing table definition in DS

Post by ak77 »

Hello everybody,

I made a change to an existing job. This job has an OCI stage which fetches data from a join of two tables, and it took about 5 hours for me to run. I added another table, making this a three-table join, but forgot to import the table definition of the new table. FYI, I am using user-defined SQL.

This one took nearly 20 hours.

I was racking my brain over why this was happening, as this table is much smaller than the others.

At last I found the mistake when trying to break the job into smaller modules.

Now the question is: how does it affect the processing if the table definition is not in the DS repository?

Any insight will be appreciated.

Thanks
Kishan
Bala R
Participant
Posts: 66
Joined: Mon May 30, 2005 9:52 pm

Post by Bala R »

Before that, did you have a chance to estimate the cost of the query using Toad or Quest Central when you added the third table to the join?
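
If Toad or Quest Central isn't handy, Oracle's own EXPLAIN PLAN reports the optimiser's cost estimate. Here is a minimal sketch in Python with the cx_Oracle driver; the credentials and table names are placeholders for illustration, not the actual job's tables:

import cx_Oracle

# Placeholder connection details.
conn = cx_Oracle.connect("scott", "tiger", "orcl")
cur = conn.cursor()

# Ask the optimiser for its plan without actually running the query.
cur.execute("""
    EXPLAIN PLAN FOR
    SELECT a.pk, b.val, c.descr
      FROM big_a a
      JOIN big_b b ON b.pk = a.pk
      JOIN small_c c ON c.code = a.code
""")

# DBMS_XPLAN.DISPLAY formats the plan, including the cost column.
for (line,) in cur.execute("SELECT * FROM TABLE(DBMS_XPLAN.DISPLAY())"):
    print(line)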
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: accessing data without importing table definition in DS

Post by chulett »

ak77 wrote: Now the question is: how does it affect the processing if the table definition is not in the DS repository?
It doesn't. Look elsewhere for your performance difference.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

To expand on Craig's answer, the table definitions in the repository are really just a design tool; when the job is running it only looks at the local job column definitions. You need to compare the cost of the user-defined SQL with the third table added, decide whether database optimisation will improve this SQL, or whether the third table should be added to the job as a hash file lookup instead.
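
For a sense of what the hash file lookup buys you, here is a minimal sketch of the same idea in Python with the cx_Oracle driver (credentials and table names are made up for illustration): load the small table into an in-memory map once, then stream the two-big-table join and resolve the third table locally instead of asking the database for a three-way join.

import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "orcl")  # placeholder details
cur = conn.cursor()

# Load the small reference table once; this plays the role
# of a DataStage hash file.
cur.execute("SELECT code, descr FROM small_c")
lookup = dict(cur.fetchall())

# Stream the join of the two big tables only.
cur.execute("""
    SELECT a.pk, b.val, a.code
      FROM big_a a
      JOIN big_b b ON b.pk = a.pk
""")
for pk, val, code in cur:
    descr = lookup.get(code)  # local hash lookup replaces the third join
    # ... write the output row here ...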
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Thanks for the immediate response, Bala.

I didn't do any cost estimation before the first run.

Since this job was taking a long time to finish, I was running the explain plan to find where I was having the problem. Then I was trying to change the design to accommodate this table when I found this mistake.

Now it seems to be running faster when I have the table definition in the repository.

I was just wondering how this works.

Thanks

Regards,
Kishan
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

When the job starts, the stage sends the SQL (whether generated or user-defined) to the database server to be "prepared". This means that the database server prepares its execution plan based on available indexes, current load and whatever other factors might be involved. A request is then sent to the database server to execute the prepared SQL. How long this takes is totally and solely dependent on the SQL query itself; whether the table definitions are in the repository or not makes no difference whatsoever. The database server stores the result set from the query, from which the stage fetches one row (or N rows, depending on buffering) at a time.

I think your observation was caused by other factors, perhaps a lighter load on the machine at the time when you ran with table definitions having been imported.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
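
The prepare/execute/fetch cycle described above can be seen in any Oracle client API. A minimal sketch in Python with the cx_Oracle driver; the credentials, table names, and fetch size are placeholders rather than anything from the job in question:

import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "orcl")  # placeholder details
cur = conn.cursor()
cur.arraysize = 100  # the "N rows at a time" fetch buffering

# 1. Prepare: the server builds its execution plan here.
cur.prepare("SELECT a.pk, b.val FROM big_a a JOIN big_b b ON b.pk = a.pk")

# 2. Execute: run the statement that was just prepared.
cur.execute(None)

# 3. Fetch: pull the stored result set back in arraysize-row batches.
while True:
    rows = cur.fetchmany()
    if not rows:
        break
    # ... process rows ...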
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Some database engines optimise queries that are used more than once, so you may find a query runs a lot faster with repeated use.
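
In Oracle's case this shows up as the shared pool caching parsed statements: if the SQL text matches exactly, a repeat execution skips the hard parse and reuses the stored plan. A small illustrative sketch with cx_Oracle and placeholder names; using bind variables keeps the text identical across runs:

import cx_Oracle

conn = cx_Oracle.connect("scott", "tiger", "orcl")  # placeholder details
cur = conn.cursor()

# Same SQL text every time, only the bind value changes,
# so the server can reuse the cached parse and plan.
sql = "SELECT val FROM big_a WHERE pk = :pk"
for pk in (101, 102, 103):
    cur.execute(sql, pk=pk)
    print(cur.fetchone())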
ak77
Charter Member
Posts: 70
Joined: Thu Jun 23, 2005 5:47 pm
Location: Oklahoma

Post by ak77 »

Thanks everybody,

I got the cost from the explain plan and tried to optimise the query, but it's still slow. I am dealing with two tables of 20 million records joined with a table of 300 records.

Since this table is small, I will use a hash lookup rather than a direct join.

Thanks again
Kishan
srinagesh
Participant
Posts: 125
Joined: Mon Jul 25, 2005 7:03 am

Post by srinagesh »

Hi!

I tried doing the same mapping in two ways:

1. By using custom SQL for the source
2. By using SQL Builder for the source

It was a load with one lookup (hash file).
The source (Oracle) table has 2,700,000 records.

It took considerably more time when I used custom SQL than when I used SQL Builder.

I checked the plan tables in both cases and they were the same, and the Oracle server was running only this job.

It is a bit strange to note the performance improvement when SQL Builder is used... :-)
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Which test did you run first? What steps did you take to control for the effects of caching - that is, the required rows being cached somewhere in the database server?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
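
For anyone wanting to control for caching in a test like this: the safest approach is to repeat each variant and alternate their order. With DBA rights on Oracle 10g or later you can also force each timed run to start cold. A sketch with placeholder connection details:

import cx_Oracle

# SYSDBA rights are required for ALTER SYSTEM.
conn = cx_Oracle.connect("sys", "pw", "orcl", mode=cx_Oracle.SYSDBA)
cur = conn.cursor()

# Discard cached data blocks and parsed plans so the next
# timed run starts cold (FLUSH BUFFER_CACHE needs Oracle 10g+).
cur.execute("ALTER SYSTEM FLUSH BUFFER_CACHE")
cur.execute("ALTER SYSTEM FLUSH SHARED_POOL")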