Improving Job performance

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Improving Job performance

Post by LD »

Hello All,


Following is a pictorial view of an underperforming job I have:
[Image: job design screenshot]
Green -> Hashed Files
Orange -> Transformer
Red -> OCI Stage

Info about the job:
---------------------------------------------------------------------------
OCI:

Source record count < 400K
Array Size in the OCI = 1000, Transaction Size = 0, Transaction Handling = 10K

---------------------------------------------------------------------------
HASHED FILES:

All hashed files are Type 18
'Allow stage write cache' is enabled; Modulus is 1, Separation is 3
Hashed file key columns vary from 1 to 3
Private caching is enabled, cache size 256MB
Data in the hashed files is not growing; the max row count in a hashed file is < 2K
Hashed files contain only the required columns, and column widths are defined as required.

---------------------------------------------------------------------------
TRANSFORMER:
Four custom routines are used for data quality checks on ~50 source columns

---------------------------------------------------------------------------
SERVER:

Memory Size: 29696 MB
Good Memory Size: 29696 MB

Number Of Processors: 4
Processor Clock Speed: 1648 MHz
CPU Type: 64-bit
Kernel Type: 64-bit
---------------------------------------------------------------------------


Job throughput is 50 rows/sec. What can I do to improve the job's performance?


Let me know if you guys need any other info.

-
LD
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

First question in my mind is: what Update Actions are you using in your target OCI stages? You could also determine if the lookups are a bottleneck by putting an @FALSE constraint on all of the output links in the last transformer, so all the prep work is done but nothing is written to the target tables. If it still seems slow then your hashed file lookups are at least one of the culprits; otherwise I'd suspect the target work.
-craig

"You can never have too many knives" -- Logan Nine Fingers
LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Hi Craig,

I am doing upserts (update then insert) in the target OCI stages.

I replaced the target OCI stages with sequential files and the throughput went from 50 to 90 rows/sec. But I don't think that's much of a performance boost.
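i.e. for each row the stage first fires an UPDATE on the key and falls back to an INSERT when nothing was updated. Roughly like this (illustrative only; table and column names are placeholders):

Code: Select all

-- "Update existing rows or insert new rows", per input row:
UPDATE target_tbl SET col1 = :2 WHERE key_col = :1;
-- fired only when the UPDATE touched 0 rows:
INSERT INTO target_tbl (key_col, col1) VALUES (:1, :2);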


-
Shailesh
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

All target OCI stages? If that's the case then your upserts don't seem to be the issue; the hashed lookups are. You said that 'cached writes' were enabled, which doesn't make sense with the design you've posted - did you mean cached reads?
-craig

"You can never have too many knives" -- Logan Nine Fingers
LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Yes, all the OCI stages. I even replaced the source OCI with a sequential file.
And yes, I meant 'pre-load to memory' is enabled in the hashed files.


-
Shailesh
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

Could you shift some of the lookups to Oracle (with a SQL join)?
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Swapnil

The OCI lookup will make the process slower.


-
Shailesh
gateleys
Premium Member
Posts: 992
Joined: Mon Aug 08, 2005 5:08 pm
Location: USA

Post by gateleys »

You could use an Inter-process (IPC) stage between the Transformers so that an otherwise single process is split among the CPUs. Let us know the result if you think it may give an overall performance gain.
gateleys
swapnilverma
Participant
Posts: 135
Joined: Tue Aug 14, 2007 4:27 am
Location: Mumbai

Post by swapnilverma »

LD wrote:
The OCI lookup will make the process slower.

I meant SQL joins instead of the hashed-file lookup... if the join columns are indexed it will be faster.
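Something like this in the source OCI's user-defined SQL, in place of one of the hashed-file lookups (illustrative only; table and column names are placeholders):

Code: Select all

-- Fold a reference lookup into the source query:
SELECT o.order_id,
       o.customer_id,
       o.amount,
       c.customer_name   -- was a hashed-file lookup on customer_id
FROM   orders o
       LEFT OUTER JOIN customers c
                    ON c.customer_id = o.customer_id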
Thanks
Swapnil

"Whenever you find whole world against you just turn around and Lead the world"
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Have you enabled inter-process row buffering? Try with more transformer stages, with no more than four or five reference inputs each.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
arunpramanik
Participant
Posts: 63
Joined: Fri Jun 22, 2007 7:27 am
Location: Kolkata

Post by arunpramanik »

Is there any complex query with conditions at the source OCI?
How fast is the data being retrieved from the source database?


You are doing insert and update at the same time. Are your target tables properly indexed on the key columns, particularly for the update process? I would prefer separate OCI stages, or even separate jobs, for insert and update.
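Something to check (illustrative only; placeholder names). Without an index on the key columns, every UPDATE fired by the upsert has to scan the whole target table:

Code: Select all

-- One indexed lookup per update instead of a full table scan:
CREATE INDEX target_tbl_key_ix ON target_tbl (key_col);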
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany
Contact:

Post by ArndW »

In order to determine what can be done to "tune" this job, you need to establish some reference values first. Start off with just your source OCI stage and write straight to a sequential file at /dev/null; i.e. a copy of this job with just 2 stages. What is your Rows/Second speed for this job? This will be your absolute maximum attainable speed. If it isn't as fast as you expected, you will have to tune your query or speed up your database or transfer less data or a combination of those three changes.
Once that is done, implement the changes in the original job. What is the throughput now? Has it changed at all? If the OCI read speed was acceptable, then you need to find out if your OCI write speed is acceptable. Make a copy of the job and write to a flat file instead of OCI. What is the throughput? Now write a job that reads the sequential file and writes to OCI. What is that throughput?
arunpramanik
Participant
Posts: 63
Joined: Fri Jun 22, 2007 7:27 am
Location: Kolkata

Post by arunpramanik »

Plus there are so many target tables. It looks like some tables may have foreign key references between themselves. Is the data load properly sequenced, i.e. first the master and then the details?
LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

@gateleys Thanks for your suggestion; the speed increased from 90 to 140 rows/sec using the inter-process buffer. 140 is with sequential files as both source and targets. I'll post the numbers after replacing the source and targets with OCI stages.

@Swapnil I'll try your suggestion.

@Ray Yes, inter-process row buffering is enabled. I'll try with more transformers and will post the results here in some time.

@Arun As this is an upsert job, I drop the indexes before the job starts and recreate them later.

@ArndW I have tried removing all the OCI stages and replacing them with sequential files; the speed increased from 50 to 90 rows/sec. Using the inter-process buffer increased the speed to 140 rows/sec.
I'll get the maximum attainable speed as you have suggested.


-
Shailesh
LD
Premium Member
Posts: 32
Joined: Thu Oct 21, 2010 9:03 am

Post by LD »

Hi All,

I tried a few suggestions and here are the outcomes:

# Max attainable speed using a select and dumping the data into a sequential file is 5.5K rows/sec.
Max attainable speed using a sqlldr 'direct load' is >10K rows/sec.
I have a question on this: why is there an almost 4K rows/sec drop when using DataStage? Can I get sqlldr-like throughput in DataStage?
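For reference, direct load writes formatted blocks above the table's high-water mark and bypasses much of the conventional SQL layer; in SQL terms it is roughly the difference between these two (illustrative only; placeholder names):

Code: Select all

-- Conventional path, what an OCI stage effectively does (array inserts):
INSERT INTO target_tbl (key_col, col1) VALUES (:1, :2);

-- Direct path, which is what sqlldr direct=true approximates:
INSERT /*+ APPEND */ INTO target_tbl (key_col, col1)
SELECT key_col, col1 FROM staging_tbl;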

# Swapnil: I tried to include most of the lookups in the select query, but the select was becoming too complex. I did include 2 of the lookups as SQL joins, but there was no significant gain in performance.

# Next I'll try adding more transformers so that the number of lookups per Transformer can be reduced.

-
Shailesh
Post Reply