Problem in performance

Posted: Fri Mar 16, 2007 8:54 pm
by marsboy
I had source like this

Key  Value  Start_dt
1    100    01/02/2007 00:00:10
1    101    01/02/2007 00:00:15
1    100    01/02/2007 00:00:45
1    100    01/02/2007 05:05:10
2    900    01/03/2006 00:00:19
so any number of events can occur in a day.

I'm only interested when the value field changes from the previous value,

so my target should have something like this:

Key  Value  Start_Dt             End_Dt
1    100    01/02/2007 00:00:00  01/02/2007 00:00:14
1    101    01/02/2007 00:00:15  01/02/2007 00:00:44
1    100    01/02/2007 00:00:45
2    900    01/03/2006 00:00:19

Source and target are Oracle databases.

Currently I write one record to the database and commit it, then compare it with the second record. If the second record has the same key but a changed value field, I update the first record's end_dt to the second record's start_dt minus 1 second, and write the second record to the table.

It is taking forever for my job to finish due to the huge volume of data.

So is there any other way to do this kind of thing in less time?
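For reference, the collapsing described above, keeping a row only when the value changes from the previous row for the same key, and closing the prior interval one second before the new one starts, can be sketched like this. This is just illustrative pseudologic in Python, not a DataStage job; the in-memory list and function names are made up for the example:

```python
from datetime import datetime, timedelta

# Sample rows (key, value, start_dt) in source order, as in the post.
rows = [
    (1, 100, datetime(2007, 2, 1, 0, 0, 10)),
    (1, 101, datetime(2007, 2, 1, 0, 0, 15)),
    (1, 100, datetime(2007, 2, 1, 0, 0, 45)),
    (1, 100, datetime(2007, 2, 1, 5, 5, 10)),
    (2, 900, datetime(2006, 3, 1, 0, 0, 19)),
]

def collapse(rows):
    """Keep only rows where the value changed for that key; set each
    closed interval's end_dt to the next start_dt minus one second."""
    out = []          # finished (key, value, start_dt, end_dt) tuples
    open_row = {}     # key -> index in out of the still-open interval
    last_value = {}   # key -> most recent value seen for that key
    for key, value, start_dt in rows:
        if last_value.get(key) == value:
            continue  # value unchanged: discard the event
        if key in open_row:
            # Close the previous interval for this key.
            k, v, s, _ = out[open_row[key]]
            out[open_row[key]] = (k, v, s, start_dt - timedelta(seconds=1))
        out.append((key, value, start_dt, None))  # open a new interval
        open_row[key] = len(out) - 1
        last_value[key] = value
    return out
```

Run against the sample rows this yields four intervals: the first two closed one second before the next change, the last interval per key left open (end_dt of None), matching the shape of the target above.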

Posted: Fri Mar 16, 2007 11:31 pm
by chulett
Welcome to Earth, marsboy! :D

Are you familiar with using hashed files in a DataStage Server job? Doing things directly like this in Oracle is the slowest way you could have chosen, as you are coming to realize. A typical job design to do what you want would leverage hashed files for speed and go something like this:

1) Existing keys and associated values written to a hashed file.

2) Source data is read in and a 'reference lookup' done to that hashed file.

3) No hit = new data. Send down an insert link to Oracle and write the new data to the reference hashed file.

4) Hit = old data. Compare the data found with the data coming in and check for changes. If changes are found, send things down both an update link to Oracle for the 'old' record and the new information down the same insert link as above. Lastly, write the changed data back to the reference hashed file.

At a very high level, that's it. Incoming records where data already exists and no change was detected can typically just be discarded. The reference lookup hashed file should not be cached, so each newly written record can be immediately looked up. No need to commit each record in Oracle then, either.
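The four steps above translate to roughly the following logic. This is a hedged sketch, not DataStage: a plain Python dict stands in for the uncached hashed file, and two lists stand in for the Oracle insert and update links:

```python
from datetime import datetime, timedelta

# Dict stands in for the hashed file: key -> (value, start_dt).
# In a real Server job these would be a Hashed File stage (uncached)
# and OCI insert/update links.
hashed = {}       # step 1: seeded with existing keys/values (empty here)
inserts = []      # rows bound for the Oracle insert link
updates = []      # (key, start_dt, new_end_dt) for the update link

def process(key, value, start_dt):
    hit = hashed.get(key)                 # step 2: reference lookup
    if hit is None:                       # step 3: no hit = new data
        inserts.append((key, value, start_dt))
        hashed[key] = (value, start_dt)   # write back so the next row sees it
    else:                                 # step 4: hit = existing data
        old_value, old_start = hit
        if old_value != value:            # change detected
            # Close the old interval one second before the new start...
            updates.append((key, old_start, start_dt - timedelta(seconds=1)))
            # ...and insert the new one, then refresh the "hashed file".
            inserts.append((key, value, start_dt))
            hashed[key] = (value, start_dt)
        # else: no change, discard the incoming row
```

Because the lookup store is updated in the same pass it is read, each row sees the effect of the rows before it, which is exactly why chulett says the reference hashed file must not be cached.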

Probably very similar to what you (I assume) are doing with an OCI lookup and a transaction size of 1, minus the hashed file of course.

There are more subtleties to it, but I'm curious - have you ever done anything like that before? Seen jobs built like that? Proper use of Hashed Files is one of the keys to building performant Server jobs.

Posted: Sat Mar 17, 2007 7:46 am
by ray.wurlod
Take the IBM beginner's class (DataStage Essentials, code DX334). Among other things it takes you through this technique.