Best way to design load process

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.


highpoint
Premium Member
Posts: 123
Joined: Sat Jun 19, 2010 12:01 am
Location: Chicago

Best way to design load process

Post by highpoint »

Hi,

We normally load load-ready files into DB2 using the DB2 Connector. Our files typically contain around 10 million records.

If the load process fails, say after 5 million records, we have to delete those rows manually with DBA help. The delete takes the DBA a long time because it has to be broken into smaller SQL statements so as not to fill the database logs.

If we design the jobs to join against the target table to find the already-loaded records, the problem is that the load process gets slower even when we don't run into any database issues.

Please suggest the best way to approach this.
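For illustration, the batched-delete cleanup described above can be sketched in plain Python. This is a rough sketch using sqlite3 as a stand-in for DB2; the table name, filter predicate, and batch size are all hypothetical, and DB2 would typically use `FETCH FIRST n ROWS ONLY` where sqlite uses `LIMIT`:

```python
import sqlite3

def delete_in_batches(conn, table, where, batch_size=100000):
    """Delete matching rows one batch at a time, committing after each
    batch so the database transaction log never fills up."""
    total = 0
    while True:
        cur = conn.execute(
            f"DELETE FROM {table} WHERE rowid IN "
            f"(SELECT rowid FROM {table} WHERE {where} LIMIT ?)",
            (batch_size,),
        )
        conn.commit()          # release log space before the next batch
        if cur.rowcount == 0:  # nothing left to delete
            break
        total += cur.rowcount
    return total

# Demo against an in-memory table standing in for the target.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (load_id INTEGER, payload TEXT)")
conn.executemany("INSERT INTO target VALUES (?, ?)",
                 [(1, f"row{i}") for i in range(10)])
removed = delete_in_batches(conn, "target", "load_id = 1", batch_size=3)
print(removed)  # 10
```

The commit inside the loop is the point of the exercise: each batch is its own small transaction, so no single delete ever holds enough log space to cause a problem.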
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Pre-extract a list of just the key values from the target table into a Lookup File Set or Data Set, and perform Lookups against that.
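Outside DataStage, this idea reduces to building a key set once and filtering the incoming rows against it, rather than joining against the full target table on every run. A minimal Python sketch, with sqlite3 standing in for DB2 and hypothetical table and column names (in DataStage proper this would be a Lookup stage against the Lookup File Set):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (order_id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany("INSERT INTO target VALUES (?, ?)", [(1, 10.0), (2, 20.0)])

# 1. Pre-extract just the key column from the target table -- the
#    analogue of building the Lookup File Set before the load runs.
loaded_keys = {row[0] for row in conn.execute("SELECT order_id FROM target")}

# 2. Filter the incoming file against the key set, keeping only the
#    rows that have not already been loaded.
incoming = [(2, 21.0), (3, 30.0), (4, 40.0)]
to_load = [row for row in incoming if row[0] not in loaded_keys]
print(to_load)  # [(3, 30.0), (4, 40.0)]
```

Because only the keys are extracted, the memory cost stays modest even at tens of millions of rows, and the expensive join against the wide target table disappears from the normal load path.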
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

You don't give many details, so I'll take a shot in the dark here: in DB2 (and other relational databases) it is very poor practice to load a large number of rows into a table without periodic commits. It is very easy to design a process that, using checkpoint information, automatically restarts with the next data item after the last commit.

You cannot avoid a performance hit if you want recoverability or restartability from the failure point. The number of rows per commit depends on the processing strength of your system and OS. Even high-end mainframes will impose a limit of only a few thousand, and even less if there are other jobs needing DB2 resources.
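The periodic-commit, restart-from-checkpoint pattern can be sketched as follows. This is a hedged sketch in plain Python with sqlite3 standing in for DB2; the checkpoint file name, table, and commit interval are all hypothetical, and a real DB2 load would use the load utility's own restart facilities:

```python
import os
import sqlite3

def load_with_checkpoint(conn, rows, checkpoint_path, commit_every=5000):
    """Insert rows, committing every `commit_every` rows and recording
    the committed position so a rerun resumes after the last commit."""
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as f:
            start = int(f.read())  # resume point left by a failed run
    for i, row in enumerate(rows):
        if i < start:
            continue  # already committed by the previous run
        conn.execute("INSERT INTO target VALUES (?, ?)", row)
        if (i + 1) % commit_every == 0:
            conn.commit()
            with open(checkpoint_path, "w") as f:
                f.write(str(i + 1))  # checkpoint the committed position
    conn.commit()
    if os.path.exists(checkpoint_path):
        os.remove(checkpoint_path)  # clean finish: no restart needed

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE target (id INTEGER, val TEXT)")
load_with_checkpoint(conn, [(i, f"v{i}") for i in range(12)],
                     "load.ckpt", commit_every=5)
print(conn.execute("SELECT COUNT(*) FROM target").fetchone()[0])  # 12
```

On failure, everything after the last commit is rolled back by the database, the checkpoint file still points at the last committed row, and the rerun simply skips ahead to it; nothing has to be deleted.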
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
kwwilliams
Participant
Posts: 437
Joined: Fri Oct 21, 2005 10:00 pm

Post by kwwilliams »

FranklinE wrote: "Even high-end mainframes will impose a limit of only a few thousand, and even less if there are other jobs needing DB2 resources."
Depends on whether he is referring to DB2 on the mainframe or to DB2 UDB; most people refer to both as DB2.
FranklinE
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Keith, good point.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872