Hash Partition to lookup stages

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney


Post by SURA »

Hi there

DS 8.5.01 64 Bit
OS Win2008 R2
DB SQL Server

I am trying to load history data into production and the job aborted with the error below. The drive has 1.5 TB of space.

Code:

Could not map table file "F:/IBM/Datasets/lookuptable.20111128.vfu1wld (size 653663296 bytes)": Not enough space 
Job design:
Two sources into a Join stage, then five Lookup stages before loading the data.

I managed to get the job to run by doing the following:

1. Increased the number of nodes
2. Pointed the scratch disk at a different drive
3. Applied hash partitioning to all the lookup stages (the job was designed with Auto partitioning).

Now the question is: if I choose hash partitioning for the first lookup and also for the rest of the lookups, will I get the same result in the target?

The reason for my doubt is this:

The first lookup needs two columns from the source, say COLA and COLB. I chose hash partitioning and selected COLA and COLB, which means the hash partitioner will distribute the rows across the nodes based on COLA and COLB.

Now, if I select the same partitioning for the rest of the lookups, but I drop COLA and COLB after lookup 1 is done, how will hash partitioning handle the data in that case?

Sorry, I can't visualize it!
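To help visualize what is being asked, here is a plain-Python sketch (an analogy only, not DataStage internals; the node count and column names are made up). A hash partitioner picks a node from a hash of the chosen key columns, so rows with equal keys always land on the same node; if a later lookup is hashed on a different key, the rows get redistributed.

```python
# Sketch of hash partitioning across nodes (illustrative only).
NODES = 4

def hash_partition(row, key_cols):
    """Assign a row to a node based on a hash of its key columns."""
    key = tuple(row[c] for c in key_cols)
    return hash(key) % NODES

rows = [
    {"COLA": 1, "COLB": "x", "COLC": 10},
    {"COLA": 1, "COLB": "x", "COLC": 20},
    {"COLA": 2, "COLB": "y", "COLC": 30},
]

# Hashing on (COLA, COLB): the two rows sharing COLA=1, COLB="x"
# are guaranteed to land on the same node.
nodes_ab = [hash_partition(r, ["COLA", "COLB"]) for r in rows]
assert nodes_ab[0] == nodes_ab[1]

# A later lookup hashed on a different key (COLC here) may send the
# same rows to different nodes -- the data is simply repartitioned.
# Each lookup still matches correctly as long as its input and its
# reference link are both hashed on that lookup's own keys.
nodes_c = [hash_partition(r, ["COLC"]) for r in rows]
```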

It would be great if anyone could help me with this!

Thanks
DS User
meet_deb85
Premium Member
Posts: 132
Joined: Tue Sep 04, 2007 11:38 am
Location: NOIDA

Post by meet_deb85 »

Please provide information about the number of rows on the reference links of your lookup stages.
SURA
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

Yes, that is an important point.

A few of the lookups have fewer than a million rows; the rest have millions.

DS User
paultechm
Participant
Posts: 27
Joined: Wed Jul 25, 2007 2:09 am

Post by paultechm »

Split the job into two separate jobs; after the Join, write to a dataset.
BI-RMA
Premium Member
Posts: 463
Joined: Sun Nov 01, 2009 3:55 pm
Location: Hamburg

Post by BI-RMA »

Using a Lookup stage for millions of rows is not a very good design, because DataStage will first try to keep the complete lookup dataset in memory and will then swap data to disk.

You would be better off using a left outer join instead.
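The difference can be sketched in plain Python (an analogy, not DataStage code; the function names and sample data are invented for illustration). A lookup must hold the whole reference table in memory, which is what fills RAM and then spills to disk; a merge join streams two sorted inputs and keeps only the current row of each side.

```python
# Analogy: why a Lookup stage struggles with millions of reference
# rows while a sorted Join stage does not.

def lookup(stream, reference_rows, key):
    """Left lookup: entire reference table is built in memory first."""
    ref = {r[key]: r for r in reference_rows}  # whole table in RAM
    for row in stream:
        match = ref.get(row[key])
        yield {**row, **(match or {})}

def merge_join(left_sorted, right_sorted, key):
    """Left merge join: both inputs sorted on the key, so each side
    is consumed one row at a time and memory use stays constant."""
    right = iter(right_sorted)
    r = next(right, None)
    for l in left_sorted:
        while r is not None and r[key] < l[key]:
            r = next(right, None)
        if r is not None and r[key] == l[key]:
            yield {**l, **r}
        else:
            yield l  # left join: keep unmatched left rows
```

Both produce the same joined rows for one-to-one keys; the join only requires that both inputs arrive sorted (and, in parallel, hash-partitioned) on the key.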
"It is not the lucky ones are grateful.
There are the grateful those are happy." Francis Bacon