Generating a range of surrogate keys

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Generating a range of surrogate keys

Post by PhilHibbs »

We were using a Surrogate Key Generator stage, but this proved unreliable. The start of our range is determined externally and we have a block of values allocated to us, so we need to generate exactly the keys in that range.

I am now trying a Transformer stage set to run in Sequential mode, using stage variables, but the job is freezing.

The Transformer does a Sort Merge on a 300-character input column to make sure the SKs are generated in a predictable order. In my test run this column contains only numbers, which when sorted as strings should come out as 1, 10, 100, 1000, 10000, 10001, 10002, etc., and there are 100,000 records.
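(For illustration only: the order described above is plain lexicographic string collation, which a sort on a character column produces. A quick Python sketch, not DataStage syntax:)

```python
# Sorting numeric values as strings yields lexicographic order,
# which is what a sort on a 300-character column produces.
values = [str(n) for n in (1, 2, 10, 100, 1000, 10000, 10001, 10002)]
values.sort()
print(values)  # ['1', '10', '100', '1000', '10000', '10001', '10002', '2']
```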

Any idea why my job would just stop with 100000 on the input link to the Transformer and 0 on the output? The only expressions in the Transformer are these Stage Variables:

svSeq (Integer) = If @INROWNUM = 1 Then pStart Else If svSeq = 999999999 Then 1 Else svSeq + 1
svDIN (BigInt) = DateToString(CurrentDate(),"%yyyy%mm") : Right( '0000':pDataSource,4) : Right( '000000000' : svSeq, 9 )

pStart and pDataSource are job parameters.
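(For anyone reading along, here is what those two stage variables are intended to do, sketched in Python purely for illustration; the function name and parameters are made up, and this is not DataStage syntax:)

```python
from datetime import date

def generate_keys(n_rows, p_start, p_data_source, wrap=999_999_999):
    """Mimic the two stage variables: svSeq is a sequence seeded from
    pStart that wraps at 999999999, and svDIN concatenates
    YYYYMM : 4-digit data source : 9-digit sequence."""
    seq = 0
    for row in range(1, n_rows + 1):
        if row == 1:            # @INROWNUM = 1 -> seed from pStart
            seq = p_start
        elif seq == wrap:       # wrap the sequence back to 1
            seq = 1
        else:
            seq += 1
        yield f"{date.today():%Y%m}{p_data_source:04d}{seq:09d}"

keys = list(generate_keys(3, p_start=100, p_data_source=7))
```

Each generated key is 19 characters: a 6-character year-month, a 4-digit source code, and a 9-digit sequence.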

I just tried taking the sort off the input link and now it gets as far as 862 rows on the input and 484 on the output, and then stops. The next stage executes in Parallel mode with Auto partitioning.
Phil Hibbs | Capgemini
Technical Consultant
DSguru2B
Charter Member
Posts: 6854
Joined: Wed Feb 09, 2005 3:44 pm
Location: Houston, TX

Post by DSguru2B »

Run the entire job on a single node and see if it's the sequential stage coupled with parallel stages that is causing the issue.
Creativity is allowing yourself to make mistakes. Art is knowing which ones to keep.
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Based on your input having 100000 rows, it should stop at that number. You can send the output to a Peek stage to see any actual output as a test, and also add a Transformer stage reject link output to another Peek stage. See if something is causing rejects.
Choose a job you love, and you will never have to work a day in your life. - Confucius
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK
Contact:

Post by PhilHibbs »

DSguru2B wrote:Run the entire job on single node and see if its sequential coupled with parallel is causing issues.
I think it was this. I'm now running the stage in parallel, with Round Robin partitioning, and using @PARTITIONNUM and @NUMPARTITIONS to generate the keys.
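(For later searchers: the post doesn't spell out the formula, but the usual pattern for this is seq = pStart + @PARTITIONNUM + (@INROWNUM - 1) * @NUMPARTITIONS, which gives each partition a disjoint, interleaved slice of the range. A Python sketch of that assumed formula, with illustrative names:)

```python
def partition_keys(p_start, partition_num, num_partitions, num_rows):
    """Keys produced by one partition using the interleaving pattern:
    key = pStart + @PARTITIONNUM + (@INROWNUM - 1) * @NUMPARTITIONS."""
    return [p_start + partition_num + (row - 1) * num_partitions
            for row in range(1, num_rows + 1)]

# With 4 partitions of 2 rows each starting at 1000, the partitions
# jointly cover 1000..1007 with no gaps and no collisions.
all_keys = sorted(k for p in range(4)
                  for k in partition_keys(1000, p, 4, 2))
```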
Phil Hibbs | Capgemini
Technical Consultant