Need help with sequence number generation

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Need help with sequence number generation

Post by gayathrisivakumar »

I have a requirement to generate unique sequence number which will be of length 15 digits. The requirement is to generate the same number if primary key remains the same.

For example:

Key Sequence
100 000000000000001
100 000000000000001
100 000000000000001

200 000000000000002
200 000000000000002
200 000000000000002

This will be a daily process and whenever a record with a new primary key is received, a new sequence number should be generated.

Can somebody please help me with this?
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

So... your 'Sequence' is actually a string field? That would be the only way to store those leading zeroes if they are significant. I would handle this in stage variables, tracking the current v. previous version of the key field and only incrementing the sequence number stage variable when it changes. Then you'd need to convert it to a string and pad it with zeroes to 15 in the actual outgoing derivation of that field.

Using the Sort stage with the 'Key Change' option would help here. And make sure you handle the partitioning properly, hash on the Key so all of them end up on the same node... or run the transformer in sequential mode if your volume won't be negatively impacted by that.
Last edited by chulett on Mon Nov 18, 2013 2:29 pm, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Out of curiosity - is your unique 15-digit sequence number being stored in a database or a sequential file? If it is being stored in a database, what's the data type?
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

Andy's question begs another question: Why is the sequence number required? What will it be used for?

In rereading the opening post, it looks like a redundant identifier. I don't second-guess needing it, but if a unique key value will always be linked to a unique sequence number, then the simplest method is to create an algorithm to generate the sequence number from the key. In the example, it could be as simple as saying that all key values will have the related sequence values by repositioning the significant characters: Key 100 will always be sequence number 000000000000001; an extension of that would look like Key 1120 would generate sequence number 000000000000112.

Of course, if this is really just FIFO, then maintaining the last sequence number assigned would be the simplest method.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Keep in mind the fact that most examples of data posted here are completely made up. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

You take all the fun out of it, Craig. :P
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I'm sure Craig doesn't set out to "take all the fun out of it", he just considers it an added bonus...

:D
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Dang, have you guys been reading my diary again? :x

OK, enough of that. Gayathrisivakumar, do you have enough information to get what you need done? If you need more help, please let us know.
-craig

"You can never have too many knives" -- Logan Nine Fingers
gayathrisivakumar
Premium Member
Premium Member
Posts: 60
Joined: Thu Dec 17, 2009 1:36 am

Post by gayathrisivakumar »

Thanks for your response.

Andy Sorrell - The sequence number will be stored in an oracle database and its a varchar field.

Also I forgot to mention that when the job runs, sequence should start from last value generated on the day before that.

I was also thinking of using stage variables as Craig suggested.But still I am not clear on how to initialize the value to '1' for day one and run the same process every day so that sequence will be incremented according to the change in Key value.

Please let me if anyone has any suggestions.

Thanks.
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

I don't remember this clearly, and you should look it up, but isn't there a function (pretty sure that it's in DB2, less sure of Oracle) that gives an inserted row a surrogate key, kept track of by the table itself?

I may be phrasing that badly, sorry.

Absent that, or if it doesn't fit the requirements, keeping a local file or database table with the input key and its sequence id seems the approach that makes sense. On each new day, use a lookup for each input key. If found, assign the existing sequence number. If not found, assign the next sequence number according to the requirements of how it is set.

You'll always have a one-to-one link on the lookup. The file/table will prevent assigning the same sequence to different keys. Doing it dynamically looks elegant, but it's not really needed.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Clarify something for us - will this DataStage job be the only process manging the Sequence value? Or will other DataStage jobs and/or outside processes be involved with inserting records into this target? If you need to worry about concurrency, that will complicate your solution.

And Franklin, you're thinking of a sequence object that the database can maintain and the NEXTVAL() function. Typically it would be used to provide a unique surrogate value to each record in a table but judicious use of one could help here if there is indeed a concurrency issue that needs to be dealt with.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Well, the VARCHAR requirement pretty well stops any chance for using any sort of Oracle Identity column directly. As far as I know those are limited to numeric datatypes. http://www.oracle-base.com/articles/12c ... ty-columns

You could use an Oracle Identity column indirectly. You'd just have to add an Oracle virtual column that derives the 15-digit, zero padded VARCHAR value from the numeric identity column.

I believe your two options are as follows:

1) Use DataStage surrogate key generator.
This requires an initialization program to set the starting value to "1". Surrogate keys will then need to be converted to varchar and left-padded with zeroes prior to sending it to Oracle. Please note that sequence numbers will have gaps in them though they will be unique. This also means that your DataStage update job will require a lookup or change capture mechanism to determine if a record is an insert, and add a new key value for those records.

2) Use an Oracle Identity Column with a derived virtual column that provides the required output format.

Personally, I like to let the database handle these kinds of things...
Last edited by asorrell on Tue Nov 19, 2013 11:12 am, edited 1 time in total.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
FranklinE
Premium Member
Premium Member
Posts: 739
Joined: Tue Nov 25, 2008 2:19 pm
Location: Malvern, PA

Post by FranklinE »

chulett wrote:And Franklin, you're thinking of a sequence object that the database can maintain and the NEXTVAL() function. Typically it would be used to provide a unique surrogate value to each record in a table but judicious use of one could help here if there is indeed a concurrency issue that needs to be dealt with.
"That's the man, officer! I'd know him anywhere." :wink:

This looks like an example of doing something in the database instead of in DataStage being a better option.
Franklin Evans
"Shared pain is lessened, shared joy increased. Thus do we refute entropy." -- Spider Robinson

Using mainframe data FAQ: viewtopic.php?t=143596 Using CFF FAQ: viewtopic.php?t=157872
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Wasn't thinking of using an Identity column (since those would be unique per record which is not the goal here) but literally a sequence that gets incremented conditionally. And then yes, you'd need to transform it into a padded string before actually using it in the records.

Still curious about concurrency.
-craig

"You can never have too many knives" -- Logan Nine Fingers
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

I assumed the example demonstrated getting multiple update records that would be applied to the same keyed database record. He identified the first field as the "Key" and then stated "whenever a record with a new primary key is received, a new sequence number should be generated".

If that isn't the case, then the identity column solution I recommended won't work.
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
Post Reply