Assign a Unique key for the rows coming from the source

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

Post Reply
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Assign a Unique key for the rows coming from the source

Post by UAUITSBI »

Hello,

How can I assign a unique number to an incoming row when there are several transformers running in parallel as displayed in the below embedded image. The unique number needs to be created for the rows coming from sequential files NOT after transformers as those are performing pivoting of data. The metadata of the sequential files are different.

[img]
http://postimg.org/image/bm4aawt4j/
[/img]

I tried implementing surrogate key generator stage to create a state file and then placed another transformer before the current transformer and used the NextSurrogateKey() in the transformer, as several transformers are running in parallel I couldn't implement this solution.

Also I tried using the below formula:

Code: Select all

(@INROWNUM - 1)*@NUMPARTITIONS+@PARTITIONNUM+1
Again due to parallelism of the transformer it is not working. I have found solutions here when there is just one transformer but not more than one.

Note: Once this job loads the target table, next job kicks off which also loads the same target table so the unique number should be continued from the previous job. Probably in the instance where we create a state file using the surrogate key stage should take care of this but want to mention it.

Any help would be appreciated.

Thanks.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Re: Assign a Unique key for the rows coming from the source

Post by SURA »

The consistent way to do this by getting the value from any table is my experience. I used SK file and it got corrupted few times. I would prefer to do it via table.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Re: Assign a Unique key for the rows coming from the source

Post by UAUITSBI »

Hi Sura,

Thanks for the reply. Problem here is I don't have any table to deal with where in I can use the power of RDMS I have to deal with sequential file unfortunately.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Be aware that the images below the linked image could be considered NSFW depending on what randomly shows up for you, so use caution. It's basically just six flat file sources, each with a transformer leading to a funnel.
-craig

"You can never have too many knives" -- Logan Nine Fingers
qt_ky
Premium Member
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

I am curious why couldn't you assign a unique number immediately after the funnel stage?
Choose a job you love, and you will never have to work a day in your life. - Confucius
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Post by UAUITSBI »

Hi qt_ky,

The data belongs to a survey, so I am pivoting the data as soon I get it from the source, as I have another analytical stuff that I need to perform. The reason I am trying to assign a unique ID before pivoting inorder to track my source row, meaning:
Below is the survey record from the source:

Code: Select all

SURVEY   SITE_NBR    QUESTION    RESPONSE1   RESPONSE2    RESPONSE3
 ATT           1234          Satisfied?        Yes                 No                Average
 ATT           1234          Satisfied?         Yes             Average              No
I pivot the above rows to be in the below format:

Code: Select all

SURVEY   SITE_NBR    QUESTION    RESPONSE1
SURVEY   SITE_NBR    QUESTION    RESPONSE2 
SURVEY   SITE_NBR    QUESTION    RESPONSE3

So at this point I am loosing the track of original response, as at a given site different people can respond to a question either in a same way or different.

Code: Select all

ATT    1234    Satisfied?    Yes
ATT    1234    Satisfied?    No
ATT    1234    Satisfied?    Average
ATT    1234    Satisfied?    Yes
ATT    1234    Satisfied?    Average
ATT    1234    Satisfied?    No
So If I assign a unique number to the source response I will be able to do different analytics in further processing such as how many completed surveys we received, how many are incompletely so on and so forth...
UAUITSBI
Premium Member
Premium Member
Posts: 117
Joined: Thu Aug 13, 2009 3:31 pm
Location: University of Arizona

Post by UAUITSBI »

I thought of couple of ways doing it a traditional way and "not so" traditional way any insight on these would be helpful:

Method 1:

In a new parallel job,
1. Get a RowNumber (Row Column property) from sequential file stage for each source file and name of the file (File Name property).
2. Funnel all the source files as at this point the metadata would be the same (FileName, RowNum).
3. Use Aggregator stage to get the Max RowNum for each source file.
4. Use Transformer, Stage variables -- create as many as source files.
5. In derivation for Stage variables, first stage variable will have max RowNum of first file. Second stage variable will be (First Stage variable + 1), Third would be (2nd stage variable + 1) so and so forth.
6. After this find a way to send values of this file to next job and use the RowNum of respective files in their transformers as an "Initial Value" for NextSurrogateKey().

I am not sure on how to implement the #6.

Method 2:

I know this is a dirty way of doing it but it will be quick:

1. In each Transformer assign a unique code say "T1" for first transformer, "T2" for second etc..
2. Concatenate the NextSurrogateKey() to each one of these"

Code: Select all

  "T1":NextSurrogateKey()
  "T2":NextSurrogateKey()

This will help in getting the uniqueness (unique ID) to the rows.
SURA
Premium Member
Premium Member
Posts: 1229
Joined: Sat Jul 14, 2007 5:16 am
Location: Sydney

Post by SURA »

In one of the project where i worked, they have decided to have an unique SK for the whole EDW (I am not sure why?) .

We have created a sequence in the DB2 and used in the SK stage across most of the jobs.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Post Reply