Assign a Unique key for the rows coming from the source

UAUITSBI · Post by **UAUITSBI** » Wed Feb 25, 2015 5:53 pm

Hello,

How can I assign a unique number to an incoming row when there are several transformers running in parallel as displayed in the below embedded image. The unique number needs to be created for the rows coming from sequential files NOT after transformers as those are performing pivoting of data. The metadata of the sequential files are different.

[img]
http://postimg.org/image/bm4aawt4j/
[/img]

I tried implementing surrogate key generator stage to create a state file and then placed another transformer before the current transformer and used the NextSurrogateKey() in the transformer, as several transformers are running in parallel I couldn't implement this solution.

Also I tried using the below formula:

Code: Select all

(@INROWNUM - 1)*@NUMPARTITIONS+@PARTITIONNUM+1

Again due to parallelism of the transformer it is not working. I have found solutions here when there is just one transformer but not more than one.

Note: Once this job loads the target table, next job kicks off which also loads the same target table so the unique number should be continued from the previous job. Probably in the instance where we create a state file using the surrogate key stage should take care of this but want to mention it.

Any help would be appreciated.

Thanks.

SURA · Post by **SURA** » Wed Feb 25, 2015 10:06 pm

The consistent way to do this by getting the value from any table is my experience. I used SK file and it got corrupted few times. I would prefer to do it via table.

UAUITSBI · Post by **UAUITSBI** » Wed Feb 25, 2015 11:48 pm

Hi Sura,

Thanks for the reply. Problem here is I don't have any table to deal with where in I can use the power of RDMS I have to deal with sequential file unfortunately.

chulett · Post by **chulett** » Wed Feb 25, 2015 11:48 pm

Be aware that the images below the linked image could be considered NSFW depending on what randomly shows up for you, so use caution. It's basically just six flat file sources, each with a transformer leading to a funnel.

qt_ky · Post by **qt_ky** » Thu Feb 26, 2015 7:27 am

I am curious why couldn't you assign a unique number immediately after the funnel stage?

UAUITSBI · Post by **UAUITSBI** » Thu Feb 26, 2015 8:54 am

Hi qt_ky,

The data belongs to a survey, so I am pivoting the data as soon I get it from the source, as I have another analytical stuff that I need to perform. The reason I am trying to assign a unique ID before pivoting inorder to track my source row, meaning:
Below is the survey record from the source:

Code: Select all

SURVEY   SITE_NBR    QUESTION    RESPONSE1   RESPONSE2    RESPONSE3
 ATT           1234          Satisfied?        Yes                 No                Average
 ATT           1234          Satisfied?         Yes             Average              No

I pivot the above rows to be in the below format:

Code: Select all

SURVEY   SITE_NBR    QUESTION    RESPONSE1
SURVEY   SITE_NBR    QUESTION    RESPONSE2 
SURVEY   SITE_NBR    QUESTION    RESPONSE3

So at this point I am loosing the track of original response, as at a given site different people can respond to a question either in a same way or different.

Code: Select all

ATT    1234    Satisfied?    Yes
ATT    1234    Satisfied?    No
ATT    1234    Satisfied?    Average
ATT    1234    Satisfied?    Yes
ATT    1234    Satisfied?    Average
ATT    1234    Satisfied?    No

So If I assign a unique number to the source response I will be able to do different analytics in further processing such as how many completed surveys we received, how many are incompletely so on and so forth...

UAUITSBI · Post by **UAUITSBI** » Thu Feb 26, 2015 9:15 am

I thought of couple of ways doing it a traditional way and "not so" traditional way any insight on these would be helpful:

Method 1:

In a new parallel job,
1. Get a RowNumber (Row Column property) from sequential file stage for each source file and name of the file (File Name property).
2. Funnel all the source files as at this point the metadata would be the same (FileName, RowNum).
3. Use Aggregator stage to get the Max RowNum for each source file.
4. Use Transformer, Stage variables -- create as many as source files.
5. In derivation for Stage variables, first stage variable will have max RowNum of first file. Second stage variable will be (First Stage variable + 1), Third would be (2nd stage variable + 1) so and so forth.
6. After this find a way to send values of this file to next job and use the RowNum of respective files in their transformers as an "Initial Value" for NextSurrogateKey().

I am not sure on how to implement the #6.

Method 2:

I know this is a dirty way of doing it but it will be quick:

1. In each Transformer assign a unique code say "T1" for first transformer, "T2" for second etc..
2. Concatenate the NextSurrogateKey() to each one of these"

Code: Select all

  "T1":NextSurrogateKey()
  "T2":NextSurrogateKey()

This will help in getting the uniqueness (unique ID) to the rows.

SURA · Post by **SURA** » Thu Feb 26, 2015 4:21 pm

In one of the project where i worked, they have decided to have an unique SK for the whole EDW (I am not sure why?) .

We have created a sequence in the DB2 and used in the SK stage across most of the jobs.

DSXchange

Assign a Unique key for the rows coming from the source

Assign a Unique key for the rows coming from the source

Re: Assign a Unique key for the rows coming from the source

Re: Assign a Unique key for the rows coming from the source