Page 1 of 1
Assign a Unique key for the rows coming from the source
Posted: Wed Feb 25, 2015 5:53 pm
by UAUITSBI
Hello,
How can I assign a unique number to an incoming row when there are several transformers running in parallel as displayed in the below embedded image. The unique number needs to be created for the rows coming from sequential files NOT after transformers as those are performing pivoting of data. The metadata of the sequential files are different.
[img]
http://postimg.org/image/bm4aawt4j/
[/img]
I tried implementing surrogate key generator stage to create a state file and then placed another transformer before the current transformer and used the NextSurrogateKey() in the transformer, as several transformers are running in parallel I couldn't implement this solution.
Also I tried using the below formula:
Code: Select all
(@INROWNUM - 1)*@NUMPARTITIONS+@PARTITIONNUM+1
Again due to parallelism of the transformer it is not working. I have found solutions here when there is just one transformer but not more than one.
Note: Once this job loads the target table, next job kicks off which also loads the same target table so the unique number should be continued from the previous job. Probably in the instance where we create a state file using the surrogate key stage should take care of this but want to mention it.
Any help would be appreciated.
Thanks.
Re: Assign a Unique key for the rows coming from the source
Posted: Wed Feb 25, 2015 10:06 pm
by SURA
The consistent way to do this by getting the value from any table is my experience. I used SK file and it got corrupted few times. I would prefer to do it via table.
Re: Assign a Unique key for the rows coming from the source
Posted: Wed Feb 25, 2015 11:48 pm
by UAUITSBI
Hi Sura,
Thanks for the reply. Problem here is I don't have any table to deal with where in I can use the power of RDMS I have to deal with sequential file unfortunately.
Posted: Wed Feb 25, 2015 11:48 pm
by chulett

Be aware that the images below the linked image could be considered
NSFW depending on what randomly shows up for you, so use caution. It's basically just six flat file sources, each with a transformer leading to a funnel.
Posted: Thu Feb 26, 2015 7:27 am
by qt_ky
I am curious why couldn't you assign a unique number immediately after the funnel stage?
Posted: Thu Feb 26, 2015 8:54 am
by UAUITSBI
Hi qt_ky,
The data belongs to a survey, so I am pivoting the data as soon I get it from the source, as I have another analytical stuff that I need to perform. The reason I am trying to assign a unique ID before pivoting inorder to track my source row, meaning:
Below is the survey record from the source:
Code: Select all
SURVEY SITE_NBR QUESTION RESPONSE1 RESPONSE2 RESPONSE3
ATT 1234 Satisfied? Yes No Average
ATT 1234 Satisfied? Yes Average No
I pivot the above rows to be in the below format:
Code: Select all
SURVEY SITE_NBR QUESTION RESPONSE1
SURVEY SITE_NBR QUESTION RESPONSE2
SURVEY SITE_NBR QUESTION RESPONSE3
So at this point I am loosing the track of original response, as at a given site different people can respond to a question either in a same way or different.
Code: Select all
ATT 1234 Satisfied? Yes
ATT 1234 Satisfied? No
ATT 1234 Satisfied? Average
ATT 1234 Satisfied? Yes
ATT 1234 Satisfied? Average
ATT 1234 Satisfied? No
So If I assign a unique number to the source response I will be able to do different analytics in further processing such as how many completed surveys we received, how many are incompletely so on and so forth...
Posted: Thu Feb 26, 2015 9:15 am
by UAUITSBI
I thought of couple of ways doing it a traditional way and "not so" traditional way any insight on these would be helpful:
Method 1:
In a new parallel job,
1. Get a RowNumber (Row Column property) from sequential file stage for each source file and name of the file (File Name property).
2. Funnel all the source files as at this point the metadata would be the same (FileName, RowNum).
3. Use Aggregator stage to get the Max RowNum for each source file.
4. Use Transformer, Stage variables -- create as many as source files.
5. In derivation for Stage variables, first stage variable will have max RowNum of first file. Second stage variable will be (First Stage variable + 1), Third would be (2nd stage variable + 1) so and so forth.
6. After this find a way to send values of this file to next job and use the RowNum of respective files in their transformers as an "Initial Value" for NextSurrogateKey().
I am not sure on how to implement the #6.
Method 2:
I know this is a dirty way of doing it but it will be quick:
1. In each Transformer assign a unique code say "T1" for first transformer, "T2" for second etc..
2. Concatenate the NextSurrogateKey() to each one of these"
Code: Select all
"T1":NextSurrogateKey()
"T2":NextSurrogateKey()
This will help in getting the uniqueness (unique ID) to the rows.
Posted: Thu Feb 26, 2015 4:21 pm
by SURA
In one of the project where i worked, they have decided to have an unique SK for the whole EDW (I am not sure why?) .
We have created a sequence in the DB2 and used in the SK stage across most of the jobs.