Hybrid SCD Type handling...
Moderators: chulett, rschirm, roy
Hybrid SCD Type handling...
I am wondering what is the best way to deal with Hybrid SCD types in a single table in Datastage 8.7 ? When I say hybrid SCD types, I meant, I don't need to create a new version of a record for any column in a table changes rather only if certain column changes otherwise I just need to update the record.
I am thinking of joining the current data with previous day's data using a Join stage and then check for Type2 column changes by comparing each and every Type2 columns in stage variables but it seems like a cumbersome process if there are lot of columns involved and it makes it more difficult if the columns are nullable...
I don't think we can use Change Capture Stage as it does not tell us what column is changing...Is there a better approach to this ?
Just FYI...We are new to 8.7
Thanks for yout time..
I am thinking of joining the current data with previous day's data using a Join stage and then check for Type2 column changes by comparing each and every Type2 columns in stage variables but it seems like a cumbersome process if there are lot of columns involved and it makes it more difficult if the columns are nullable...
I don't think we can use Change Capture Stage as it does not tell us what column is changing...Is there a better approach to this ?
Just FYI...We are new to 8.7
Thanks for yout time..
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Same answer as in 8.5. In the SCD stage some columns can be identified for Type 1 changes and other columns can be identified for Type 2 changes, even in the same data set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
I am configuring SCD stage in 8.7. Job does not fail but it does not give me the expected result. For testing purpose, I have changed a Type-2 column and ran the job which only generated 1 record in the DimUpdate link which is Insert record. It did not create the record for corresponding update(to close the existing record).
Job Design is :
Sequential File, SCD Stage with lookup as DB2 connector, DimUpdate seq file and FactUpdate seq file as two outputs.In SCD stage, Lookup tab, did set up the keys for Join. Whatever the fields I am joining on I defined as Business keys, defined other columns as Type1, Type2, Blank appropriately...
I hope I am clear with my question. What am I missing here ?
Thanks for your time...
Job Design is :
Sequential File, SCD Stage with lookup as DB2 connector, DimUpdate seq file and FactUpdate seq file as two outputs.In SCD stage, Lookup tab, did set up the keys for Join. Whatever the fields I am joining on I defined as Business keys, defined other columns as Type1, Type2, Blank appropriately...
I hope I am clear with my question. What am I missing here ?
Thanks for your time...
Sura - I got it...Seems like I have interchanged the links...
Another question is now we have to UPSERT into the Dim table. Correct ?
But one of the best practice says that DO NOT DO upserts as it has to check for every record to make sure if it's Insert or Update. So how does it affect the performance ?
Can we split the file into two using curr_indic column and do insert and updates ?
Another question is now we have to UPSERT into the Dim table. Correct ?
But one of the best practice says that DO NOT DO upserts as it has to check for every record to make sure if it's Insert or Update. So how does it affect the performance ?
Can we split the file into two using curr_indic column and do insert and updates ?
It is going to be Insert / Upsert is Business decision and i can't say YES / No.
In regards with best practice, i don't find much difference especially in this project where i am working.
Before we start using the SCD stage i did some test.
1. Using JOIN --> TFM and then 2 files for insert and update as a separate load.
2. Change Capture using UPSERT as well as insert and update as a separate load.
3. SCD UPSERT as well as insert and update as a separate load.
100,000 records for test having 10 - 12 columns, but i can't remember.
I can't find much difference in relates with performance.
Notes: No much difference I mean in one approach say it is completing in 5 min whereas in other it is taking either 20 min / 2 min.
Again it dependes PEOPLE vs PROJECT.
Decission is yours and this is my view.
In regards with best practice, i don't find much difference especially in this project where i am working.
Before we start using the SCD stage i did some test.
1. Using JOIN --> TFM and then 2 files for insert and update as a separate load.
2. Change Capture using UPSERT as well as insert and update as a separate load.
3. SCD UPSERT as well as insert and update as a separate load.
100,000 records for test having 10 - 12 columns, but i can't remember.
I can't find much difference in relates with performance.
Notes: No much difference I mean in one approach say it is completing in 5 min whereas in other it is taking either 20 min / 2 min.
Again it dependes PEOPLE vs PROJECT.
Decission is yours and this is my view.
Thanks
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Ram
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
To work properly the SCD stage needs, in theory, to get the current row in to the dimension table before the next row is processed. Stay with upsert, array size 1 and auto-commit.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray - Is not what you are describing is true If I Insert directly into the Dimension table in the same job from SCD stage ? What difference does it make if I just load them to the file and do bulk insert and update as I have to update all the history record with Type1 values not just the last one.
Please correct me If I am wrong...
Please correct me If I am wrong...
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Under your scenario how would the SCD stage be able to look up a business key that occurs a second time in the stream input?
(In practice the dimension table is, at least partially, memory-mapped.)
(In practice the dimension table is, at least partially, memory-mapped.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.