DSXchange: DataStage and IBM Websphere Data Integration Forum
View next topic
View previous topic
Add To Favorites
Author Message
kaps
Participant



Joined: 10 May 2005
Posts: 444

Points: 4027

Post Posted: Wed Apr 25, 2012 6:41 pm Reply with quote    Back to top    

DataStage® Release: 8x
Job Type: Parallel
OS: Unix
I am wondering what is the best way to deal with Hybrid SCD types in a single table in Datastage 8.7 ? When I say hybrid SCD types, I meant, I don't need to create a new version of a record for any column in a table changes rather only if certain column changes otherwise I just need to update the record.

I am thinking of joining the current data with previous day's data using a Join stage and then check for Type2 column changes by comparing each and every Type2 columns in stage variables but it seems like a cumbersome process if there are lot of columns involved and it makes it more difficult if the columns are nullable...

I don't think we can use Change Capture Stage as it does not tell us what column is changing...Is there a better approach to this ?

Just FYI...We are new to 8.7

Thanks for yout time..
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51774
Location: Sydney, Australia
Points: 281116

Post Posted: Wed Apr 25, 2012 8:22 pm Reply with quote    Back to top    

Same answer as in 8.5. In the SCD stage some columns can be identified for Type 1 changes and other columns can be identified for Type 2 changes, even in the same data set.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
Not yet rated
kaps
Participant



Joined: 10 May 2005
Posts: 444

Points: 4027

Post Posted: Wed Apr 25, 2012 9:01 pm Reply with quote    Back to top    

Ray,
Thanks for the reply. We have moved from 7.5 to 8.7 so need to play with SCD stage. Never used it before...I will try that...
Rate this response:  
Not yet rated
kaps
Participant



Joined: 10 May 2005
Posts: 444

Points: 4027

Post Posted: Mon Apr 30, 2012 4:36 pm Reply with quote    Back to top    

I am configuring SCD stage in 8.7. Job does not fail but it does not give me the expected result. For testing purpose, I have changed a Type-2 column and ran the job which only generated 1 record in the DimUpdate link which is Insert record. It did not create the record for corresponding update(to close the existing record).

Job Design is :

Sequential File, SCD Stage with lookup as DB2 connector, DimUpdate seq file and FactUpdate seq file as two outputs.In SCD stage, Lookup tab, did set up the keys for Join. Whatever the fields I am joining on I defined as Business keys, defined other columns as Type1, Type2, Blank appropriately...

I hope I am clear with my question. What am I missing here ?

Thanks for your time...
Rate this response:  
Not yet rated
SURA



Group memberships:
Premium Members

Joined: 14 Jul 2007
Posts: 1067

Points: 7666

Post Posted: Mon Apr 30, 2012 5:06 pm Reply with quote    Back to top    

Hope this link may help you.


http://www.dsxchange.com/viewtopic.php?t=142965&highlight=SCd

_________________
Thanks
DS User
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Rate this response:  
Not yet rated
kaps
Participant



Joined: 10 May 2005
Posts: 444

Points: 4027

Post Posted: Mon Apr 30, 2012 6:28 pm Reply with quote    Back to top    

Sura - I got it...Seems like I have interchanged the links...

Another question is now we have to UPSERT into the Dim table. Correct ?
But one of the best practice says that DO NOT DO upserts as it has to check for every record to make sure if it's Insert or Update. So how does it affect the performance ?

Can we split the file into two using curr_indic column and do insert and updates ?
Rate this response:  
Not yet rated
SURA



Group memberships:
Premium Members

Joined: 14 Jul 2007
Posts: 1067

Points: 7666

Post Posted: Mon Apr 30, 2012 8:30 pm Reply with quote    Back to top    

It is going to be Insert / Upsert is Business decision and i can't say YES / No.

In regards with best practice, i don't find much difference especially in this project where i am working.

Before we start using the SCD stage i did some test.

1. Using JOIN --> TFM and then 2 files for insert and update as a separate load.

2. Change Capture using UPSERT as well as insert and update as a separate load.

3. SCD UPSERT as well as insert and update as a separate load.

100,000 records for test having 10 - 12 columns, but i can't remember.

I can't find much difference in relates with performance.

Notes: No much difference I mean in one approach say it is completing in 5 min whereas in other it is taking either 20 min / 2 min.

Again it dependes PEOPLE vs PROJECT.

Decission is yours and this is my view.

_________________
Thanks
DS User
----------------------------------
Revealing your ignorance is fine, because you get a chance to learn.
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51774
Location: Sydney, Australia
Points: 281116

Post Posted: Mon Apr 30, 2012 8:32 pm Reply with quote    Back to top    

To work properly the SCD stage needs, in theory, to get the current row in to the dimension table before the next row is processed. Stay with upsert, array size 1 and auto-commit.

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
Not yet rated
kaps
Participant



Joined: 10 May 2005
Posts: 444

Points: 4027

Post Posted: Tue May 01, 2012 9:30 am Reply with quote    Back to top    

Ray - Is not what you are describing is true If I Insert directly into the Dimension table in the same job from SCD stage ? What difference does it make if I just load them to the file and do bulk insert and update as I have to update all the history record with Type1 values not just the last one.

Please correct me If I am wrong...
Rate this response:  
Not yet rated
ray.wurlod

Premium Poster
Participant

Group memberships:
Premium Members, Inner Circle, Australia Usergroup, Server to Parallel Transition Group

Joined: 23 Oct 2002
Posts: 51774
Location: Sydney, Australia
Points: 281116

Post Posted: Tue May 01, 2012 4:13 pm Reply with quote    Back to top    

Under your scenario how would the SCD stage be able to look up a business key that occurs a second time in the stream input? (In practice the dimension table is, at least partially, memory-mapped ...

_________________
RXP Services Ltd
Melbourne | Canberra | Sydney | Hong Kong | Hobart | Brisbane
currently hiring: Sydney and Melbourne
Rate this response:  
Not yet rated
kaps
Participant



Joined: 10 May 2005
Posts: 444

Points: 4027

Post Posted: Wed May 02, 2012 9:57 am Reply with quote    Back to top    

Yes. You are correct but in my case it's a batch process and we won't get two records for same business key in a load as we extract from OLTP systems at certain time in a day. Does it make sense ?
Rate this response:  
Not yet rated
Display posts from previous:       

Add To Favorites
View next topic
View previous topic
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum



Powered by phpBB © 2001, 2002 phpBB Group
Theme & Graphics by Daz :: Portal by Smartor
All times are GMT - 6 Hours