Processing multiple business key changes in a Type 2 SCD
Posted: Sat Sep 04, 2010 9:19 am
Looking for some conceptual help on a particular Type 2 SCD issue, regardless of the tool used. Technically, I'm hitting this with Informatica right now, but it's not all that different from the DataStage world, and I'm after methodology rather than implementation specifics. Besides, the people here are way smarter.
Specifically, I have a job/mapping that handles everything just fine - insert new records, insert/update for changes, logical deletes, the whole nine yards - as long as there is only one change per business key to be processed in any given run. Stack up several change cycles in my source, and things don't work so well. The lookup is dynamic and determines the type of change for me, and the router splits the appropriate information off to further pipelines. The targets are all Oracle doing plain old SQL inserts or updates, nothing fancy, and all four hook to the same table, the dimension target. That last bit is the rub, it seems. The design (abstracted) looks like this:
Code: Select all
source -> lookup -> router +----- new record insert ----------> target
                           +---- change +-> insert new change -> target
                           |            +-> update old record -> target
                           +---- logical delete update --------> target

Much like the OE stages, those targets all create separate connections/sessions into Oracle, so separate transactions that don't have a clue what the others are doing. In Server I would use a single target stage with 4 links and be done; here (like PX) that's not possible.
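For reference, the change detection that the dynamic lookup plus router perform boils down to something like this - a plain Python sketch of the logic, not actual Informatica objects, and the function and field names are mine:

```python
# Classify an incoming source row against the current dimension row
# (as returned by the lookup) for its business key.
def classify(incoming, current):
    """current is the active dimension row for the key, or None."""
    if current is None:
        return "new_insert"          # key never seen: plain insert
    if incoming.get("deleted"):
        return "logical_delete"      # update: flag/close the current row
    if incoming["attrs"] != current["attrs"]:
        return "change"              # insert new version + expire old one
    return "no_change"               # nothing to do
```

Each router branch corresponds to one of these outcomes; "change" fans out into the two-link insert/update pair in the diagram.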
This all works fine for singleton changes. Stack up multiple changes for the same business key and it falls apart: updates of records that may or may not have been inserted yet are not reliable. With the difference in metadata between the inserts and the updates, I can't really funnel things together - never mind the fact that there's no such concept in Informatica. So, how are people handling this?
I know I could land the updates, complete processing only for the inserts here, and then build a separate mapping to handle the updates after all of the inserts have been committed. I'm looking for a way to do this all at once, if possible. We've talked about 'folding' the updates and inserts together, if that makes sense - meaning, only send inserts to the database, but make them look like they would after the in-flight change data had been applied: two logical operations, one physical operation. I'm just not quite sure how to manage all that and cover 100% of the scenarios we could run into over time.
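To make the 'folding' idea concrete, here's roughly what I have in mind - plain Python pseudologic, not Informatica, and all the names are mine. Replay the batch's changes per business key in memory, then emit only the physical operations: every in-flight version becomes an insert, but all except the last are born already expired, and at most one update per key closes the pre-existing current row.

```python
from collections import defaultdict

HIGH_DATE = "9999-12-31"

def fold(changes, current_rows):
    """changes: list of (business_key, eff_date, attrs), sorted by key
    and effective date. current_rows: {key: (eff_date, attrs)} for rows
    already active in the dimension. Returns (inserts, expires):
    inserts are (key, eff_date, end_date, attrs) rows; expires are
    (key, end_date) updates that close the existing current row."""
    inserts, expires = [], []
    by_key = defaultdict(list)
    for key, eff, attrs in changes:
        by_key[key].append((eff, attrs))
    for key, versions in by_key.items():
        if key in current_rows:
            # one physical update expires the pre-existing current row
            expires.append((key, versions[0][0]))
        # each in-flight version is inserted; all but the last are
        # end-dated by the next version's start date on the way in
        for (eff, attrs), nxt in zip(versions, versions[1:] + [None]):
            end = nxt[0] if nxt else HIGH_DATE
            inserts.append((key, eff, end, attrs))
    return inserts, expires
```

The point being that the updates never depend on rows inserted in the same run, so the separate Oracle sessions behind each target stop mattering.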
Informatica transformations have variable ports, much like stage variables, but there's no constraint concept. For that, I would need to filter post-transformation - either that, or aggregate for the 'last' record per business key?
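If intermediate versions within a batch don't actually need to survive, that last-record-per-key aggregation is trivial - a sketch, assuming rows arrive already ordered by change time, with names of my own invention:

```python
def last_per_key(rows, key=lambda r: r[0]):
    """Keep only the final row seen per business key. Rows are assumed
    to arrive in change order, so 'last seen' means 'most recent'."""
    latest = {}
    for row in rows:
        latest[key(row)] = row   # later rows overwrite earlier ones
    return list(latest.values())
```

That collapses a multi-change key back down to the singleton case the existing mapping already handles - at the cost of losing the intermediate history, which may or may not be acceptable for a Type 2 dimension.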
For some reason, this really isn't coming together for me, long days with short nights and paint fumes seem to be wreaking havoc on my cognitive abilities at the moment... thus the reach out to the forum.
Any thoughts on this would be most appreciated.