If I have 3 records for the same business key in the incoming source and I have APT_ORAUPSERT_COMMIT_ROW_INTERVAL of 5000, how would the SCD stage handle such a situation? Example:
BkCol ValueCol1 ValueCol2
1 A Ab
2 C Ba
2 C Ca
2 C Cb
The first record should be an insert and the rest should be updates. Will this be handled automatically(I don't think so but would like to hear your opinions). Two questions:
1)Setting APT_ORAUPSERT_COMMIT_ROW_INTERVAL to 1 would make insertion/updation extremely slow.
How would you handle it?
2)How would you handle the multi-node situation?
Slowly changing dimension stage and commit interval
Moderators: chulett, rschirm, roy
All of the same keys have to be on one node. So you are force to sort by the key and keep it in change date order otherwise you final result not accurate. You still have to commit one row at a time.
The only other solution is to remove dupes. Bring in only one at a time. This means you have to make multiple passes by you can commit 5000 rows at a time.
The only other solution is to remove dupes. Bring in only one at a time. This means you have to make multiple passes by you can commit 5000 rows at a time.
Mamu Kim
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
Re: Slowly changing dimension stage and commit interval
Curiously this is not always the case.abc123 wrote:Setting APT_ORAUPSERT_COMMIT_ROW_INTERVAL to 1 would make insertion/updation extremely slow.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Kim, I am doing the ORDER BY in the source SQL. I had hash partitioning on the Input tab of the next stage but I removed it for performance reasons. Would doing the ORDER BY put all same keys in the same node without specifying hash partitioning? I wouldn't think so.
Ray, that's an interesting piece of information. What do you think the best peformance percentage difference might be, between 5000 commits and 1 commit? Just curious.
Ray, that's an interesting piece of information. What do you think the best peformance percentage difference might be, between 5000 commits and 1 commit? Just curious.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
In an Upsert scenario, each individual update may take some considerable time (in database terms). If you send 5000 before you commit, that wait time is multiplied and seen as a performance degradation (for example because rows/sec goes down - the clock keeps running but no more rows are being processed while waiting).
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.