Page 1 of 1

compare address columns and insert records

Posted: Wed Jun 13, 2018 8:12 am
by satheesh_color
Hi All,

I have the below scenario, I have to build(insert) the address dimension as per the below scenario

1. Compare values(Addr1,City,State,Country,PostalCode) against existing customer address dimension(Addr1,City,State,Country,PostalCode), no key columns. if it is not available then insert. I can do this by using lookup stage with the help of isnull function to segeregate records.

2. Compare(Addr1,City,State,Country,PostalCode) against existing dimension, if any of the values has been changed(Meaning: if city is null for the first order, and for another order city is having values), we again need to insert(treat it as an new record) that into the customer address dimension.

Kindly let me knw your thoughts on how to achieve these scenarios, especially 2.

Posted: Wed Jun 13, 2018 8:35 am
by chulett
So... actually a Type 2 dimension? There's a stage for that.

Posted: Wed Jun 13, 2018 2:11 pm
by rameshrr3
more specifically use a SCD stage or Change Capture .

Posted: Thu Jun 14, 2018 1:54 am
by satheesh_color
Hi Craig,

It is not really SCD Type2. Always we will need to insert a new row to treat it as new record eventhough some columns has been updated as we are doing incremental load.

Eg1:Yesterday record: Insert the below row into Cust_Addr_Dim as it has not the below value in dimension

Customer,Order_id,AddrLine1,AddrLine2,City,State,Country,Postalcode
1,22,24 Dickson Rd,Null,Houston,TX,USA,71104

Eg2:CurrentDay's Reocrd(Incremental Data)

Customer,Order_id,AddrLine1,AddrLine2,City,State,Country,Postalcode
1,33,24 Dickson Rd,Bulevd Street,Houston,TX,USA,71104

We will need to treat this as a new record as it has different order from the same customer with the AddrLine2 has values.

I am doing CDC against Cust_Addr_DIM(as it contains only address columns, no key columns in that dimension(customer,order_id etc..).

All i have compared with existing values is (AddrLine1,AddrLine2,City,State,Country,PostalCode), from here i am struggling with Eg2 scenario.

Note: If the currentDay(Eg2)(Incremental Data) matches with the existing values(AddrLine1,AddrLine2,City,State,Country,PostalCode)), do nothing else insert.

Thanks,
S.R

Posted: Thu Jun 14, 2018 6:29 am
by chulett
Okay. You'd need to expand on the specifics of what issues / struggles you are having as it sounds fairly straight-forward to me. But I'm sure I'm missing something about the way your data is architected...

Posted: Fri Jun 15, 2018 12:27 pm
by rameshrr3
Its still remains as chulett says , a special case similar to SCD2, maybe you should just enable change capture stage for Updates & Inserts and run both that data[sets] as an Insert. Drop Copies ( same addresses) and Deletes ( No records in Input , but exist as reference) . MAke sure Not to compare Nulls ( Handle nulls as Empty strings Or 0 ( integer cols) before comparing ) on both ends. Your key for address should be composite of all Value columns needed for that address.

Posted: Wed Jun 20, 2018 10:10 am
by chulett
Did you work this out?

Posted: Thu Jun 21, 2018 3:36 am
by satheesh_color
Hi Craig / Ramesh,


Yes..I have used CDC stage to get that done. Thanks for your valuable inputs.


Note: Took a week off..and haven't used my laptop for almost a week. :D




Regards,
S.R