Reference Match clarification

Infosphere's Quality Product

Moderators: chulett, rschirm

Post Reply
srds2
Premium Member
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Reference Match clarification

Post by srds2 »

Hi, I am using Reference Match (May to One Multiple) and having the same issue with my CHAR comparison. Can anyone help me to understand how this special handling works

when we have a column x being used as a "Blocking"column in match pass 1 and the same column is being used as "Match Command" in another match pass (Variable Special handling is applicable for only matching columns or both matching and blocking column?)

My requirement is If both the columns are populated then the value must be same to be a match, If either one of the columns is missing then also it should be a match so can I use this "Critical Missing OK" special handling for both Data and Reference Columns?

Thanks a lot in advance for your help.
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA
Contact:

Post by rjdickson »

Hi,

The 'Critical Missing Ok' only applies to the matching commands, not to the blocking. In your case, the records would not end up in the same block because one of them is blank. Do not block on the column, and only use a match command with a 'Critical Missing OK'.
Regards,
Robert
srds2
Premium Member
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thanks Robert for the information.

If I set Special Handling Action as "Critical Missing Ok" then It would be matched even if the value is missing on either Data/Reference side correct? (Just want to confirm whether it applies for only Data columns/Reference Column or Both because when I select this Action under Special handling Actions, besides it shows that "Data Columns" Radio Button as selected. when I select Reference columns Radio Button I dont see these Missing Actions but only NOFREQ and CONCAT)

Thank you very much for your hep.
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA
Contact:

Post by rjdickson »

That is correct. You select the 'Data' side only, and the 'missing ok' context is either data or reference (either one or both can be missing).

For example:

Code: Select all

DatCol RefCol Result
------ ------ -------------
ABC    ABC     Matched
ABC    ABD     NOT Matched
ABC            Matched
       ABC     Matched
               Matched
I hope this helps.
Regards,
Robert
srds2
Premium Member
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thanks a lot Robert for the information. It really helped to understand how this special handling works.

One last question regarding this match comparisons: I am using NAME_UNCERT and CHAR comparisions in my match passes, I was under the impression that by default (With no extra special handling)CHAR and UNCERT functions would work the same way(Blank to value and Blank to Blank - Positive match). Can you please let me know if my assumption is wrong as per the below example using the default comparisions with no special handling.

Code: Select all

DatCol RefCol Result 
------ ------ ------------- 
ABC    ABC     Matched 
ABC    ABD     NOT Matched 
ABC            Matched 
       ABC     Matched 
               Matched 

If these comparisions doesnt match on Blank to value or Blank to Blank then as per my requirement do I need to add all the columns which I am using for match comparisions CHAR and NAME_UNCERT under Special Handling to have "Critical Missing OK"?

and does the order of match commands important to get proper matches. I know we need to add Blocking columns as per their priority. Can you please let me know if that rule applies for Match commands as well?

Thanks for your help in understanding these concepts to resolve my issue.
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA
Contact:

Post by rjdickson »

Hi,

Match Commands (any of them) do not work the same way. This is by design. By default, anything compared to a space receives a 0 score because it is neither a match nor a mismatch - it is missing data. This is not the same as forcing a 'Not Matched' condition.

Critical Missing OK
Let's assume you have 15 columns that you are using to match. That's 15 different match commands. Let's further assume that 'DateOfBirth' is one of those columns, and is different in two records. Finally, assume that the other 14 columns are exact matches. Only the DateOfBirth is different. Your business rules may be strict enough to say 'NEVER match when there is a different date of birth.' In this case, 'Critical Missing OK' can be used to keep the records apart even though there are 14 other columns that match exactly. The input and result would look something like this:

Code: Select all

DatCol   RefCol   Result 
-------- -------- ------------- 
19910528 19910528 Matched 
19910528 19910529 NOT Matched 
19910528          Matched 
         19910528 Matched 
                  Matched
Match Commands
Contrast with the result from using a Match Command. Remember, this is only one of the 15 columns you have selected for the match.
Let's assume you have used CHAR:

Code: Select all

DatCol   RefCol   Resuling weight for this column
-------- -------- -------------------------------
19910528 19910528 Maximum agreement weight  
19910528 19910529 Maximum disagreement weight
19910528          Zero
         19910528 Zero
                  Zero
So, this one field being different will not keep the records from matching - remember that there are 14 other fields that match exactly (in our example).

Remember, this can all be modified with overrides.

I hope this helps.
Regards,
Robert
rjdickson
Participant
Posts: 378
Joined: Mon Jun 16, 2003 5:28 am
Location: Chicago, USA
Contact:

Post by rjdickson »

Also, to answer your other questions:
No, blocking order within a pass does not matter.
No, match command order does not matter either. Comparison to the cutoffs does not happen until all weights are computed.
Regards,
Robert
srds2
Premium Member
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thanks a lot Robert for the information.

So, we can either add those match commands in the specail hadling with "Critical Missing OK" or Add "Weight Overrides" to match Blank values correct? If yes then I would use Special handling rather than doing weight overrides which I felt a bit complex to work with.

I was trying to test how this Weight overrides work for my requirement (Matching Balnks) but couldn't understand how to determine the Agreement Weight/Missing Weight override values. I have gone through few posts on DsXchange and other documentation, based on my understanding this missing weight override value we specify should be greater than the Match cutoff value in order to get the Blanks matched. does it mean we need to first specify the match cut off value before doing weight overrides?

Can you please suggest which method is better for my requirement (Matching blank to Value and Blank to blank)
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Not sure how I missed this... well past time to get this into your own post.
-craig

"You can never have too many knives" -- Logan Nine Fingers
srds2
Premium Member
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thanks a lot Robert for the information.

So, we can either add those match commands in the specail hadling with "Critical Missing OK" or Add "Weight Overrides" to match Blank values correct? If yes then I would use Special handling rather than doing weight overrides which I felt a bit complex to work with.

I was trying to test how this Weight overrides work for my requirement (Matching Balnks) but couldn't understand how to determine the Agreement Weight/Missing Weight override values. I have gone through few posts on DsXchange and other documentation, based on my understanding this missing weight override value we specify should be greater than the Match cutoff value in order to get the Blanks matched. does it mean we need to first specify the match cut off value before doing weight overrides?

Can you please suggest which method is better for my requirement (Matching blank to Value and Blank to blank)

Thank you very much for sharing information.
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Without looking at your data and the rest of your match strategy, it's hard to say definitively. Probabilistic matching is not a paint-by-numbers exercise (and why it's so much fun ;-).

Critical Missing OK doesn't mean it automatically matches if it is blank. What it says is "If this field doesn't match, then throw out the whole record, regardless of how high it scores from the other fields in the match pass. However, if it is missing, then don't automatically throw the record out. Still give it a chance to match based on the other fields"

This is not something you would always use. If the data is highly trusted and vital to the overall match then you would, but normally you would just use the disagreement weight as a penalty, or add a weight override if you want a heavier penalty.

Remember, the overall score is the sum of all of the individual field scores. If that field is missing and the record is still good enough to get an overall match then it will. If it isn't and you want it to, then you need to fiddle with the scores.


Using Weight Overrides is about giving things a nudge here and there to put the records you want over the threshold and keeping the ones you don't want under.

Raising or lowering the threshold makes it harder or easier overall.

As a very rough guide:
If you're being too picky overall, lower the threshold.
If you want a record to get through even though a specific field is missing (while not being more lenient in general), use a weight override to give it a bit of a boost.

Which option you choose depends on the behaviour you want, how trustworthy the sources are, the other fields in your match pass and what scores they are contributing to the overall match score.
srds2
Premium Member
Premium Member
Posts: 66
Joined: Tue Nov 29, 2011 6:56 pm

Post by srds2 »

Thanks a lot for the information.

Can you please share me some suggestions on how can I know that what should be the Cut Off value and how much I should increase/decrease the Agreement/Disagreement weights in order to match on blanks.

Any information to better understand and modify these values would be helpful.

Thank you very much for your help
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Sure. Can I have your data? ;-)

It's all related to the data, the number if fields and how the scores are clumped. A threshold for 1 pass could be 20 and 50 for another ass with more match fields. It's on a case by case basis and may even change over time as your data does.
Therein lies the art.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Using the Match Designer test utility you can view a histogram of matches and slide the cutoffs along to investigate their effect. In association with doing that, you need to identify row pairs in which the aggregate weight is at or near the threshold, and make a decision about whether the lower-weighted row should be considered to be a match or not.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Post Reply