Review column analysis results in 'Domain & Completeness'

This forum contains ProfileStage posts and now focuses on newer versions of InfoSphere Information Analyzer.

Moderators: chulett, rschirm

akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am

Post by akonda »

Hello,

I've done column analysis using Information Analyzer. Now I'm trying to review the analysis in Frequency Distribution -> 'Domain & Completeness'. Since I have millions of records, it's hard to go through every entry. Is there a way to write a condition for each column and get the review done? For example, if the city name contains numeric values, mark the status invalid, otherwise valid.

Example:

Atlanta - Valid
Atalata12345 - Invalid
atla - Default
arun
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

Under Domain you can set valid ranges or use a list of valid values (e.g. for AU you could load in the Aus Post locality DB), and under Format you can mark invalid patterns, but you can't use custom rules in here. It would be a handy thing, though...
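
To make the two checks above concrete (a reference list of valid values, plus a set of format patterns marked invalid), here is a minimal Python sketch that runs outside the tool. The function names, the tiny city list and the A/a/9 mask convention are assumptions for illustration only; this is not Information Analyzer's API.

```python
def format_mask(value: str) -> str:
    """Generalise a value: uppercase -> 'A', lowercase -> 'a', digit -> '9'."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("A" if ch.isupper() else "a")
        else:
            out.append(ch)  # keep spaces and punctuation unchanged
    return "".join(out)


def domain_status(value, valid_values, invalid_masks):
    """Return 'Valid', 'Invalid' or 'Default' for a single value."""
    if format_mask(value) in invalid_masks:
        return "Invalid"
    if value in valid_values:
        return "Valid"
    return "Default"  # neither confirmed nor rejected


# Hypothetical reference data, standing in for a real locality list.
VALID_CITIES = {"Atlanta", "Melbourne", "Sydney"}
INVALID_MASKS = {"Aaaaaaa99999"}

statuses = {
    v: domain_status(v, VALID_CITIES, INVALID_MASKS)
    for v in ["Atlanta", "Atalata12345", "atla"]
}
```

With the sample data, the three values from the original question come out as Valid, Invalid and Default respectively.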
akonda
Participant
Posts: 97
Joined: Wed Feb 28, 2007 6:15 am

Post by akonda »

Thanks for your reply.

Is it possible to change the status for multiple columns at a time?

Apparently it's not possible, but I just want to make sure that I'm not missing a facility in the tool.
arun
stuartjvnorton
Participant
Posts: 527
Joined: Thu Apr 19, 2007 1:25 am
Location: Melbourne

Post by stuartjvnorton »

No, you can't, and I'm not sure why you'd want to.

Sure it would be quicker, but it only makes sense if you don't plan on reviewing the results you just produced. Which doesn't make sense.
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne

Post by vmcburney »

Have a look at DQ rules in Information Analyzer - I would recommend version 8.7 rollup 1. This lets you create the type of DQ rule you are looking for: the rule is defined as a multi-criteria statement and it produces data quality metrics. For data that has millions of rows you will find DQ rules more effective than manual data checking. You can then bind these city rules to different instances of city columns.
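
The idea of defining a rule once and binding it to several city columns can be sketched in plain Python. This only illustrates the logic; it is not Information Analyzer's rule syntax, and the column names, sample rows and `run_rule` helper are hypothetical.

```python
import re


def city_has_no_digits(value: str) -> bool:
    """Rule logic: a city name passes only if it contains no digits."""
    return re.search(r"\d", value) is None


def run_rule(rows, rule, columns):
    """Apply one rule to several columns and return pass/fail counts per column."""
    metrics = {}
    for col in columns:
        values = [row[col] for row in rows]
        passed = sum(1 for v in values if rule(v))
        metrics[col] = {"passed": passed, "failed": len(values) - passed}
    return metrics


# Hypothetical sample rows with two city columns.
rows = [
    {"billing_city": "Atlanta", "shipping_city": "Atlanta"},
    {"billing_city": "Atalata12345", "shipping_city": "Sydney"},
    {"billing_city": "atla", "shipping_city": "Melb0urne"},
]

# The same rule is "bound" to both columns.
metrics = run_rule(rows, city_has_no_digits, ["billing_city", "shipping_city"])
```

Each column gets its own pass/fail counts, which is the kind of metric you would review instead of checking millions of frequency-distribution entries by hand.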

Have a look at this article on pre-built address rules:
Using pre-built rule definitions with IBM InfoSphere Information Analyzer

You will also find QualityStage more suitable for cleansing these fields, for example standardising Atalata into Atlanta.
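
As a rough picture of what standardising a misspelt city against a reference list means, here is a small Python sketch using the standard library's `difflib`. It is not how QualityStage works internally; the reference list, the `standardise_city` helper and the 0.8 cutoff are assumptions chosen for the example.

```python
import difflib
import re


def standardise_city(raw, reference_cities, cutoff=0.8):
    """Strip digits, then return the closest reference city, or None if nothing is close."""
    cleaned = re.sub(r"\d", "", raw).strip()
    matches = difflib.get_close_matches(cleaned, reference_cities, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical reference list of known-good city names.
REFERENCE = ["Atlanta", "Melbourne", "Sydney"]

suggestion = standardise_city("Atalata12345", REFERENCE)
```

Here `suggestion` is "Atlanta". Values with no sufficiently close match return `None`, so they can be left for manual review rather than silently changed.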
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

There are some consoles coming along to aid with managing these rules.

Can't tell you when, but they're in development now.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.