
ProfileStage

Posted: Fri Oct 14, 2005 9:51 am
by ccatania
I have a quick ProfileStage question concerning the Cardinality Count. Is there a limit on the number of rows reported for this field?
I did a Column Analysis on a Master file of 532,037 rows; the key field is Item Code, which is unique on the mainframe.
The Analysis Result report shows a Cardinality Count of 60,000 rows, and the Uniqueness indicator is 100.
I expected the Cardinality Count to equal the number of rows in the distribution. I verified my source and there are no duplicate occurrences.


Posted: Fri Oct 14, 2005 6:27 pm
by ray.wurlod
I don't know the answer to your question, but I agree with your expectation. Did you have a sample size set anywhere that might have limited the reported number?

Posted: Sun Oct 16, 2005 12:18 am
by roy
Hi,
Basically, ProfileStage uses a sample collected along the way to the later stages (I don't recall exactly which stage right now).
In the "Tools>Options>ProfileStage Options" menu, click on the ProfileStage options tab and increase the DistributionAnalysisLimit from its 60k default to accommodate the maximum number of rows you have (or higher), so that the full, heavy processing is done on all rows.

IHTH,
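
For readers who want to see why a cap would give exactly the numbers reported above, here is a minimal Python sketch. It is a toy model, not ProfileStage's actual algorithm: the function `profile_column`, its `distribution_limit` parameter, and the generated item codes are all hypothetical, and it assumes the cap simply restricts the analysis to the first N rows.

```python
def profile_column(values, distribution_limit=60_000):
    """Toy model of a capped column analysis (an assumption, not ProfileStage code).

    Only the first `distribution_limit` rows are analysed. Returns a tuple of
    (cardinality count, uniqueness percentage) for the analysed rows.
    """
    analysed = values[:distribution_limit]
    cardinality = len(set(analysed))  # number of distinct values seen
    uniqueness = 100.0 * cardinality / len(analysed) if analysed else 0.0
    return cardinality, uniqueness


# Hypothetical stand-in for the master file: 532,037 unique item codes.
item_codes = [f"ITEM{n:06d}" for n in range(532_037)]

capped = profile_column(item_codes)  # default cap of 60,000
full = profile_column(item_codes, distribution_limit=len(item_codes))

print(capped)
print(full)
```

Under this assumption the capped run can never report a cardinality above the limit, while the uniqueness percentage still reads 100 because it is computed over the analysed rows only.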

Posted: Tue Oct 18, 2005 12:27 pm
by ccatania
I found the setting under the Tools drop-down menu as Roy indicated; the default setting was 60000. I decided to keep this default so as not to impact performance. If PS shows 100% for the Uniqueness Indicator, that is all right by me. Now that I know of this setting, I feel confident that the result PS returned is valid.

Thanks again for your assistance.

Posted: Wed Oct 19, 2005 7:29 am
by roy
Well, there are several sampling methods available, and someone a bit familiar with them might choose the one that suits the case best.
Bear in mind that sampling can't guarantee 100%, so be prepared for an occasional extreme case that isn't covered.
I would recommend notifying the decision makers about this, if you're not one of them, and letting them decide.
Sometimes they will want 100% coverage of rows, knowing that the heavy-duty processing involved will slow the entire process down.

IHTH,
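
The point about sampling not guaranteeing 100% can be illustrated with a small Python sketch. It assumes a simple random sample of 60,000 rows, which may not match any of the sampling methods ProfileStage offers; the helper `sample_is_unique` and the data are hypothetical. The data holds 532,037 rows with exactly one duplicated key, and a sample only exposes the duplicate when both copies happen to be drawn, which under this assumption is roughly a 1% chance per sample.

```python
import random


def sample_is_unique(values, sample_size, seed):
    """Return True if a random sample of rows contains no duplicate values."""
    rng = random.Random(seed)  # seeded so each trial is repeatable
    sample = rng.sample(values, sample_size)  # samples rows by position
    return len(set(sample)) == len(sample)


# Hypothetical data: 532,036 distinct keys plus one repeated key (0).
rows = list(range(532_036)) + [0]

# A full scan always finds the duplicate.
full_scan_unique = len(set(rows)) == len(rows)

# Count how many sampled runs miss the duplicate and report "unique".
trials = 10
misses = sum(sample_is_unique(rows, 60_000, seed) for seed in range(trials))

print("full scan reports unique:", full_scan_unique)
print("sampled runs reporting unique:", misses, "of", trials)
```

A 100% uniqueness figure from a sampled run is therefore evidence rather than proof. Only a run that covers every row can rule out a rare duplicate.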