Page 1 of 1

US Address Standardization

Posted: Mon Jun 11, 2018 2:36 am
by satheesh_color
Hi All,

I am new to Quality stage. I have a rquirement to validate US adresses(NA) for the below values.
Addr_Line1
Addr_Line2
City
State
Country
Zip/PostalCode

For that i am creating new ruleset(AdrValidate) with the below values

From PAT File:
\PRAGMA_START
SEPLIST " ~"
STRIPLIST " `!@#$%^*()_+={}[]|\\;\"<>?"
\PRAGMA_END

CLS File:
;-------------------------------------------------------------------------------
; Table Sort Order: 51-51 Ascending, 26-50 Ascending, 1-25 Ascending
;-------------------------------------------------------------------------------
AVENUE AVE A
AVE. AVE A
AVE, AVE A
ALLEY ALY A
ALY. ALY A
ALY, ALY A
CIRCLE CIR A
CIR. CIR A
CIR, CIR A
COURT CT A
CT. CT A
CT, CT A
COVE CV A
CV. CV A
CV, CV A
DRIVE DR A
DR. DR A
DR, DR A
FREEWAY FWY A
FWY. FWY A
FWY, FWY A
HIGHWAY HWY A
HWY. HWY A
HWY, HWY A
LANE LN A
LN. LN A
LN, LN A
PARKWAY PKWY A
PKWY. PKWY A
PKWY, PKWY A
PLACE PL A
PL. PL A
PL, PL A
STREET ST A
ST. ST A
ST, ST A
ATTENTION ATTN A
ATTN. ATTN A
ATTN, ATTN A
SUITE STE A
STE. STE A
STE, STE A
BOULEVARD BLVD A
BLVD. BLVD A
BLVD, BLVD A
ROAD RD A
RD. RD A
RD, RD A

And i am using the below in Transformer stage to get the required values
Trim(Ereplace(Ereplace(Ereplace(Ereplace(Ereplace(Ereplace(ln_std_adr.UnhandledData_AdrValidate,'C/O',''),'C/ O',''),'C /O',''),'C\O',''),'C\ O',''),'C \O',''))
as SVAddrValidate

Field(SVAddrValidate,' ~ ',1) =SVAddr_Line1
Field(Field(SVAddrValidate,' ~ ',2),' ',1) = SVAddr_Line2
Field(Field(SVAddrValidate,' ~ ',3),' ',1) =SCCity
Field(SVAddrValidate,' ~ ',4)=SVState
Field(SVAddrValidate,' ~ ',5)=SVCountry
Field(SVAddrValidate,' ~ ',6)=SVZip

SourceData:

1 DUNDAS STREET WEST, SUITE 2500=Addr_Line1
INTERWIND CORP.=Addr_Line2
TORONTO=City
ON=State
Country=CAN
M5G 1Z3=Zip

After doing the above standardization, i am getting the value like below
1 DUNDAS STREET WEST=AddrLine1
SUITE 2500=Addr_Line2(part of Addr_Line1 is overwritten to Addr_Line2)
Null=City
Country=TORONTO
State=CAN
ZIP=ON,M5G 1Z3

Kindly let me know how to overcome this issue, as i am new to this stage i am still struggling.

It could be useful on how to stip( ---(more than one dash), (more than one space), more than one comma...

Thanks & Regards,
S.R

Posted: Mon Jun 11, 2018 7:21 am
by satheesh_color
Hi All,

After the standardize process, i got the below output
LiterL=~(Seperation for records)
Addr_Line1,Addr_Line2,City,state,Country,Zip
1 BOSTON PLACE,41ST FLOOR ~ ~ BOSTON ~ MA ~ USA ~ 02108

From the above example, while i try to get the required records i failed as Addr_Line2 is NULL, so 41ST FLOOR goes to Addr_Line2.
Field(SVAddrValidate,' ~ ',1) =SVAddr_Line1
Field(SVAddrValidate,' ~ ',2) =SVAddr_Line2
Field(SVAddrValidate,' ~ ',3) =City
Field(SVAddrValidate,' ~ ',4)=SVState
Field(SVAddrValidate,' ~ ',5)=SVCountry
Field(SVAddrValidate,' ~ ',6)=SVZip


Let me know your thoughts on the same

Posted: Mon Jun 11, 2018 3:05 pm
by ray.wurlod
What's wrong with using the out-of-the-box USADDR and USAREA rule sets?

Posted: Tue Jun 12, 2018 2:42 am
by satheesh_color
Hi Ray,

Thanks.

When i try to use USAREA
ln_std_adr.CityName_USAREA has all the values.

ln_std_adr.StateAbbreviation_USAREA is set as null and so on.
Eg: HOUSTON TX USA(ln_std_adr.CityName_USAREA)

Eg2 for Canada: SAINT BRUNO QC CAN (ln_std_adr.CityName_USAREA)
ln_std_adr.StateAbbreviation_USAREA,ln_std_adr.CountryCode_USAREA,ln_std_adr.ZipCode_USAREA set as null..

I am sorry, still i am struggling with that.

Regards,
S.R

Posted: Tue Jun 12, 2018 9:18 pm
by ray.wurlod
I am unable to reproduce this symptom. Have you tried "training" QualityStage with some overrides (e.g. HOUSTON is a city name)?