Hi all,
I am able to run my standardization jobs using the QS plug-in for DS.
Now I have to run my Match job using the plug-in.
The problem is that I cannot use both files as input to the QS plug-in, as it does not support reference links,
and the report and extract files do not contain any metadata definitions. So how do I proceed with the plug-in? If anyone has used the plug-in
in a similar case, please guide me through the process.
Thanks.
Running a match job using the QS plug-in
- Participant
- Posts: 3593
- Joined: Thu Jan 23, 2003 5:25 pm
- Location: Australia, Melbourne
The QualityStage plug-in does support reference inputs; however, you have to load one of your files into a hashed file first. That way DataStage feeds the matching lookup rows to the QualityStage job via hashed file lookups.
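To make the hashed-file idea concrete, here is a minimal Python sketch of the lookup pattern only: load the reference rows once into a store keyed on one column, then fetch the reference row for each incoming data row. It is not DataStage or QualityStage code; `load_reference`, `lookup` and the `cust_id` column are hypothetical names for illustration.

```python
from typing import Optional

# Hypothetical row shape: column name -> value.
Row = dict[str, str]


def load_reference(rows: list[Row], key: str) -> dict[str, Row]:
    """Build a keyed store from the reference rows.

    A later row with the same key replaces an earlier one, as a write to an
    existing key in a DataStage hashed file would.
    """
    store: dict[str, Row] = {}
    for row in rows:
        store[row[key]] = row
    return store


def lookup(store: dict[str, Row], data_row: Row, key: str) -> Optional[Row]:
    """Return the reference row that shares the data row's key, or None."""
    return store.get(data_row[key])


if __name__ == "__main__":
    reference = [
        {"cust_id": "100", "name": "SMITH"},
        {"cust_id": "200", "name": "JONES"},
    ]
    store = load_reference(reference, "cust_id")
    for data_row in [{"cust_id": "100"}, {"cust_id": "999"}]:
        print(data_row["cust_id"], lookup(store, data_row, "cust_id"))
```

Because the store is keyed, the key column you choose decides which reference rows survive the load, so pick it with the match passes in mind.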
You can also feed in just the primary data and let QualityStage open the match data file directly from a flat file and not from a DataStage link.
DataStage will receive the output of the matching. You need to define the fields from the input data that will end up in the match report, and manually add columns for things like match weight and match result. I don't have access to the exact columns to be added, but you can see them in the QualityStage application when you go through the matching steps.
Certus Solutions
Blog: Tooling Around in the InfoSphere
Twitter: @vmcburney
LinkedIn: Vincent McBurney LinkedIn
Another way to do this is to combine the data and reference files into one file with a source flag (for instance, D for data and R for reference) and let QS split the input back into two. I find this approach much quicker and less of a maintenance nightmare.
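A rough Python sketch of the flag approach, assuming comma-delimited records: every record is prefixed with `D` or `R`, written as a single file, and later split back by that first field. In a real job the split would happen inside QS; the function names here are hypothetical and only show the idea.

```python
import csv
import io


def combine(data_rows: list[list[str]], reference_rows: list[list[str]]) -> list[list[str]]:
    """Prefix each record with a source flag: 'D' for data, 'R' for reference."""
    return [["D", *row] for row in data_rows] + [["R", *row] for row in reference_rows]


def to_single_file(combined_rows: list[list[str]]) -> str:
    """Serialise the combined records as one CSV text (one input for the plug-in)."""
    buf = io.StringIO()
    csv.writer(buf).writerows(combined_rows)
    return buf.getvalue()


def from_single_file(text: str) -> list[list[str]]:
    """Parse the combined CSV text back into records."""
    return list(csv.reader(io.StringIO(text)))


def split(combined_rows: list[list[str]]) -> tuple[list[list[str]], list[list[str]]]:
    """Split records back into (data, reference) using the leading flag."""
    data: list[list[str]] = []
    reference: list[list[str]] = []
    for flag, *fields in combined_rows:
        if flag == "D":
            data.append(fields)
        elif flag == "R":
            reference.append(fields)
        else:
            raise ValueError(f"unknown source flag: {flag!r}")
    return data, reference


if __name__ == "__main__":
    text = to_single_file(combine([["1", "SMITH"]], [["10", "SMYTHE"]]))
    print(split(from_single_file(text)))
```

Rejecting unknown flags is a deliberate choice: a record with a bad flag should fail loudly rather than silently drop out of both the data and reference sets.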
The match result extract file will not be much of a problem, though, as it is most likely a single file that gets fed into the output link from the QS plug-in.
Hi Vmcburney,
You said that I can feed in just the primary data and let QualityStage open the match data file directly from a flat file, not from a DataStage link.
I see in the plug-in documentation that when you run a match, the data file should be bound to a QualityStage stage link, whereas the reference file should be taken from its original location, usually <Master_Project_dir>/Data.
Can you kindly explain how I do this? As I understand it, I just provide the file to be matched as the input to the plug-in, and the reference file is opened by QualityStage itself.
How can I do this?
Thanks.
g.kiran
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
Does anyone have a preferred strategy for keeping the reference data up to date?
When there's no match, the source data must be pushed through a STAN and then added to the reference data. The problem is: when? What if the caller (the DataStage job) elects not to commit?
We've implemented a temporary table of new reference data, and the QS process matches against a view that is a UNION of the original reference data and the new. But it seems unwieldy.
This has to work in an RTI environment too.
The other thing we've done is to create a number of generic columns in the database table (and in the temporary table) in which the standardised values (NYSIIS of last name, first letter of first name, etc.) are stored; these are used to restrict the list of reference candidates, since there are otherwise millions of rows in the reference set.
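The candidate-restriction idea can be sketched in Python with SQLite: precomputed key columns are stored and indexed on the reference table, and only rows whose keys equal the incoming record's keys are returned as match candidates. NYSIIS is not implemented here; `vowel_stripped_key` is a crude hypothetical stand-in (uppercase letters, vowels and Y dropped after the first letter), and the table and column names are also hypothetical.

```python
import sqlite3


def vowel_stripped_key(last_name: str) -> str:
    """Hypothetical stand-in for NYSIIS, NOT the NYSIIS algorithm:
    keep letters, uppercase them, drop A/E/I/O/U/Y after the first letter."""
    letters = [c for c in last_name.upper() if c.isalpha()]
    if not letters:
        return ""
    return letters[0] + "".join(c for c in letters[1:] if c not in "AEIOUY")


def load(conn: sqlite3.Connection, people: list[tuple[str, str, str]]) -> None:
    """Create the reference table with indexed key columns and load (id, first, last)."""
    conn.execute("""CREATE TABLE reference (
        id TEXT PRIMARY KEY, first_name TEXT, last_name TEXT,
        block_last TEXT, block_first TEXT)""")
    conn.execute("CREATE INDEX idx_block ON reference (block_last, block_first)")
    conn.executemany(
        "INSERT INTO reference VALUES (?, ?, ?, ?, ?)",
        [(pid, first, last, vowel_stripped_key(last), first[:1].upper())
         for pid, first, last in people],
    )
    conn.commit()


def candidates(conn: sqlite3.Connection, first_name: str, last_name: str) -> list[str]:
    """Return IDs of reference rows whose stored keys equal the incoming record's keys."""
    rows = conn.execute(
        "SELECT id FROM reference WHERE block_last = ? AND block_first = ? ORDER BY id",
        (vowel_stripped_key(last_name), first_name[:1].upper()),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load(conn, [("R1", "John", "Smith"), ("R2", "Jane", "Smythe"), ("R3", "John", "Jones")])
    print(candidates(conn, "Jon", "Smyth"))
```

Only the candidates that share both keys go on to the full (expensive) match comparison, which is what keeps a multi-million-row reference set manageable.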
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.