QS: limitation when using a datafile

Post by **ponzio** » Thu Feb 09, 2006 9:48 am

Hi.
I've been working with QualityStage for some years, but
only a few days ago did I discover a big limitation when using a datafile within a single QS project.

PROBLEM
Within the same project, the same datafile can't be used as FileA in one job and as FileB in another job;
for example, as FileA in an undup stage and as FileB in a geomatch stage.

MY SOLUTIONS

1] Create two separate projects, one for the undup job and one for the geomatch job. Then I need to copy (or move) the shared file into the Data directory of the second project.

2] Create a new datafile identical to the shared one and use it in one of the jobs (the geomatch job, for example). Then I need to create, on the filesystem, a symbolic link that points to the real file and has the same name as the new datafile (the one used in the geomatch job).
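As a rough illustration of workaround 2, here is a minimal Python sketch for a Unix-like filesystem. It assumes each datafile is a plain file in the project's Data directory; the directory layout and the names `INPUT` and `INPUT_B` (the shared file and the second, identical datafile) are hypothetical placeholders, not QualityStage conventions.

```python
import tempfile
from pathlib import Path


def link_datafile(data_dir: Path, real_name: str, alias_name: str) -> Path:
    """Make alias_name a symbolic link to real_name inside data_dir.

    All three arguments are hypothetical placeholders: the project's Data
    directory, the shared physical file, and the name of the second,
    identical datafile defined in the project.
    """
    real_path = data_dir / real_name
    alias_path = data_dir / alias_name
    if not real_path.exists():
        raise FileNotFoundError(real_path)
    # Remove a stale link or file left over from an earlier attempt.
    if alias_path.is_symlink() or alias_path.exists():
        alias_path.unlink()
    # Relative target: the link resolves against its own directory,
    # so it keeps working if the whole Data directory is moved.
    alias_path.symlink_to(real_name)
    return alias_path


if __name__ == "__main__":
    # Demo in a throwaway directory instead of a real project.
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        (data_dir / "INPUT").write_text("record 1\n")
        link = link_datafile(data_dir, "INPUT", "INPUT_B")
        print(link.name, "->", link.resolve().name)  # prints: INPUT_B -> INPUT
```

The idea is that the two datafile names are distinct inside the project, so each one can get its own dictionary file, while both names resolve to the same bytes on disk and the data is stored only once.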

Can anyone suggest another way?
Thanks,
Andrea

Post by **ray.wurlod** » Thu Feb 09, 2006 9:11 pm

Welcome aboard! :D

Are you running your QualityStage jobs independently or from DataStage? If the latter, why not have it split the source data into two streams for the separate QualityStage jobs? OK, it's two copies of the data, but they're in memory rather than two physical files.

Post by **ponzio** » Wed Feb 15, 2006 3:05 am

Hi Ray :D

I'm running QS independently.
I'm not familiar with running QS from DS; I've only tried it a couple of times.

But I think the problem would persist with your method too!
The problem lies in the deployment information, and that information doesn't change depending on how you run the QS job, does it?
The issue is not the actual data, but whether the file is used as "File A" or as "File B".

Many thanks :D

Post by **ponzio** » Wed Feb 15, 2006 3:23 am

To be precise: if the data file is named INPUT (for example),
deployment creates the file INPUT.DIC in the DIC directory under the project directory.

Consider two jobs that use that file: an undup job and a geomatch job.
INPUT is the reference file (File B) of the geomatch job and the only input file (File A) of the undup job.

We have two jobs but only one INPUT.DIC file!
So if we deploy the undup job first, deploying the geomatch job overwrites the INPUT.DIC created for the undup job;
if the geomatch job is deployed first, deploying the undup job overwrites the INPUT.DIC created for the geomatch job.

The difference between the two versions of the file is this line:

FILE ${DATAA}
in the file created for the undup job

FILE ${DATAB}
in the file created for the geomatch job

This line indicates whether the file is used as File A or as File B when it is used in a job.

Different problems will occur depending on the order in which the two jobs are deployed and run.
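The collision can be modelled with a toy Python sketch: each deploy writes a dictionary file named after the datafile, so the last deploy wins. Only the `FILE ${DATAA}` / `FILE ${DATAB}` lines come from the posts above; the function names and the one-line dictionary content are hypothetical simplifications, not the real .DIC format.

```python
import tempfile
from pathlib import Path


def deploy(dic_dir: Path, datafile: str, role: str) -> Path:
    """Toy model of a deploy: write <datafile>.DIC with the role line.

    role is "A" (e.g. the undup input) or "B" (e.g. the geomatch reference).
    Real .DIC files contain more than this single line.
    """
    if role not in ("A", "B"):
        raise ValueError("role must be 'A' or 'B'")
    dic_path = dic_dir / f"{datafile}.DIC"
    # One .DIC per datafile name, whichever job deploys it,
    # so a later deploy silently overwrites an earlier one.
    dic_path.write_text(f"FILE ${{DATA{role}}}\n")
    return dic_path


def role_seen_by_jobs(dic_dir: Path, datafile: str) -> str:
    """Return the FILE line that every job will read for this datafile."""
    return (dic_dir / f"{datafile}.DIC").read_text().strip()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        dic_dir = Path(tmp)
        deploy(dic_dir, "INPUT", "A")  # undup job deployed first
        deploy(dic_dir, "INPUT", "B")  # geomatch job deployed second
        print(role_seen_by_jobs(dic_dir, "INPUT"))  # prints: FILE ${DATAB}
```

In this model, running the undup job after both deploys would read INPUT with the geomatch role, while giving the second job its own datafile name (as in workaround 2) produces two separate .DIC files that no longer collide.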

Post by **ray.wurlod** » Wed Feb 15, 2006 3:30 pm

(I don't have an immediate answer. I shall give it some thought.) Meanwhile, if you post on Developer Net you may get an earlier response.

Post by **ponzio** » Wed Feb 22, 2006 6:02 am

After this discovery, I read the following sentence in the QS documentation file MatchConcepts.pdf:

Record Linkage Projects
You assign each record linkage application a project name. This project name is used in all of the steps of the linkage except for data dictionary creation

That sentence supports what we saw in the files.