Running QS job from within DataStage Server

Infosphere's Quality Product

keith.walker
Participant
Posts: 1
Joined: Mon Jan 24, 2005 4:31 am

Post by keith.walker »

I am new to Ascential and wonder if someone could help me with the following problem?

I have a QS standardisation job with 7 stages. Each of these stages produces 1 or 2 files, one of which is used in the next stage. When I run this job from QS, I end up with 9 files. When I run it from within a DataStage job, I end up with 3 files (having deleted the original 9 files first). Any output file from a QS stage that is used as an input file by the next QS stage does not appear in the expected directory when the QS job is run in a DataStage job.

All the QS stages are set to append to the output files, but I always make sure I have deleted the QS files before running it from DS.

Thanks in anticipation
Keith Walker
PilotBaha
Premium Member
Posts: 202
Joined: Mon Jan 12, 2004 8:05 pm

Post by PilotBaha »

Keith,
Let me make sure that I understand correctly. Say you have the following stages:

STAGEA
STAGEB
....
STAGEG

and you are trying to access the result of STAGEB (an intermediate stage) when you run the job from the plug-in. Am I correct on this?

If I am, it is normal that you do not get those in-between files when you run the job through the QS plug-in.

You can redesign your jobs to capture those files, or you can execute the shell script that invokes your QS job in command mode from DS. (You lose some nice things, like metadata info, if you go this way, though.)
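
If you go the shell-script route, it helps to confirm which files a run actually leaves behind. Here is a minimal sketch in Python, assuming the output directory has been cleared of old QS files first (as Keith describes). The function name `run_and_report`, the script name and the directory are all hypothetical; nothing here is a QualityStage or DataStage API.

```python
import subprocess
from pathlib import Path


def run_and_report(command, output_dir):
    """Run a command and return the names of files it newly created in output_dir.

    `command` is whatever launches the job (hypothetical here); files that
    already existed before the run are not reported, even if appended to.
    """
    out = Path(output_dir)
    before = {p.name for p in out.iterdir() if p.is_file()}
    # check=True raises CalledProcessError if the command exits non-zero.
    subprocess.run(command, check=True)
    after = {p.name for p in out.iterdir() if p.is_file()}
    return sorted(after - before)
```

A call might look like `run_and_report(["sh", "run_qs_job.sh"], "/path/to/qs/output")`, where both arguments are placeholders for your own script and directory. Comparing the returned list between a command-mode run and a plug-in run shows which intermediate files are missing.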
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

When QualityStage is invoked from DataStage, the concept of "file" can mean "data stream". It still needs a "file", so that it can find the file definition in the QS project directory, but it does not need a physical file; the data stream can be managed in memory.

This becomes even more vital in a parallel environment, where different parts of the data stream may be processed by different processing nodes (different CPUs or even different machines). Managing physical files in that scenario would be a nightmare, and would negatively impact throughput.
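
The difference between a physical file and an in-memory data stream can be illustrated with a small Python analogy. This is only a sketch of the general idea, not how QualityStage is implemented; the stage functions and the file name `stage_a.out` are made up for the example.

```python
from pathlib import Path


def stage_a(records):
    # First stage: trim whitespace from each record.
    for r in records:
        yield r.strip()


def stage_b(records):
    # Second stage: upper-case each record.
    for r in records:
        yield r.upper()


def run_streamed(records):
    # Stage A feeds stage B directly; the intermediate result is never
    # written to disk, so no intermediate file appears anywhere.
    return list(stage_b(stage_a(records)))


def run_with_files(records, work_dir):
    # Stage A writes a physical intermediate file that stage B reads back.
    work = Path(work_dir)
    intermediate = work / "stage_a.out"
    intermediate.write_text("".join(r + "\n" for r in stage_a(records)))
    final = list(stage_b(intermediate.read_text().splitlines()))
    return final, sorted(p.name for p in work.iterdir())
```

Both functions return the same final records, but only `run_with_files` leaves an intermediate file in the working directory. That matches what Keith sees: the same results, with fewer files on disk.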
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.