Two inputs to the same hashed file

Post questions here related to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Two inputs to the same hashed file

Post by clarcombe »

I have two transformers writing to the same hashed file. Both have the Create File and Clear file before writing checkboxes selected, yet I get the results of both transformers in the hashed file.

How can this be when both have these checkboxes checked?
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

How can this be what? Working? The 'Clear' happens when a job starts and the stage 'opens', so my guess is it happens twice right up front, before any rows are written, when it doesn't matter.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Post by sachin1 »

Hello chulett, please can you elaborate?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sorry, but elaborate on what aspect? What is unclear? Colin may need to elaborate as well, as we don't know the actual job design.
-craig

"You can never have too many knives" -- Logan Nine Fingers
sachin1
Participant
Posts: 325
Joined: Wed May 30, 2007 7:42 am
Location: india

Post by sachin1 »

Imagine a simple job design like this:



source ---> T1 ---------------> HASH-FILE
             |                      ^
             +----> T2 ------------+



Data is extracted from the source to transformer T1, which writes to the hashed file. Data also moves from T1 to T2, and T2 writes to the same hashed file with the settings Colin mentioned.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:? Again, what is unclear? The hashed file is cleared when the job starts and the stage is opened/initialized, not when the first record hits it. So, it gets cleared twice and then the records flow.
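
As a minimal sketch of that timing (plain Python as an analogy, not DataStage code; the HashedFile class and stage names are purely illustrative): because the clear fires when a stage opens the file, both clears have already happened before either stream writes a row, so rows from both streams survive.

Code: Select all

class HashedFile:
    """Toy stand-in for a Server hashed file: keyed storage, one record per key."""
    def __init__(self):
        self.rows = {}

    def open_with_clear(self, stage):
        # 'Clear file before writing' fires here, at open/initialise time
        print(f"{stage}: open + clear")
        self.rows.clear()

    def write(self, key, value):
        self.rows[key] = value  # destructive overwrite per key

hf = HashedFile()

# Job start: both stages open (and therefore clear) before any rows flow
hf.open_with_clear("T1 link")
hf.open_with_clear("T2 link")

# Records now flow from both streams
hf.write("A", "row from T1")
hf.write("B", "row from T2")

print(hf.rows)  # {'A': 'row from T1', 'B': 'row from T2'} -- both streams present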
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

:!: Sachin - if you have a specific question, please ask it. At the moment, I feel like I'm just repeating myself on the explanation, which I can't imagine is all that helpful. :(

Rather than imagine your job design - build it. Run it. Check the outputs and the logs to see how it behaves with respect to the hashed output. Or come back with a question or two, I'll see about answering them as best as I can.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

If Colin says there are 2 Transformers, then it may be the case that there is another Transformer directing rows down one path or the other. In either case this will work fine if all of the rows always come from one data stream or the other. If rows are flowing down two separate streams then you have to wonder if the Clear File is occurring on the slower stream after the faster stream has already placed rows into the hashed file.

If the design is a pair of OCI --> XFM --> HASHED streams, then there's no coordination between the two streams and there is definitely an issue, unless the OCI stages have WHERE clauses such that only one of the two streams returns rows (and as long as the stream that returns no rows finishes quickly and closes before the second stream starts sending data).

Colin is right to be worried that data could be lost.
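
A small sketch of that risk, reusing the same sort of toy Python model as above (illustrative only, not guaranteed DataStage behaviour): if the slower stream only opens, and therefore clears, after the faster stream has already written rows, those earlier rows are wiped.

Code: Select all

class HashedFile:
    """Toy stand-in for a Server hashed file."""
    def __init__(self):
        self.rows = {}

    def open_with_clear(self, stage):
        print(f"{stage}: open + clear")
        self.rows.clear()

    def write(self, key, value):
        self.rows[key] = value

hf = HashedFile()

hf.open_with_clear("fast stream")
hf.write("A", "row from fast stream")   # fast stream is already writing...

hf.open_with_clear("slow stream")       # ...when the slow stream opens and clears
hf.write("B", "row from slow stream")

print(hf.rows)  # {'B': 'row from slow stream'} -- the fast stream's row is gone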
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Of course. Didn't really want to get into all of the gory details of what might be based on various job design considerations but thanks for throwing that into the stew pot, Ken. :wink:

Was hoping Colin would come back and clarify his exact job design so his specific fears can either be confirmed or denied.
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

So is it ok in principle to have two inputs to the same hashed file? Two processes, both updating a hashed file simultaneously, in a server job? I thought hashed files weren't available in PX because they didn't support parallel access.
Phil Hibbs | Capgemini
Technical Consultant
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Sure, they support multiple writer processes. There's still the 'last one in wins' destructive overwrite per key to keep in mind, of course.
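
As a tiny illustration of that 'last one in wins' point (again a toy Python model, not DataStage code): with two writers supplying the same key, whichever record arrives last simply replaces the other.

Code: Select all

hashed = {}  # toy stand-in: a hashed file is keyed storage, one record per key

def write(link, key, value):
    hashed[key] = value  # per key, the last write wins
    print(f"{link} wrote {key} = {value!r}")

write("link from T1", "CUST001", "record built by T1")
write("link from T2", "CUST001", "record built by T2")

print(hashed)  # {'CUST001': 'record built by T2'} -- T2 overwrote T1's record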

They aren't available in PX because they commit the cardinal sin of being disk based entities. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

PhilHibbs wrote:So is it ok in principle to have two inputs to the same hashed file? Two processes, both updating a hashed file simultaneously, in a server job? I thought hashed files weren't available in PX because they didn't support parallel access.
They're not in PX because PX is a horse of a different color. In a distributed processing environment anything "shared" tends to be anathema. Server hashed files exist only on the Server node - so a multi-node or clustered environment doesn't run Server on all nodes. IBM would love to sell more licenses per node for Server, but nobody would do that. Installing Server on every node is probably a taxing process as well.

Don't hijack the thread :D The question is: do hashed files support simultaneous inputs? YES. Do hashed files with simultaneous inputs and Clear File checked present potential issues? YES.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
clarcombe
Premium Member
Posts: 515
Joined: Wed Jun 08, 2005 9:54 am
Location: Europe

Post by clarcombe »

Golly gosh, I didn't mean to create all this confusion and uproar. I prefer to leave that sort of stuff to politicians :)

Sorry I have not replied earlier but I got put on another "urgent" non-urgent task.

For clarification

My test job: text files feeding two separate transformers, both writing to one hashed file. Both transformers write to the hashed file at the same time with two different datasets.

I imagined that the first transformer would run in its entirety, with the create and clear commands being issued first, and then the second transformer would execute another create and clear and overwrite the first transformer's data.

Luckily I am using text files and not OCI, so this seems to be working, but it does seem rather odd that the hashed file would not behave in the same way when fed from an OCI stage rather than a text file.
Colin Larcombe
-------------------

Certified IBM Infosphere Datastage Developer
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Bah, no uproar just our normal healthy discussions. :wink:
clarcombe wrote:I imagined that the first transformer would run in its entirety with create and delete command being issued first, then the second transformer would execute another create and delete and overwrite the first transformer's data.
Nope, they would run simultaneously if the job looks like this:

Code: Select all

Seq --> Tfm --+
              +--> Hashed
Seq --> Tfm --+
Both links go to a single hashed file stage and write to the same hashed file name. This would behave the same way, as the two designs are functionally equivalent:

Code: Select all

Seq --> Tfm --> Hashed

Seq --> Tfm --> Hashed
Two separate stages, same hashed file name. Not sure where the OCI comment comes from - Sequential or OCI, the behaviour would be the same in the above examples. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs
Premium Member
Posts: 1044
Joined: Wed Sep 29, 2004 3:30 am
Location: Nottingham, UK

Post by PhilHibbs »

Personally I'd have used a Link Collector stage.
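
For what it's worth, a sketch of what that could look like (stage names assumed, following the diagrams above): the Link Collector funnels both transformer outputs onto a single link, so the hashed file has exactly one writer and the clear only happens once.

Code: Select all

Seq --> Tfm --+
              +--> Link Collector --> Hashed
Seq --> Tfm --+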
Phil Hibbs | Capgemini
Technical Consultant