
Two inputs to the same hashed file

Posted: Wed Jun 25, 2008 3:18 am
by clarcombe
I have two transformers writing to the same hashed file, both with the checkboxes Create File and Clear File Before Writing selected, and I get the results of both transformers.

How can this be when both have these checkboxes checked?

Posted: Wed Jun 25, 2008 6:40 am
by chulett
How can this be what? Working? The 'Clear' happens when a job starts and the stage 'opens', so my guess is it's happening twice right up front, when it doesn't matter.

Posted: Wed Jun 25, 2008 6:47 am
by sachin1
Hello chulett, can you please elaborate?

Posted: Wed Jun 25, 2008 6:51 am
by chulett
Sorry, but elaborate on what aspect? What is unclear? Colin may need to elaborate as well, as we don't know the actual job design.

Posted: Wed Jun 25, 2008 7:00 am
by sachin1
Imagining a simple job design like



source ----> T1 ----------------> HASH-FILE
              |                       ^
              +------> T2 ------------+

Data is extracted from the source to transformer T1, which writes to the hash file; data also moves from T1 to T2, and transformer T2 writes to the same hash file with the settings mentioned by Colin.

Posted: Wed Jun 25, 2008 7:11 am
by chulett
:? Again, what is unclear? The hashed file is cleared when the job starts and the stage is opened/initialized, not when the first record hits it. So, it gets cleared twice and then the records flow.
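
If it helps to picture the timing, here's a toy Python sketch. It is not DataStage internals - a plain dict stands in for the hashed file, and the stage/function names are made up - but it shows why two clears at open don't lose anything:

Code:

# Toy model of 'Clear file before writing': the clear happens when each
# stage opens at job start, not when the first record arrives.
hashed_file = {"stale": "row left over from a previous run"}

def open_stage(name):
    # Both stages open (and clear) up front, before any rows flow.
    hashed_file.clear()
    print(f"{name}: cleared hashed file at open")

def write_rows(name, rows):
    for key, value in rows:
        hashed_file[key] = value  # destructive overwrite per key

open_stage("T1")
open_stage("T2")
write_rows("T1", [("1", "from T1")])
write_rows("T2", [("2", "from T2")])
print(hashed_file)  # {'1': 'from T1', '2': 'from T2'} - nothing lost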

Posted: Wed Jun 25, 2008 7:55 am
by chulett
:!: Sachin - if you have a specific question, please ask it. At the moment, I feel like I'm just repeating myself on the explanation, which I can't imagine is all that helpful. :(

Rather than imagine your job design - build it. Run it. Check the outputs and the logs to see how it behaves with respect to the hashed output. Or come back with a question or two and I'll see about answering them as best I can.

Posted: Wed Jun 25, 2008 8:50 am
by kcbland
If Colin says there are 2 Transformers, then it may be the case that there is another Transformer directing rows down one path or the other. In either case this will work fine if all of the rows always come from one data stream or the other. If rows are flowing down two separate streams, then you have to wonder if the Clear File is occurring on the slower stream after the faster stream has already placed rows into the hashed file.

If the design is a pair of OCI-->XFM-->HASHED streams, then there's no coordination between the two streams and there is definitely an issue, unless the OCI stages have WHERE clauses such that rows flow from only one of the two streams (and the stream that returns no rows does so quickly and closes before the other starts sending data).
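
To make the timing hazard concrete, here is a contrived Python sketch. Threads stand in for the two uncoordinated streams and the delay is artificial - this is only an analogy, not how DataStage schedules stages:

Code:

import threading
import time

hashed_file = {}
lock = threading.Lock()

def stream(name, rows, open_delay):
    time.sleep(open_delay)   # the slower stream opens late...
    with lock:
        hashed_file.clear()  # ...and its Clear File wipes whatever
                             # the faster stream has already written
    for key, value in rows:
        with lock:
            hashed_file[key] = value

fast = threading.Thread(target=stream, args=("fast", [("1", "a"), ("2", "b")], 0.0))
slow = threading.Thread(target=stream, args=("slow", [("3", "c")], 0.5))
fast.start(); slow.start()
fast.join(); slow.join()
print(hashed_file)  # {'3': 'c'} - the fast stream's rows are gone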

Colin has a right to be worried that data is being lost.

Posted: Wed Jun 25, 2008 9:28 am
by chulett
Of course. Didn't really want to get into all of the gory details of what might be based on various job design considerations but thanks for throwing that into the stew pot, Ken. :wink:

Was hoping Colin would come back and clarify his exact job design so his specific fears can either be confirmed or denied.

Posted: Thu Jun 26, 2008 9:10 am
by PhilHibbs
So is it ok in principle to have two inputs to the same hashed file? Two processes, both updating a hashed file simultaneously, in a server job? I thought hashed files weren't available in PX because they didn't support parallel access.

Posted: Thu Jun 26, 2008 9:22 am
by chulett
Sure, they support multiple writer processes. There's still the 'last one in wins' destructive overwrite per key to keep in mind, of course.
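
Roughly, in Python terms (a hypothetical sketch of the per-key behaviour only, not actual DataStage code):

Code:

# 'Last one in wins': a hashed file behaves like a keyed store, so two
# writers only collide when they write the same key.
hashed_file = {}
hashed_file["100"] = "row from writer A"
hashed_file["200"] = "row from writer B"  # different key: both survive
hashed_file["100"] = "row from writer B"  # same key: writer A's row is replaced
print(hashed_file)
# {'100': 'row from writer B', '200': 'row from writer B'}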

They aren't available in PX because they commit the cardinal sin of being disk based entities. :wink:

Posted: Thu Jun 26, 2008 2:17 pm
by kcbland
PhilHibbs wrote:So is it ok in principle to have two inputs to the same hashed file? Two processes, both updating a hashed file simultaneously, in a server job? I thought hashed files weren't available in PX because they didn't support parallel access.
They're not in PX because PX is a horse of a different color. In a distributed processing environment anything "shared" tends to be anathema. Server hashed files exist only on the Server node - so a multi-node or clustered environment doesn't run Server on all nodes. IBM would love to sell more licenses per node for Server, but nobody would do that. Installing Server on every node is probably a taxing process as well.

Don't hijack the thread :D The question is do hashed files support simultaneous inputs - YES. Do hashed files with simultaneous inputs with Clear File checked present potential issues - YES.

Posted: Fri Jun 27, 2008 5:42 am
by clarcombe
Golly gosh, I didn't mean to create all this confusion and uproar. I prefer to leave that sort of stuff to politicians :)

Sorry I have not replied earlier but I got put on another "urgent" non-urgent task.

For clarification

My test job: text files via two separate transformers writing to one hashed file. Both transformers write to the hashed file at the same time with two different datasets.

I imagined that the first transformer would run in its entirety, with the create and clear commands being issued first, and then the second transformer would execute another create and clear and overwrite the first transformer's data.

Luckily I am using text files and not OCI, so this seems to be working, but it does seem rather odd that a hashed file would not behave the same way when fed from an OCI stage as from a text file.

Posted: Fri Jun 27, 2008 7:32 am
by chulett
Bah, no uproar just our normal healthy discussions. :wink:
clarcombe wrote:I imagined that the first transformer would run in its entirety, with the create and clear commands being issued first, and then the second transformer would execute another create and clear and overwrite the first transformer's data.
Nope, they would run simultaneously if the job looks like this:

Code:

Seq --> Tfm --+
              +--> Hashed
Seq --> Tfm --+
Both to a single hashed file stage, both links write to the same hashed file name. This would behave the same way, as they are functionally equivalent:

Code:

Seq --> Tfm --> Hashed

Seq --> Tfm --> Hashed
Two separate stages, same hashed file name. Not sure where the OCI comment comes from - Sequential or OCI, the behaviour would be the same in the above examples. :?

Posted: Fri Jun 27, 2008 7:36 am
by PhilHibbs
Personally I'd have used a Link Collector stage.
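
That way only one link writes to the hashed file, so the clear happens exactly once. A rough Python sketch of the idea, with a queue standing in for the Link Collector (all the names here are made up for illustration):

Code:

import queue
import threading

hashed_file = {}
collected = queue.Queue()
DONE = object()  # sentinel marking the end of one input stream

def producer(rows):
    # One producer per input link feeding the collector.
    for row in rows:
        collected.put(row)
    collected.put(DONE)

def collector_writer(n_streams):
    # A single writer: exactly one clear, before any rows land.
    hashed_file.clear()
    finished = 0
    while finished < n_streams:
        item = collected.get()
        if item is DONE:
            finished += 1
        else:
            key, value = item
            hashed_file[key] = value

producers = [
    threading.Thread(target=producer, args=([("1", "a"), ("2", "b")],)),
    threading.Thread(target=producer, args=([("3", "c")],)),
]
writer = threading.Thread(target=collector_writer, args=(len(producers),))
writer.start()
for p in producers:
    p.start()
for p in producers:
    p.join()
writer.join()
print(hashed_file)  # all rows present; only one clear happened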