Two inputs to the same hashed file
I have two transformers writing to the same hashed file, and both have the "Create File" and "Clear file before writing" checkboxes selected, yet the hashed file ends up containing the results of both transformers.
How can this be when both have these checkboxes checked?
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer
Imagining a simple job design like
source --------> T1 -------------------> HASH-FILE
                 |------- T2 ----------->
From the source, data is extracted to transformer T1, which writes to the hashed file; data also moves from T1 to T2, and T2 writes to the same hashed file with the settings Colin mentioned.
Sachin - if you have a specific question, please ask it. At the moment, I feel like I'm just repeating myself on the explanation, which I can't imagine is all that helpful.
Rather than imagine your job design - build it. Run it. Check the outputs and the logs to see how it behaves with respect to the hashed output. Or come back with a question or two, I'll see about answering them as best as I can.
-craig
"You can never have too many knives" -- Logan Nine Fingers
If Colin says there's 2 Transformers, then it may be the case that there is another Transformer that is directing rows down one path or the other. In either case this will work fine if all of the rows always come from one data stream or the other. If rows are flowing down two separate streams, then you have to wonder if the Clear File is occurring on the slower stream after the faster stream has already placed rows into the hashed file.
If the design is a pair of OCI-->XFM-->HASHed streams then there's no coordination between the two streams and there is definitely an issue, unless the OCI stages have a WHERE clause that satisfies only one of the two streams and thus rows flow from only one of the two streams (as long as the stream that returns no rows does it quickly and closes before the second stream starts sending data).
Colin has a right to be worried that data isn't being lost.
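The race described above can be sketched in Python as a toy model (not DataStage itself: the dict, the keys, and the delays are all made up for illustration). Each "stream" clears the shared store when it opens it, so the slower stream's clear wipes out whatever the faster stream already wrote:

```python
import threading
import time

hashed_file = {}          # toy stand-in for the server hashed file (key -> row)
lock = threading.Lock()

def stream(rows, open_delay):
    """Each stream clears the file when it opens it, then writes its rows."""
    time.sleep(open_delay)            # the slower stream opens later
    with lock:
        hashed_file.clear()           # 'Clear file before writing'
    for key, value in rows:
        with lock:
            hashed_file[key] = value  # destructive write per key

fast = threading.Thread(target=stream, args=([("A", 1), ("B", 2)], 0.0))
slow = threading.Thread(target=stream, args=([("C", 3)], 0.5))
fast.start(); slow.start()
fast.join(); slow.join()

# The slow stream's clear has wiped out everything the fast stream wrote:
print(sorted(hashed_file))
```

Only the slow stream's keys survive, which is exactly why Clear File plus two uncoordinated writers is worrying.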
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Of course. Didn't really want to get into all of the gory details of what might be happening based on various job design considerations, but thanks for throwing that into the stew pot, Ken.
Was hoping Colin would come back and clarify his exact job design so his specific fears can either be confirmed or denied.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Sure, they support multiple writer processes. There's still the 'last one in wins' destructive overwrite per key to keep in mind, of course.
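That "last one in wins" behaviour can be shown with a minimal sketch, modelling the hashed file as a keyed store (the key and row values here are hypothetical):

```python
# Toy model: a hashed file behaves like a keyed store, so a second write
# to the same key silently replaces the first - 'last one in wins'.
hashed_file = {}

def write_row(key, row):
    hashed_file[key] = row   # no append, no merge: the new row replaces the old

write_row("CUST01", {"name": "Alice", "src": "stream 1"})
write_row("CUST01", {"name": "Alice", "src": "stream 2"})  # same key, later write

print(hashed_file["CUST01"]["src"])   # only the last writer's row survives
```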
They aren't available in PX because they commit the cardinal sin of being disk-based entities.
-craig
"You can never have too many knives" -- Logan Nine Fingers
PhilHibbs wrote: So is it ok in principle to have two inputs to the same hashed file? Two processes, both updating a hashed file simultaneously, in a server job? I thought hashed files weren't available in PX because they didn't support parallel access.
They're not in PX because PX is a horse of a different color. In a distributed processing environment anything "shared" tends to be anathema. Server hashed files exist only on the Server node - so a multi-node or clustered environment doesn't run Server on all nodes. IBM would love to sell more licenses per node for Server, but nobody would do that. Installing Server on every node is probably a taxing process as well.
Don't hijack the thread :D The question is do hashed files support simultaneous inputs - YES. Do hashed files with simultaneous inputs with Clear File checked present potential issues - YES.
Kenneth Bland
Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
Golly gosh, I didn't mean to create all this confusion and uproar. I prefer to leave that sort of stuff to politicians.
Sorry I have not replied earlier but I got put on another "urgent" non-urgent task.
For clarification
My test job : Text files via two separate transformers writing to one hashed file. Both transformers are writing to the hashed file at the same time with two different datasets.
I imagined that the first transformer would run in its entirety, with its create and clear being issued first, and then the second transformer would execute another create and clear and overwrite the first transformer's data.
Luckily I am using text files and not OCI, so this seems to be working, but it does seem rather odd that the hashed file would not behave the same way when fed from an OCI stage as from a text file.
Colin Larcombe
-------------------
Certified IBM Infosphere Datastage Developer
Bah, no uproar just our normal healthy discussions.
clarcombe wrote: I imagined that the first transformer would run in its entirety with create and delete command being issued first, then the second transformer would execute another create and delete and overwrite the first transformer's data.
Nope, they would run simultaneously if the job looks like this:

Seq --> Tfm --+
              +--> Hashed
Seq --> Tfm --+

Both to a single hashed file stage, both links write to the same hashed file name. This would behave the same way, as they are functionally equivalent:

Seq --> Tfm --> Hashed
Seq --> Tfm --> Hashed

Two separate stages, same hashed file name. Not sure where the OCI comment comes from - Sequential or OCI, the behaviour would be the same in the above examples.
-craig
"You can never have too many knives" -- Logan Nine Fingers