I am trying to understand the correlation between Partitions and Nodes.
Job design is two sequentail files feeding to Funnel and Remove Duplicate stage followed by transformer and another sequential file.
I hash partitioned on a key in a remove duplicate stage and I run with 8 node config file and the job ran in 8 partitions(from monitor). when I ran it through a debugger I noticed that two records with the same key value went into two different nodes (node1 and node4) but it deduped correctly.
1. Does this mean that those two nodes were in the same partition ? If so, How do we know what node goes into what partition ? My understanding is that same key value need to go to same partition during hash parition.
2. How is it going to distribute the records If I had 10 records with the same key value ?
What's strange is that If I just run 1 record(same key) in both the input files instead of all the records then both of them goes to a same node.
I am confused. Please let me know if I am not making any sense...
Question on correlation between partitions and nodes
Moderators: chulett, rschirm, roy
In your case the sequential files are being read sequentially, then most likely a round-robin partitioning is being done going into the funnel. Where are you doing your explicit partitioning?
In this context the nodes and partitions are the same thing.
In this context the nodes and partitions are the same thing.
Last edited by ArndW on Tue Jan 22, 2013 3:55 am, edited 1 time in total.
<a href=http://www.worldcommunitygrid.org/team/ ... TZ9H4CGVP1 target="WCGWin">
</a>
</a>
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
ArndW - Explicit partitioning is done in Remove Duplicate stage. If nodes and partitions are same then I would not get the expected result as both records goes to different partition. Correct me If I am wrong here.
Ray - If partition is subset of data processed in a node then am not sure why both the records went to two different nodes here.
Thanks for your time...Any input is appreciated.
Ray - If partition is subset of data processed in a node then am not sure why both the records went to two different nodes here.
Thanks for your time...Any input is appreciated.
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
By looking at the job score I believe hash partition happened(please see bolded text) when it went to remove dedup stage which should have sent the records to same partition.
Following is the score:
Following is the score:
Code: Select all
main_program: This step has 9 datasets:
ds0: {op0[1p] (sequential sf_Case360_PolStubFields)
eAny<>eCollectAny
op2[8p] (parallel buffer(0))}
ds1: {op1[1p] (sequential sf_ISS_PolStubFields)
eAny<>eCollectAny
op3[8p] (parallel buffer(1))}
ds2: {op2[8p] (parallel buffer(0))
eSame=>eCollectAny
op4[8p] (parallel fnl_merge)}
ds3: {op3[8p] (parallel buffer(1))
eSame=>eCollectAny
op4[8p] (parallel fnl_merge)}
[b]ds4: {op4[8p] (parallel fnl_merge)
eOther(APT_HashPartitioner { key={ value=POL_NB,
subArgs={ cs }
}
})#>eCollectAny
op5[8p] (parallel RmDp_dedup.lnk_to_agg_Sort)}
ds5: {op5[8p] (parallel RmDp_dedup.lnk_to_agg_Sort)
[pp] eSame=>eCollectAny
op6[8p] (parallel RmDp_dedup)}[/b]ds6: {op6[8p] (parallel RmDp_dedup)
eAny=>eCollectAny
op7[8p] (parallel APT_TransformOperatorImplV0S79_BigE_common_2035_Pol_Stub_Fields_ins_initial_load_debug_xfm_pass in xfm_pass)}
ds7: {op7[8p] (parallel APT_TransformOperatorImplV0S79_BigE_common_2035_Pol_Stub_Fields_ins_initial_load_debug_xfm_pass in xfm_pass)
[pp] eSame=>eCollectAny
op8[8p] (parallel buffer(2))}
ds8: {op8[8p] (parallel buffer(2))
[pp] >>eCollectOther(APT_SortedMergeCollector { key={ value=POL_NB,
subArgs={ cs, asc }
}
})
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
- Contact:
So you have proven that the data are being hash partitioned on POL_NB. Are you claiming that there is one value of POL_NB that is found on more than one node? If so, please figure out a way to demonstrate that and get in touch with your official support provider.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.