Partitioning in SMP
Moderators: chulett, rschirm, roy
-
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
1) Not necessary, but wise. You get the same benefits as on MPP.
2) Your result is an artifact of using a small data volume. For large data volumes, you will get a quicker completion time using two nodes versus using one.
3) Partitioning works in exactly the same way on an SMP system, with just two exceptions: Entire partitioning is managed by creating one Data Set in shared memory, and all Section Leader processes can be started with fork(), so there is no requirement to configure rsh.
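The fork() point can be sketched in ordinary POSIX terms. This is a minimal illustration, not DataStage code: on an SMP box the conductor process can create each logical node's Section Leader as a plain local child process, so no remote-shell (rsh) configuration is needed. The function name, node count, and exit codes here are invented purely for the demo.

```python
import os

def start_section_leaders(nodes=2):
    """Fork one local 'section leader' child per logical node (illustrative)."""
    pids = []
    for node in range(nodes):
        pid = os.fork()          # plain local process creation: no rsh needed
        if pid == 0:             # child: stands in for the node's Section Leader
            os._exit(node)       # a real leader would go on to fork its players
        pids.append(pid)
    # the conductor waits for every leader and collects its exit code
    return [os.WEXITSTATUS(os.waitpid(p, 0)[1]) for p in pids]

print(start_section_leaders(2))  # → [0, 1]
```

On MPP, by contrast, the equivalent startup has to reach the other machines, which is why a remote shell must be configured there.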
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray,
Using a 2-node config file results in more processes being spawned, but how does that guarantee faster execution when my job, while running, already consumes 98 to 100% of both CPUs? Running on a 2-node config could actually be slower because of context-switching among the extra processes, and maybe that's why my job took longer to complete with the 2-node config than with the 1-node one.
A brief layout of my job, on which I experimented with 1-node and 2-node config files (with appropriate partitioning):
Code: Select all
DataSet
[ 5.5 mil records,
250 fields]
|
|
Sequential file --> Lookup --> Transformer --> Filter ----> Funnel --> Dataset
[1 record, [9 links out
2 fields] from filter]
Since I didn't find any improvement in execution time with 2 nodes, I reverted to the 1-node config file.
Thanks for your time.
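The context-switching argument above can be made concrete with a toy cost model. This is an assumption-laden sketch, not a measurement: it treats wall time as total work divided by the number of CPUs actually usable, plus a small penalty for every runnable process beyond the CPU count. The `switch_cost` figure and process counts are invented purely to illustrate the shape of the trade-off.

```python
def wall_time(total_work, processes, cpus, switch_cost=0.02):
    """Toy model: work is shared by at most `cpus` processes at once,
    and each runnable process beyond the CPU count adds scheduling overhead."""
    compute = total_work / min(processes, cpus)
    overhead = switch_cost * max(0, processes - cpus)
    return compute * (1 + overhead)

# A 1-node run of a multi-operator job already spawns several processes
# (say 6), enough to keep 2 CPUs busy; a 2-node run roughly doubles them.
one_node = wall_time(100, processes=6, cpus=2)    # → 54.0
two_node = wall_time(100, processes=12, cpus=2)   # → 60.0
print(one_node, two_node)
```

Under this model, once both CPUs are saturated the extra partitions buy no additional compute and only add scheduling overhead, which matches the observed result that the 2-node run took longer.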
ray.wurlod wrote:
2) Your result is an artifact of using a small data volume. For large data volumes, you will get a quicker completion time using two nodes versus using one.

I would say that one input record counts as a 'small data volume'. And what kind of parallel processing do you think would be going on in a job that processes a single record? How many rows come from the lookup to the target? I'm wondering if the answer is 1 or 5.5 million.
Out of curiosity, is that just a testing volume that will be a great deal larger in reality, or is that all it will ever do?
-craig
"You can never have too many knives" -- Logan Nine Fingers
"How many rows come from the lookup to the target? I'm wondering if the answer is 1 or 5.5 million."
5.5 million records get populated in the target.

"Out of curiosity, is that just a testing volume, or is that all it will ever do?"
It's running in production, so the number of records may be between 5 and 6 million.
-
- Premium Member
- Posts: 1735
- Joined: Thu Mar 01, 2007 5:44 am
- Location: Troy, MI
"when it runs, consumes 98 to 100% of both the CPUs"
Since you have 2 CPUs and resource utilization is already high, increasing the number of nodes will not give you better results.
If you increase the number of nodes, you need to make sure there are enough resources available for the extra processes created.
"You get the same benefits as on MPP."
Ray is correct on this point, but it should be extended to include resource availability.
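To see why the partitioning benefits carry over from MPP, here is a hedged sketch of what key-based (hash) partitioning does, independent of whether the "nodes" are separate machines or separate processes on one SMP box. The function and row format are illustrative, not the engine's API.

```python
def hash_partition(rows, key, nodes):
    """Assign each row to a node by hashing its key column: rows with the
    same key always land on the same node, on SMP and MPP alike."""
    parts = [[] for _ in range(nodes)]
    for row in rows:
        parts[hash(row[key]) % nodes].append(row)
    return parts

# Illustrative rows: 12 records spread over 4 customer ids, 2 nodes.
rows = [{"cust_id": i % 4, "amt": i} for i in range(12)]
parts = hash_partition(rows, "cust_id", 2)
print([len(p) for p in parts])   # → [6, 6]
```

Each node then processes only its share of the rows; whether that share goes to another machine's memory (MPP) or to a process on the same machine (SMP), the division of work is the same, so the benefit appears only when spare CPU and memory exist to work on the shares in parallel.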
Priyadarshi Kunal
Genius may have its limitations, but stupidity is not thus handicapped.
-
- Premium Member
- Posts: 99
- Joined: Mon Sep 03, 2007 7:49 am
- Location: Stockholm, Sweden
Code: Select all
DataSet
[ 5.5 mil records,
250 fields]
|
|
Sequential file --> Lookup --> Transformer --> Filter ----> Funnel --> Dataset
[1 record, [9 links out
2 fields] from filter]
-------------------------------------
http://it.toolbox.com/blogs/bi-aj
my blog on delivering business intelligence using agile principles