Hello all,
Thanks for your time. We are currently planning to move from an SMP environment to a clustered DataStage environment running DataStage v8.1. We know that we're going to have to cross-mount the engine (via NFS) between Server A (primary) and Server B (secondary) and enable SSH/RSH to get this to work. My question is more about how DataStage will actually execute in this environment. For example, if I'm writing to a data set with a path of /DStage_workfiles/<<projectname>>/mydatasetname.ds, is the descriptor file created on Server A, containing pointers to the data files on the disks of Server B? Also, we have seen that DataStage uses a lot of /tmp disk while processing. In a cluster configuration, will it use /tmp on both machines or just the primary? Basically our situation is this: we are bringing up a new box (Server A) for this project and physically moving the project from the existing server (in this case Server B). Is there anything we should be careful of when doing this?
Thanks for your consideration
Architecture of Datastage Processing in a cluster (mpp)
flynnjd5150 (Premium Member, joined Wed Nov 11, 2009)

kwwilliams (Participant, joined Fri Oct 21, 2005)
Re: Architecture of Datastage Processing in a cluster (mpp)
The biggest thing to be careful of is the management of your configuration files. I'm not a fan of two-server cluster systems because they don't let me offload enough processing from the head server (Server A for you) to the servers where I would prefer to do the processing (Server B, or C if you add on). The head node needs to be able to manage all of the other processes running both locally and remotely, and therefore should not be utilized as heavily as the processing nodes. When it can no longer process effectively, the entire cluster comes down.
On your scratch: I would not have scratch shared, but have local disk space available on each node. On resource disk: you should share this disk across all of the nodes in your cluster. That allows you to maintain data sets even if one of the servers in your cluster goes down, using a set of configuration files you should already have handy in a different directory for downtime events. In your case I would have two different sets: one for when Server A is up and B is down, and vice versa.
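To make the advice above concrete, here is a sketch of what a two-node APT configuration file for this kind of setup could look like. The hostnames and paths are illustrative assumptions, not taken from the thread: `resource disk` points at the NFS-shared location on both nodes (so data sets remain readable if one server is down), while each `resource scratchdisk` is node-local.

```
{
	node "node0"
	{
		fastname "serverA"
		pools ""
		resource disk "/DStage_workfiles/resource" { pools "" }
		resource scratchdisk "/serverA_local/scratch" { pools "" }
	}
	node "node1"
	{
		fastname "serverB"
		pools ""
		resource disk "/DStage_workfiles/resource" { pools "" }
		resource scratchdisk "/serverB_local/scratch" { pools "" }
	}
}
```

For the downtime variants, you would keep a second and third file of the same shape with only the available node(s) listed, and point APT_CONFIG_FILE at whichever one matches the current state of the cluster.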
Keith Williams
keith@peacefieldinc.com
flynnjd5150
Re: Architecture of Datastage Processing in a cluster (mpp)
Thanks! We hadn't actually taken the resource disk into consideration from a sharing perspective. Each server has its own resource and scratch, but sharing the resource disk makes sense. We are going to have at least two different configs, since at least a third of the time Server B is unavailable to us (depending on the month; quarter ends and year ends are dicey). Do you have any insight into what actually happens when Server A farms out work to Server B? For example, if a process needs to use /tmp (not scratch disk), will it use both servers' /tmp or just Server A's?