DataStage Grid - Where to run dsjob & non - DS - Head No

Post by **kmohancet** » Sun Sep 14, 2014 12:21 pm

In an environment with a DataStage 9.1 grid on Red Hat Enterprise Linux, shared by multiple teams across IT, where do you run dsjob as well as non-DS scripts? Please consider the following:

Only one active head node
Run thousands of DS jobs a day (if the same DS job is run 20 times a day, I am counting it as 20).
Run thousands of non-DS jobs (a few of which are resource intensive, while most are not). A typical subject area load looks like this: non-DS script, one or more DS jobs, one or more non-DS scripts, ...
Everyone logging into the head node just to deploy code and schedule cron jobs (no enterprise scheduler)

What am I looking for? Is there a need for a dedicated ETL server(s) that runs all the non-ds scripts (locally) and dsjobs (remotely on the grid)?

Post by **ray.wurlod** » Sun Sep 14, 2014 11:13 pm

Welcome aboard.

I'd say that, if your system is running satisfactorily as it is, and you seem to have sufficient headroom for all planned expansion, then leave well alone.

I'm assuming here that all non-DS work is also submitted via whatever grid management software you use.

Post by **PaulVL** » Mon Sep 15, 2014 8:00 am

Hi kmohancet, welcome aboard.

As you know, dsjob MUST be executed on the Head Node to which the project is assigned. (I say this because you might go multi head node in the future, with shared compute nodes.)

I find it best to farm all non-essential work off to the compute nodes. Your Head Node is the most important piece of hardware in that grid setup. I like to set up a server off to the side to handle tar/gzip/ftp/etc. You can put it into your grid if you want to load balance that work, but put it into a different queue and make sure DS jobs don't get dispatched to that server. Do not mount the engine binaries on that server; otherwise you have to license it.

Make life easy for yourself and script up a mechanism to help your users dispatch jobs to that server, something like "grid_it.sh blah blah blah". Make it easy to use, and your users will adopt it. Write your API docs and spell out what type of work should be farmed off and what should not. Use your gridjobdir path for logging stdout/stderr, since it's already exposed to all the compute nodes.
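As a rough illustration of the kind of wrapper described above, here is a minimal Python sketch that only builds the submit command and does not run it. It assumes the grid is managed by IBM Platform LSF, so it uses `bsub` with its `-q`, `-o` and `-e` options; the thread does not say which resource manager is in use, so swap in your own submit command. The function name `build_submit_command`, the queue name `utility`, the `grid_it` log prefix and the example log directory are all hypothetical.

```python
import shlex
from pathlib import PurePosixPath


def build_submit_command(work_cmd, log_dir, queue="utility", tag="grid_it"):
    """Build an argv list that submits work_cmd to a non-DataStage queue.

    Assumptions (not from the thread): LSF's bsub is the submit command,
    and "utility" is a queue that DataStage jobs are never dispatched to.
    """
    if not work_cmd:
        raise ValueError("nothing to run")
    log_dir = PurePosixPath(log_dir)
    return [
        "bsub",
        "-q", queue,
        # %J is replaced by LSF with the job ID, giving one log pair per job.
        "-o", str(log_dir / f"{tag}.%J.out"),
        "-e", str(log_dir / f"{tag}.%J.err"),
        # Quote each argument so file names with spaces survive the remote shell.
        shlex.join(work_cmd),
    ]


if __name__ == "__main__":
    # Example: farm a gzip off the head node, logging under a shared directory.
    argv = build_submit_command(["gzip", "/data/landing/big file.dat"], "/grid/jobdir")
    print(" ".join(shlex.quote(a) for a in argv))
```

Keeping the command construction separate from execution makes the wrapper easy to test, and makes it easy to switch to a different grid manager later.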

If you can afford a DS setup with GRID, you really should get an enterprise scheduler. But, if you are good with the current setup, so be it.

I've seen people set up jobs just to ftp files around, using my head node. Don't even get me started on gzips of 120GB files. argg...

Post by **kmohancet** » Thu Sep 18, 2014 8:53 pm

Thanks Ray and Paul for very informative and prompt responses!

Waiting for some more responses from other forum members with issues they have faced.

Post by **kmohancet** » Mon Dec 01, 2014 12:43 am

A few changes since the last post (the addition of Control-M being the main one).

Here is what our grid looks like (still being installed).

+------------------------------------------------+
|                 LOAD BALANCER                  |
+------------------------+-----------------------+
| WAS - 1                | WAS - 2               |
+------------------------+-----------------------+
| DB2 - 1 (repository)   | DB2 - 2 (repository)  |
+------------------------+-----------------------+
| Head Node              | Compute Node - 1      |
|                        | (also failover HN)    |
+------------------------+-----------------------+
| Compute Nodes 2 - N                            |
+------------------------------------------------+

To invoke jobs from Control-M,

Option 1 - Run dsjob command remotely pointing to Load Balancer. Is this possible? Grid Red Book says, "all the DataStage and QualityStage jobs have to be invoked from the Head Node". Their configuration has the repository and the Services layer installed on the Head Node server.

Option 2 - Run dsjob command remotely pointing to Head Node. (Is this even possible, given that the load balancer is how general users, including Control-M, know about DataStage?)

Option 3 - ssh from Control-M into the Head Node and invoke dsjob there. What happens when the Head Node is down and Compute Node 1 is acting as Head Node?

Please let me know if option 3 is the only option.
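For illustration only, Option 3 with a simple failover could be sketched as below: probe each candidate host over ssh, then run dsjob on the first one that answers. This assumes key-based ssh from the scheduler host, that dsjob is on the remote PATH with the DataStage environment already sourced, and that being reachable over ssh is a good enough signal of which node is currently acting as Head Node (it may not be in your failover setup). The host names and function names are hypothetical. `dsjob -run -jobstatus` is the standard invocation that waits for the job and returns an exit code derived from the job status, so a non-zero code does not necessarily mean failure.

```python
import shlex
import subprocess

# BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the probe.
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]


def ssh_argv(host, remote_command):
    """argv for running remote_command on host through ssh."""
    return ["ssh", *SSH_OPTS, host, remote_command]


def dsjob_command(project, job):
    """Remote command line that runs a job and waits for its status."""
    return "dsjob -run -jobstatus " + shlex.join([project, job])


def pick_host(candidates, runner=subprocess.run):
    """Return the first candidate that accepts an ssh connection."""
    for host in candidates:
        if runner(ssh_argv(host, "true")).returncode == 0:
            return host
    raise RuntimeError("no candidate head node is reachable")


def run_job(candidates, project, job, runner=subprocess.run):
    """Run the job on the first reachable host; return (host, exit code)."""
    host = pick_host(candidates, runner)
    result = runner(ssh_argv(host, dsjob_command(project, job)))
    return host, result.returncode


if __name__ == "__main__":
    # Hypothetical host, project and job names: primary head node first,
    # then the compute node that doubles as the failover head node.
    print(run_job(["headnode", "compute1"], "MYPROJECT", "LoadCustomers"))
```

The `runner` parameter exists so the failover logic can be exercised with a stub instead of real ssh calls.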

Thanks,
