DataStage Grid - Where to run dsjob & non - DS - Head No

kmohancet
Premium Member
Posts: 11
Joined: Sun Sep 14, 2014 11:45 am
Location: Nevada

Post by kmohancet »

In an environment with a DataStage 9.1 grid on Red Hat Enterprise Linux, shared by multiple teams across IT, where do you run dsjob and the non-DS scripts? Please consider the following:
  • Only one active head node.
  • Thousands of DS jobs run per day (if the same DS job runs 20 times a day, I count it as 20).
  • Thousands of non-DS jobs run per day (a few are resource intensive; most are not). A typical subject-area load looks like this: non-DS script, one or more DS jobs, one or more non-DS scripts, ...
  • Everyone logs in to the head node just to deploy code and schedule cron jobs (there is no enterprise scheduler).
What I am looking for: is there a need for dedicated ETL server(s) that run all the non-DS scripts locally and the dsjob commands remotely against the grid?
KM
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard.

I'd say that, if your system is running satisfactorily as it is, and you seem to have sufficient headroom for all planned expansion, then leave well alone.

I'm assuming here that all non-DS work is also submitted via whatever grid management software you use.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

Hi kmohancet, welcome aboard.


As you know, dsjob MUST be executed on the Head Node to which the project is assigned. (I say this because you might go multi head node in the future, with shared compute nodes.)


I find it best to farm all non-essential work off to the compute nodes. Your Head Node is the most important piece of HW in that grid setup. I like to set up a server off to the side to handle tar/gzip/ftp/etc. You can put it into your grid if you want to load balance that work, but put it into a different queue and make sure DS jobs don't get dispatched to that server. Do not mount the engine binaries on that server; otherwise you have to license it.

Make life easy for yourself and script up a mechanism to help your users dispatch jobs to that server, e.g. "grid_it.sh blah blah blah". Make it easy to use, and they will adopt it. Write your API docs and set expectations on what type of work should be farmed off and what should not. Use your gridjobdir path for logging stdout/err, since it's already exposed to all the compute nodes.
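A minimal sketch of what such a dispatch wrapper might look like, written in Python rather than as the shell script named above. It is an illustration only, not the actual grid_it.sh: it assumes the side server is reached over plain ssh with key-based login rather than through a grid queue, and the host name `etl-util01`, the log folder under gridjobdir, and both function names are hypothetical.

```python
import shlex
import subprocess
from pathlib import Path

# Hypothetical site settings: replace with your own values.
UTILITY_HOST = "etl-util01"                     # assumed name of the side server
LOG_ROOT = Path("/path/to/gridjobdir/offload")  # assumed folder under your gridjobdir


def build_offload_command(argv, job_tag, host=UTILITY_HOST, log_root=LOG_ROOT):
    """Return (ssh_argv, stdout_path, stderr_path) for running argv on host."""
    if not argv:
        raise ValueError("no command given")
    # shlex.join quotes each argument so the remote shell sees it unchanged.
    remote_command = shlex.join(argv)
    # BatchMode=yes makes ssh fail instead of prompting for a password.
    ssh_argv = ["ssh", "-o", "BatchMode=yes", host, remote_command]
    log_root = Path(log_root)
    return ssh_argv, log_root / f"{job_tag}.out", log_root / f"{job_tag}.err"


def run_offloaded(argv, job_tag, host=UTILITY_HOST, log_root=LOG_ROOT):
    """Run argv on the utility server, log stdout/stderr, return the exit code."""
    ssh_argv, out_path, err_path = build_offload_command(argv, job_tag, host, log_root)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with open(out_path, "wb") as out, open(err_path, "wb") as err:
        return subprocess.run(ssh_argv, stdout=out, stderr=err).returncode
```

A call such as `run_offloaded(["gzip", "/data/landing/big_file.dat"], "gzip_big_file")` would then keep the compression off the Head Node while leaving its stdout/stderr where every node can read them. If the side server sits in its own grid queue, the ssh call would be replaced by whatever submit command your grid manager provides.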


If you can afford a DS setup with GRID, you really should get an enterprise scheduler. But, if you are good with the current setup, so be it.


I've seen people set up jobs just to ftp files around, using my head node. Don't even get me started on gzips of 120GB files. argg...
kmohancet
Premium Member
Posts: 11
Joined: Sun Sep 14, 2014 11:45 am
Location: Nevada

Thanks and waiting for more responses

Post by kmohancet »

Thanks Ray and Paul for very informative and prompt responses!

Waiting for some more responses from other forum members with issues they have faced.
KM
kmohancet
Premium Member
Posts: 11
Joined: Sun Sep 14, 2014 11:45 am
Location: Nevada

Post by kmohancet »

A few things have changed since my last post, the main one being the addition of Control-M.

Here is what our grid looks like (still being installed).

+-----------------------------------------------+
|                 LOAD BALANCER                 |
+-----------------------+-----------------------+
| WAS - 1               | WAS - 2               |
+-----------------------+-----------------------+
| DB2 - 1 (repository)  | DB2 - 2 (repository)  |
+-----------------------+-----------------------+
| Head Node             | Compute Node - 1      |
|                       | (also failover HN)    |
+-----------------------+-----------------------+
| Compute Nodes 2 - N ...                       |
+-----------------------------------------------+

To invoke jobs from Control-M:
  • Option 1 - Run the dsjob command remotely, pointing at the load balancer. Is this possible? The Grid Redbook says, "all the DataStage and QualityStage jobs have to be invoked from the Head Node". Their configuration, however, has the repository and the Services layer installed on the Head Node server.
  • Option 2 - Run the dsjob command remotely, pointing at the Head Node. Is this even possible, given that the load balancer is how general users, including Control-M, reach DataStage?
  • Option 3 - ssh from Control-M into the Head Node and invoke dsjob there. What happens when the Head Node is down and Compute Node 1 is acting as the Head Node?
Please let me know if Option 3 is the only option.
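For illustration, Option 3 could be sketched roughly as follows: a small script that Control-M calls, which picks the first reachable head node and runs dsjob there over ssh. This is a sketch under assumptions, not a tested procedure. The host names `headnode` and `computenode1` and all function names are hypothetical; it assumes key-based ssh, that dsjob is on the remote PATH (dsenv sourced for non-interactive logins), and that a host answering ssh is the one currently acting as Head Node, which your failover procedure must guarantee.

```python
import shlex
import subprocess

# Hypothetical host names: active Head Node first, failover Head Node second.
HEAD_NODES = ["headnode", "computenode1"]
SSH_OPTS = ["-o", "BatchMode=yes", "-o", "ConnectTimeout=10"]


def build_dsjob_ssh(host, project, job, params=None):
    """Build the ssh argv that runs 'dsjob -run -jobstatus' on host."""
    dsjob = ["dsjob", "-run", "-jobstatus"]
    for name, value in (params or {}).items():
        dsjob += ["-param", f"{name}={value}"]
    dsjob += [project, job]
    return ["ssh", *SSH_OPTS, host, shlex.join(dsjob)]


def ssh_reachable(host):
    """True if a trivial remote command succeeds on host."""
    probe = subprocess.run(["ssh", *SSH_OPTS, host, "true"],
                           stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return probe.returncode == 0


def pick_head_node(hosts, is_reachable=ssh_reachable):
    """Return the first host that passes the reachability check."""
    for host in hosts:
        if is_reachable(host):
            return host
    raise RuntimeError("no head node reachable: " + ", ".join(hosts))


def run_job(project, job, params=None, hosts=HEAD_NODES):
    """Run the job on the first reachable head node; return dsjob's exit code.

    With -jobstatus the exit code reflects the final job status; check the
    dsjob documentation for your version to see which values mean success.
    """
    host = pick_head_node(hosts)
    return subprocess.run(build_dsjob_ssh(host, project, job, params)).returncode
```

Probing before submitting, rather than retrying a failed run on the second host, avoids starting the same job twice when the first attempt fails for a reason other than the host being down. If the failover moves a virtual host name along with the Head Node role, a single entry in `HEAD_NODES` would be enough.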

Thanks,
KM