Running job in Control-M

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Running job in Control-M

Post by us1aslam1us »

Hi all,

Can anybody help me understand what steps I need to follow to schedule jobs in Control-M? This is the first time I am using it and I am very much a novice at this stuff. While searching I found Ken's script for running jobs, but I still haven't been able to figure out what inputs I need to feed it, like the job names, sequence name, parameters, etc. Any help in this matter is appreciated.

Thanks
sam
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Control-M uses an agent on each server to start processes. You will need to use the command-line dsjob utility to start a job from the scheduler. I don't recommend that each job be a separately scheduled process; rather, schedule a controlling Batch or Sequence job and let that manage the load effort.

So, you'll need to invoke dsjob from Control-M. Because you need to be able to evaluate the results of running a Batch or Sequence, you'll want a script between Control-M and dsjob. That script can look at the return codes from dsjob, as well as potentially evaluate any logs or error files produced during the jobstream run. In addition, the script can pick up any environment variables and place them into the dsjob command line structure.

For Control-M, you'll need to make sure that Control-M sets the proper environment variables and executes the scripts under the correct userid.
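
To make that concrete, here is a minimal sketch of such a middle script (the dsenv path and the exact exit codes are site- and release-specific assumptions - verify dsjob's documented exit codes on your installation):

#!/bin/ksh
# Hypothetical wrapper between Control-M and dsjob.
# Usage: run_ds_sequence.ksh PROJECT SEQUENCE_NAME
PROJECT=$1
JOBNAME=$2

# Pick up the DataStage environment (DSHOME, PATH, etc.);
# the dsenv location is site-specific.
. /u1/dsadm/Ascential/DataStage/DSEngine/dsenv

# -run starts the job; -jobstatus waits for completion and makes the
# command's exit status reflect the job's finishing status.
$DSHOME/bin/dsjob -run -jobstatus "$PROJECT" "$JOBNAME"
rc=$?

# Exit codes vary by release; commonly 1 = finished OK and
# 2 = finished with warnings.
case $rc in
  1|2) echo "$JOBNAME completed (dsjob status $rc)"
       exit 0 ;;
  *)   echo "$JOBNAME FAILED (dsjob status $rc)" >&2
       exit 1 ;;  # non-zero exit lets Control-M mark the job as failed
esac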
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
fridge
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

Post by fridge »

Hi, we use Control-M quite easily here - after all, all it does is run a command line - but here are a few points that you may wish to consider:

a) You could have Control-M kick off a $DSHOME/bin/dsjob -run ... PROJNAME JOBNAME command, but this isn't particularly migratable between dev and prod environments, as the PROJNAME will probably be different. A tidier approach is to write a wrapper script (e.g. RunAnyDSJob.ksh) that accepts the job name as a parameter (and possibly a generic project name, e.g. INCOMEPROJ), then have slightly different versions on the dev and prod servers with the differences coded there - i.e. one point of change.

b) Check which user Control-M runs under when invoking the jobs, and make sure it's in the correct group (normally dsadm).

c) I wouldn't advocate having Control-M pass all the parameters to the job (e.g. source file location, database name, etc.), partly because you have the same migration problems listed above, and also because it's a pain in the bum. What I have implemented is a set of parameter files for each job that get read at run time (there are many ways of doing this - I believe there is a DSReadParmsFromFile-style function - but I just used a bit of kludgy awk).
The only parameter I would feed from Control-M is ODATE (the processing date).

d) Finally, one thing you may wish to consider is how Control-M shows logs to the end user. Basically it can echo standard sysout to the operator, but this will not show the more useful DataStage logs, as these are tucked away in Universe files. A good solution is to write a bit of code (or use dsjob -logsum, I think) to interrogate the logs after a successful job run and echo the results to sysout, probably picking out the pertinent bits - i.e. warnings and errors only - along the lines of the sketch below.
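
A minimal sketch of that post-run step (the -type values and options may vary by dsjob release; treat this as an assumption to verify):

# Hypothetical post-run step: echo only warning and fatal entries
# from the DataStage job log to Control-M's sysout.
$DSHOME/bin/dsjob -logsum -type WARNING "$PROJECT" "$JOBNAME"
$DSHOME/bin/dsjob -logsum -type FATAL "$PROJECT" "$JOBNAME"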


Apologies if this seems to make your job harder, but it is worth getting the Control-M / DataStage interface sleek before you go too far down the line, as it will save a lot of pitfalls later on.
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

fridge wrote: c) I wouldn't advocate having Control-M pass all the parameters to the job (e.g. source file location, database name, etc.), partly because you have the same migration problems listed above, and also because it's a pain in the bum. What I have implemented is a set of parameter files for each job that get read at run time (there are many ways of doing this - I believe there is a DSReadParmsFromFile-style function - but I just used a bit of kludgy awk).
The only parameter I would feed from Control-M is ODATE (the processing date).
Hi fridge,

Could you tell me where my parm file will be? Do I need to specify some path there?
I just want to know how and where these parm files are created.

Thanks
sam
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Hi ,

Here is my job description.

I have one main sequence job (which contains a single job for the date update, plus another sequence job). I need to schedule the main sequence job in Control-M, and I need to ensure that if the single job fails, the main sequence job ABENDs. For scheduling this I should have one ksh script. I have seen Ken's script, but I am not sure what changes I need to make to it (like what the parm file name should be, and other stuff). As I said earlier, this is the first time I am trying this and I am not good at UNIX scripting. The reason I am doing this is that my lead is not well, and in her absence I need to complete it.

Thanks and appreciation
Sam
fridge
Premium Member
Posts: 136
Joined: Sat Jan 10, 2004 8:51 am

Post by fridge »

Sorry Sam, I was speaking a bit generically - but since I am downloading a patch for UT2004 and twiddling my metaphorical thumbs, I will try to expand.

Basically, a DataStage job can take many parameters; examples are:

Source File e.g. /prd/data/CUST_20050101.DAT
Target DataBase e.g. BIWAREHOUSETHINGY
Processing Date e.g. 20050101

and so on

The only one that would change between dev/systest and prod is the first (where prd is replaced with dev/sys - I am using my own convention here).

This COULD be coded into Control-M, but it becomes cumbersome and hard to change if enterprise scheduling is handled by a different department, as is often the case.

The parameter files that store this information can live anywhere, and can use a simple format such as the following (for the above example):

***** PARM FILE *******
pSOURCEFILE : ${ENV}/data/CUST_20050101.DAT
pTGTDB : BIWAREHOUSETHINGY
pPDATE : %ODATE%


This is pretty much how I implemented it.

The above file can then be moved unamended between environments, with ${ENV} being picked up from the UNIX environment (or, more elegantly, by setting an environment variable on the DataStage project and simply passing a relative path, e.g. /data/CUST_20050101.DAT).

The tricky bit is the ODATE - this should be passed from Control-M into the RunAnyDSJob.ksh I mentioned before and propagated.

This all involves a wrapper script to read the parameter file, parse it, and build a command of the form:

dsjob -run \
-param pSOURCEFILE=/prd/data/CUST_20050101.DAT \
-param pTGTDB=BIWAREHOUSETHINGY \
-param pPDATE=20050101 \
MYPROJECT MYJOB

I did this by having a script that gets called with two parameters, job name and ODATE, as follows:

RunAnyDSJob.ksh JOBNAME ODATE (ODATE supplied by Control-M)

The script reads a parameter file called /prd/config/<JOBNAME>.prm to get the environment-specific parameters, and uses some simple awk to parse it.
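
A minimal sketch of that wrapper (the awk, the config path, the dsenv location, and the project name are illustrative assumptions, not production code):

#!/bin/ksh
# RunAnyDSJob.ksh -- hypothetical generic wrapper.
# Usage: RunAnyDSJob.ksh JOBNAME ODATE
JOBNAME=$1
ODATE=$2
PARMFILE=/prd/config/${JOBNAME}.prm

# Turn each "name : value" line into "-param name=value", substituting
# the Control-M ODATE for %ODATE% and the environment root (assumed
# exported, e.g. ENV=/prd) for ${ENV}. Values containing spaces would
# need more care than this.
PARAMS=$(awk -F' : ' -v odate="$ODATE" -v env="$ENV" '
  NF == 2 {
    gsub(/%ODATE%/, odate, $2)
    gsub(/[$][{]ENV[}]/, env, $2)
    printf "-param %s=%s ", $1, $2
  }' "$PARMFILE")

# Pick up the DataStage environment; the dsenv path is site-specific.
. /u1/dsadm/Ascential/DataStage/DSEngine/dsenv

# $PARAMS is deliberately unquoted so it splits into separate arguments.
$DSHOME/bin/dsjob -run -jobstatus $PARAMS MYPROJECT "$JOBNAME"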

As mentioned, DataStage has a routine called something like 'DSReadParmsFromFile' that will do some of this, but I don't believe it can handle the ODATE bit (the order date is the one variable that is truly external and can't be coded beforehand).

Another option is Parameter Manager (a third-party add-on), which I am sure will do a lot of this.

Sorry if this is purely a starter for ten, as it were - I have been socialising tonight (it sometimes happens), so this is not a brilliant explanation - but hopefully it gives some ideas. If you need more, let me know and I will try to elucidate.
amsh76
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

Hello Ken,

Just wanted to understand more about the point 'schedule a controlling Batch or Sequence job and let that manage the load effort'.

What I have seen, and what I prefer, is giving control to the scheduling tool; that way it has control over all the jobs and is easy to maintain. I do not have to worry about the restartability of a sequence - everything is taken care of by the sequencer.

Any specific reason for a sequencer?
kumar_s
Charter Member
Posts: 5245
Joined: Thu Jun 16, 2005 11:00 pm

Post by kumar_s »

Restartability may be taken care of by a Job Sequence, whereas points such as %CPU usage at a given instant and I/O rates need to be taken care of when configuring Control-M.
Impossible doesn't mean 'it is not possible' actually means... 'NOBODY HAS DONE IT SO FAR'
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

amsh76 wrote:Any specific reason for sequencer ?
In a development environment, how do you run a complex series of interdependent jobs without using a scheduler? A "jobstream" allows you to build and test the jobstream. A sophisticated or dynamic stream of jobs won't work within the confines of an enterprise scheduler.

For example, a large customer of mine has 42 jobstreams, each scheduled within the enterprise scheduler. Some of those jobstreams have 300+ jobs in them, whereas others have 10-20 jobs. But each jobstream has three follow-on processes that archive the work directory contents, notify users and the BI reporting environment that the jobstream has loaded, and distribute files to downstream environments. So the enterprise scheduler holds the 42 jobstreams plus 42 x 3 = 126 follow-on processes - nearly 170 objects - just to deal with the enterprise data warehouse nightly load cycle, not including the weekly, monthly, and quarterly processes. If we put every individual job into the scheduler, we'd have 1200+ objects. It's difficult enough with 170; almost 10X that is unmanageable. :shock:

That's even before mentioning that some jobstreams look at the data being processed and run the appropriate jobs to handle the data, which sometimes means adding more jobs to the stream based on conditions. In addition, they acquire the appropriate parameters and run accordingly. When you have more than 1 job, you have to manage all jobs running with the appropriate matching set of parameters. A sophisticated parameter system is often required in mature data warehouses.

Building your entire enterprise load in a single job is not an option either. :shock: If you're acquiring data from multiple source systems in different accessibility windows, with different target loading deadlines (Service Level Agreements, "SLAs"), you can't do it that way. Some folks opt not to deal with job schedulers and just build their entire process in one job. I call those customers "Future Business" for me, as I receive calls every week asking me to consult for customers who can't manage their implementations after they've gone live. Pays the bills! :lol:
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
amsh76
Charter Member
Posts: 118
Joined: Wed Mar 10, 2004 10:58 pm

Post by amsh76 »

Thank you Ken, that helps.

So in brief, can I say: depending on the number of jobs and the complexity involved in triggering them, one can opt for either a Sequencer or a scheduling tool?

The way we do it here is: before any development work, we come up with one common parameter list, which is used for all the jobs. That way I just have to create one script to trigger the jobs in production, with the job name as one of the parameters. Do you think that makes sense?
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I'm recommending that an intelligent job-stream controlling process - a Sequencer or custom job control - be put into the enterprise scheduler. You need both devices, in my opinion. It's not scalable to put every single script, job, and program into the scheduler. Sometimes you write a script to run other scripts because that modularity makes the software development sensible and maintainable. You put the main script into the schedule.

As for parameter management, since I'm on the custom job control side, I solved all of those problems years ago by not re-inventing the wheel for every customer. That's a totally separate conversation but I will say this:

1. Changing source code and recompiling in production just to accommodate dynamic runtime values is not SOX compliant, nor a solution I would personally design.
2. All processes, no matter what tool or language used, must be parameterized sufficiently to allow movement between dev, test, qa, and prod environments.
3. All source code must be checked into a library tool and deployed as-is without changes once in the library.
4. Parameters sometimes must be dynamically generated, your processes must be able to do this themselves without human intervention.
5. Humans, trained monkeys, cats on the keyboard, I don't care. No living entity manually runs processes: the processes must be scripted or automated.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

A Sequencer can only perform one task: triggering when either Any or All of its inputs have fired.

The term you need here is Job Sequence. Or perhaps Sequence. Not Sequencer.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
us1aslam1us
Charter Member
Posts: 822
Joined: Sat Sep 17, 2005 5:25 pm
Location: USA

Post by us1aslam1us »

Hi

How can I pass dynamic parameters (which will change for every run) in this script? I am using a separate job to get the max date (storing it in a .parm file) and using this date as a parameter in the sequence job. So while scheduling this sequence job, how can I pass this dynamic date parameter?

Thanks
sam
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Searching the forum would have revealed that the answer to your question is to use as many -param name=value options in the dsjob command as you require. How you set the value in your script is entirely up to you; it can be a shell variable for example.
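
For instance, a minimal sketch of the max-date case (the file name and parameter name are made up for illustration):

# Hypothetical: the earlier job wrote the max date to a one-line file.
MAXDATE=$(cat /prd/config/maxdate.parm)

# Pass it to the sequence as a runtime parameter.
$DSHOME/bin/dsjob -run -jobstatus -param pMAXDATE="$MAXDATE" MYPROJECT MYSEQUENCE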
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
lfong
Premium Member
Posts: 42
Joined: Fri Sep 09, 2005 7:48 am

Post by lfong »

I assume you are running on the mainframe. If you have Universal Command, you can control starting, resetting, and checking the return codes of your DataStage jobs through your JCL. In most cases you will not need sequence jobs. You then use your job scheduler to run your JCL.