Starting tips on DataStage required!

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.



Post by ray.wurlod »

1. Schedule DataStage jobs from the Director client, which invokes cron (if the server is UNIX-based) or at (if the server is Windows-based). You can also use any other scheduler on the server, invoking the DataStage job via its command-line interface, dsjob. Compile DataStage jobs from the Designer client (File > Compile, or the Compile button on the toolbar).
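As a minimal sketch of the dsjob approach (the project name, job name and parameter here are hypothetical), an external scheduler could run a job and detect its success or failure like this:

    dsjob -run -param TargetDate=2003-06-30 -jobstatus MyProject MyJob

The -jobstatus option makes dsjob wait for the job to finish and derive its exit status from the job's finishing status, so the calling scheduler or script can react to failures.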

2. There is no practical limit to the range of stage types that can be created for DataStage; its architecture was designed to be extensible. A few stage types are always installed, and you can optionally install others, either while the server is being installed or later. There is a directory called Packages on the installation media, and a tool called the DataStage Package Installer supplied as part of the DataStage server engine. Stage types exist for most types of access to most kinds of database, on most common platforms as well as mainframes. You do not access these library functions directly; DataStage uses a graphical design interface, and compiling the design inserts all the requisite calls to library functions. Writing a plug-in stage is far too complex a topic to go into here; a manual is available ("DataStage Plug-In Writer's Guide"). You are not yet ready to write plug-in stages; a much deeper understanding of how DataStage works is required first.

3. A Routine in DataStage is created using either the Manager client or the Designer client (release 5.0 and later); before release 5.0 it could only be done in the Manager client. When you open the properties window of a routine, one of the tabs gives you access to the code window, where there are Compile and Test buttons.
A DataStage transformation routine operates only on input values provided through its arguments, and is invoked for each row processed. Routines can perform arbitrarily complex tasks; simpler tasks can be accomplished by Transforms or by in-line expressions. A DataStage BASIC expression to substitute zero for NULL might be
If IsNull(link.column) Then 0 Else link.column
or
Oconv(link.column,"S;*;*;0")
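The same substitution written as a transformation routine would set the Ans variable, which carries the routine's return value. A minimal sketch, with the routine and argument names purely illustrative:

    * NullToZero(Arg1) - return zero when the input value is NULL
    If IsNull(Arg1) Then
       Ans = 0
    End Else
       Ans = Arg1
    End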

4. The term "quality checks" is not one with which I am familiar. DataStage can filter data so that only rows that satisfy particular conditions are passed through for loading, by means of constraint expressions on Transformer stage output links. A better approach is to use another tool before using DataStage, such as:
Quality Manager to audit data quality
INTEGRITY (Vality) to create higher quality data

5. The command "language" is initially graphical: you draw pictures of what is to happen. You model the data flow within jobs by creating a Job, and you model the sequence of job execution by creating a Sequence. You can also create a sequence of compiled jobs by creating a Batch (in the Director client) or a Job Control routine (in the Designer client). In all these cases DataStage generates source code (in DataStage BASIC) which you can inspect; a minimal sketch of such code appears at the end of this item.
[In mainframe jobs, DataStage generates COBOL and JCL which, again, you can inspect.]
There are four views available in the Director client: the overall status of each job, the list of jobs that are scheduled to be executed, the log file for each job, and a more detailed monitor that lets you view the number of rows processed by each stage and along each link.
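To give you a taste of the generated code, here is a minimal hand-written job control sketch using the documented DataStage BASIC job control API (the job name is hypothetical):

    * Attach the job, run it, wait for completion, then check its status
    hJob = DSAttachJob("LoadCustomers", DSJ.ERRFATAL)
    ErrCode = DSRunJob(hJob, DSJ.RUNNORMAL)
    ErrCode = DSWaitForJob(hJob)
    Status = DSGetJobInfo(hJob, DSJ.JOBSTATUS)
    If Status = DSJS.RUNFAILED Then
       Call DSLogFatal("Job LoadCustomers failed", "JobControl")  ;* logs a fatal event and aborts
    End
    ErrCode = DSDetachJob(hJob)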

6. The Administrator client has a number of roles, indicated by the multi-tabbed window. Licensing, security, project-wide defaults, and interactive command window are the main ones.

7. Reading individual log entries is done simply by double-clicking one in the Director client log view to see its detail. This window has Next and Previous buttons so you can move to adjacent events. The Debugger is run from the Designer client and does have editable breakpoints, single step, link step, a watch window and an immediate pane, though the last is not interactive the way VB's is.

8. Constraints are expressions that set the rules that determine what is a valid row and what is not. Constraint expressions are placed on the output links of Transformer stages. It is also possible to create a "rejects" output link that handles rows that failed the constraint expressions on all other output links.
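For example, a constraint expression that passes only rows with a non-NULL key and a positive amount might look like the following (the link and column names are illustrative):

    Not(IsNull(input.CustomerID)) And input.Amount > 0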

9. Hashed files are another way to store data, in which the location of every row is determined by applying a function (the "hashing algorithm") to the key value. They are extremely fast at performing key-based lookups, and order-of-magnitude speed gains can be obtained by loading them into memory. They are totally separate from flat files. Hashed files are the means by which tables are implemented in the UniVerse RDBMS (an IBM product).
VOC is a hashed file that contains all the words and tokens used in DataStage; the name is short for "vocabulary". As well as the words and tokens, it contains instructions about what they mean and how to execute them. You should not need to work directly with the VOC file.
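Should you ever need to inspect a hashed file from DataStage BASIC code (the graphical Hashed File stage normally does this for you), a keyed read looks like the following minimal sketch, with the file and key names hypothetical:

    * Open the hashed file and read one record by key
    Open "CustomerLookup" To FileVar Then
       Read Rec From FileVar, "CUST001" Then
          Print Rec<1>  ;* first field of the record
       End Else
          Print "Key CUST001 not found"
       End
    End Else
       Print "Cannot open CustomerLookup"
    End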

10. You can perform aggregations with an Aggregator stage, or by using an intermediate temporary table and selecting from it with a query that incorporates one or more set functions (ordinary SQL rules apply).
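As a sketch of the second approach (the table and column names are hypothetical), the follow-on query is ordinary SQL with a set function:

    SELECT account_id, SUM(amount) AS total_amount
    FROM temp_transactions
    GROUP BY account_id;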

11. Metadata that can be imported includes technical metadata (such as table definitions), business metadata (descriptions) and DataStage metadata (such as data elements and other DataStage components).
Process metadata is generated by executing DataStage jobs, and some is viewable in the Director client, log and monitor views. Process metadata can also be exported (usually automatically) to MetaStage.

12. To cleanse data using DataStage, you formulate rules that determine what a "clean" row looks like, and implement these rules as constraint expressions on output links of Transformer stages.

I am curious why you come to this forum seeking this information. Most of it is available in the brochures and manuals produced by Ascential Software. All of the remainder is available in the DataStage help files and manuals, which are installed with the DataStage clients.

Post by ray.wurlod »

The best piece of advice I can give you before you start using DataStage is to PLAN.
Plan how to extract your data from source (capture metadata).
Plan how the data need to be transformed.
Plan the rules that determine clean data.
Plan how and when to run jobs (capture metadata).
Plan metadata management.

The second best piece of advice is to get yourself enrolled in DataStage training classes. They do run in India from time to time; I have taught some there myself.

Post by ray.wurlod »

I also can't help noting that a recent posting seeking "six professional DataStage users" for HCL closed in a matter of hours, with the headhunter not even bothering to reply to enquiries.
Are you one of the six "professionals" who were hired?