DataStage Server Job Concepts

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.


jerome_rajan
Premium Member
Posts: 376
Joined: Sat Jan 07, 2012 12:25 pm
Location: Piscataway


Post by jerome_rajan »

Hi,
I am currently reviewing some decade-old server jobs. Having worked with parallel jobs all along, I'm running into a few conceptual roadblocks. Please help me understand the following:


1. There's a job that uses a UniVerse table as the reference for a lookup. Why, and why not a hashed file?
2. What exactly are UniVerse tables? I thought they were DataStage-internal tables, but these jobs are creating tables in the uv database.
3. I see that there is no Join stage in a server job. What, then, would be the ideal approach to joining two voluminous data sets?
4. Data Set is a parallel concept. What comes closest to it in a server job? What are the advantages of using a hashed file as an intermediate data store over a sequential file?
Jerome
Data Integration Consultant at AWS

Life is really simple, but we insist on making it complicated.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

1. Perhaps they wanted to use some additional filtering, via user-defined SQL?

2. UniVerse is a database product originally created by VMARK Software and on which DataStage was originally built. All UniVerse tables are hashed files, but also have "system table" entries that describe them, and the privileges that have been granted on them. Not all hashed files are UniVerse tables.

3. If they are in, or accessible to, the same database server, do the join there. If they are text files, use the Merge stage. Otherwise, use a Transformer stage to perform a lookup. By default this behaves as a left outer join, but you can use the reference link's NOTFOUND link variable in a constraint to restrict it to an inner join.
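To make the lookup semantics concrete, here is a minimal Python sketch, assuming an in-memory dict as a stand-in for the reference link (a keyed lookup returning at most one row per key). The names `reference`, `stream`, `lookup` and `not_found` are hypothetical and only illustrate the behaviour; in the job itself the equivalent Transformer constraint would be something like `Not(RefLink.NOTFOUND)`, where `RefLink` is a hypothetical reference link name.

```python
# Hypothetical stand-in for a hashed-file reference link: key -> single row.
reference = {
    "C1": {"name": "Acme"},
    "C2": {"name": "Globex"},
}

# Hypothetical stream (primary) input rows.
stream = [
    {"order_id": 1, "cust_id": "C1"},
    {"order_id": 2, "cust_id": "C9"},  # no matching reference row
    {"order_id": 3, "cust_id": "C2"},
]


def lookup(rows, ref, key):
    """Left outer join: every stream row passes through.

    Reference columns are None (think NULL) when the key is not found,
    and the not_found flag plays the role of the NOTFOUND link variable.
    """
    out = []
    for row in rows:
        match = ref.get(row[key])
        out.append({
            **row,
            "name": match["name"] if match else None,
            "not_found": match is None,
        })
    return out


left_outer = lookup(stream, reference, "cust_id")

# Constraining the output on "not NOTFOUND" turns it into an inner join.
inner = [r for r in left_outer if not r["not_found"]]

print(len(left_outer), len(inner))  # 3 2
```

The unmatched order 2 survives the default lookup with an empty `name`, and only the constraint removes it.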

4. There is nothing resembling a Data Set in the server job world. Actually, that's not quite true: there is something in UniVerse called a "distributed" hashed file, in which the component hashed files are described by a descriptor. But you will not find these documented in the DataStage documentation.

4 (continued). Hashed files are probably not useful as intermediate storage if you have duplicate key values, since a write to a hashed file destructively overwrites any existing record with the same key. Also, writing to and reading from sequential files is much faster than streaming data into and out of hashed files. Hashed files are intended for key-based access, and are VERY FAST at that.
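The duplicate-key point can be illustrated with a small Python analogy, assuming a list as a stand-in for a sequential file (append-only, keeps every row) and a dict as a stand-in for a hashed file (one record per key). This is only an analogy for the write semantics described above, not a model of how hashed files are stored on disk.

```python
# Incoming rows: the key "K1" appears twice.
rows = [
    ("K1", "first"),
    ("K2", "only"),
    ("K1", "second"),  # same key as the first row
]

# Sequential-file analogy: appending keeps every row, duplicates included.
sequential = list(rows)

# Hashed-file analogy: a keyed store where writing an existing key
# replaces the previous record (the "destructive overwrite").
hashed = {}
for key, value in rows:
    hashed[key] = value

print(len(sequential))  # 3
print(hashed)           # {'K1': 'second', 'K2': 'only'}

# What the keyed store is good at: direct access by key, no scan needed.
print(hashed["K2"])     # only
```

The first "K1" row is silently lost in the keyed store, which is why a hashed file is a poor intermediate store when keys are not unique.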
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.