DataSet

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

srinu_p
Participant
Posts: 6
Joined: Mon Oct 18, 2004 10:30 am

DataSet

Post by srinu_p »

Hi,

I am new to DataStage.
Can somebody help me with DataSet?
I read the documentation, but I can't understand what it is used for.

SP.
coolkhan08
Participant
Posts: 25
Joined: Wed Oct 13, 2004 1:11 am

Post by coolkhan08 »

Welcome aboard, Srinu!
A Dataset is a DataStage-specific file with a .ds extension. It is used in parallel jobs for faster data loading, because the dataset resides in the temporary disk space used by DataStage Parallel Extender. The metadata should be the same on the input and output links. You can check the manuals or search the forum for more information, as this has been discussed before.
Sam
mandyli
Premium Member
Posts: 898
Joined: Wed May 26, 2004 10:45 pm
Location: Chicago

Post by mandyli »

Welcome Aboard.

What is a data set?

A data set is like a flat file. DataStage Parallel Extender jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set.

The Data Set stage allows you to store data being operated on in a persistent form, which can then be used by other DataStage jobs. Data sets are operating system files, each referred to by a control file, which by convention has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs. You can also manage data sets independently of a job using the Data Set Management utility, available from the DataStage Designer, Manager, or Director.

If you want more information, please see Chapter 56 in Parjdev.pdf.

Thanks
Man.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I like to think of a dataset as a text file that is spread over the available processing nodes; which piece is where is recorded in the control file.
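
To make that mental model concrete, here is a minimal Python sketch of a "dataset" split across nodes with a control file that records where each piece lives. This is only a toy analogy: the JSON layout, the file names, and the functions `write_toy_dataset` and `read_toy_dataset` are assumptions made up for illustration, and the real DataStage on-disk format is different.

```python
import json
import os
import tempfile


def write_toy_dataset(records, num_nodes, directory, name="toy"):
    """Round-robin records into one data file per node, then write a control file.

    Everything here is a hypothetical toy layout, not the DataStage format.
    """
    segments = []
    for node in range(num_nodes):
        path = os.path.join(directory, f"{name}.node{node}.data")
        with open(path, "w", encoding="utf-8") as handle:
            # Node k receives records k, k + num_nodes, k + 2 * num_nodes, ...
            for record in records[node::num_nodes]:
                handle.write(json.dumps(record) + "\n")
        segments.append({"node": node, "path": path})

    # The control file holds no data, only the location of each piece.
    control_path = os.path.join(directory, f"{name}.ds")
    with open(control_path, "w", encoding="utf-8") as handle:
        json.dump({"segments": segments}, handle)
    return control_path


def read_toy_dataset(control_path):
    """Follow the control file to every segment and read all records back."""
    with open(control_path, encoding="utf-8") as handle:
        control = json.load(handle)
    records = []
    for segment in control["segments"]:
        with open(segment["path"], encoding="utf-8") as handle:
            records.extend(json.loads(line) for line in handle)
    return records


if __name__ == "__main__":
    rows = [{"id": i, "name": f"row{i}"} for i in range(7)]
    with tempfile.TemporaryDirectory() as tmp:
        control = write_toy_dataset(rows, num_nodes=3, directory=tmp)
        print(sorted(r["id"] for r in read_toy_dataset(control)))
```

Note that the records come back grouped by node rather than in their original order, which is why the usage example sorts them before printing.
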
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
richdhan
Premium Member
Posts: 364
Joined: Thu Feb 12, 2004 12:24 am

Post by richdhan »

Hi,

What Ray has mentioned is partly right. The data in the Dataset is distributed across the processing nodes, and the control file records where each part resides. But the Dataset is more than a text file. In a text file everything is represented as a string or a number, whereas in a Dataset each value is stored according to its data type, and null values have their own representation.
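
A small Python sketch may help show the difference between a typed representation and plain text. The record layout below (a 1-byte null flag followed by a 4-byte integer) and the function names are assumptions chosen for illustration only; this is not how DataStage actually encodes a Dataset.

```python
import struct

# Hypothetical layout: 1-byte null flag, then a 4-byte little-endian signed integer.
RECORD = struct.Struct("<Bi")


def encode_int(value):
    """Encode an optional integer; None is stored as an explicit null flag."""
    if value is None:
        return RECORD.pack(1, 0)
    return RECORD.pack(0, value)


def decode_int(blob):
    """Decode back to an int, or to None when the null flag is set."""
    is_null, value = RECORD.unpack(blob)
    return None if is_null else value


def as_text(value):
    """What a plain text file would hold: a string, with null collapsed to empty."""
    return "" if value is None else str(value)
```

With the typed form, `decode_int(encode_int(None))` gives back `None` and `decode_int(encode_int(0))` gives back the integer `0`. With the text form, a null and an empty string both end up as `""`, and a reader has to guess whether `"42"` was a number or a string.
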

HTH
--Rich

Pride comes before a fall
Humility comes before honour
dsxdev
Participant
Posts: 92
Joined: Mon Sep 20, 2004 8:37 am

Post by dsxdev »

Hi,
With Datasets you have the advantage of parallelism in reading and writing. Although a sequential file offers a multiple-readers option, you cannot use variable-length columns with it; you have to use fixed-length columns.

With datasets, multiple nodes can read the data in parallel and you can still use variable-length columns.
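
As a rough analogy in Python (not DataStage code), each partition can be handed to its own worker, and the values inside a partition are free to have different lengths. The names `total_length` and `process_partitions` are hypothetical, and threads stand in for processing nodes here.

```python
from concurrent.futures import ThreadPoolExecutor


def total_length(partition):
    """Stand-in for per-node work: sum the lengths of variable-length values."""
    return sum(len(value) for value in partition)


def process_partitions(partitions):
    """Process every partition with its own worker; results keep partition order."""
    workers = max(1, len(partitions))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(total_length, partitions))
```

For example, `process_partitions([["a", "bcd"], ["efgh"], []])` handles three partitions independently and returns one result per partition.
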
Happy DataStaging