DataSet

srinu_p · Post by **srinu_p** » Mon Oct 18, 2004 10:56 am

Hi,

I am new to DataStage.
Can somebody help me with DataSet?
I read the documentation, but I can't understand what it is used for.

SP.

coolkhan08 · Post by **coolkhan08** » Mon Oct 18, 2004 11:24 am

Welcome aboard, Srinu!
A Dataset is a DataStage-specific file with a .ds extension. It is used in parallel jobs for faster data loading, as the dataset resides in the temporary disk space used by DataStage Parallel Extender. The metadata should be the same on the input and output links. You can check the manuals or search the forum for more information, as this has been discussed before.
Sam

mandyli · Post by **mandyli** » Thu Oct 21, 2004 3:33 am

Welcome Aboard.

What is a data set?

A data set is like a flat file. DataStage Parallel Extender jobs use data sets to manage data within a job. You can think of each link in a job as carrying a data set.

The Data Set stage allows you to store data being operated on in a persistent form, which can then be used by other DataStage jobs. Data sets are operating system files, each referred to by a control file, which by convention has the suffix .ds. Using data sets wisely can be key to good performance in a set of linked jobs. You can also manage data sets independently of a job using the Data Set Management utility, available from the DataStage Designer, Manager, or Director.

If you want more information, please see Chapter 56 in Parjdev.pdf.

Thanks
Man.

ray.wurlod · Post by **ray.wurlod** » Thu Oct 21, 2004 3:37 pm

I like to think of a dataset as a text file that is spread over the available processing nodes; which piece is where is recorded in the control file.
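
To make that picture concrete, here is a minimal Python sketch of the idea, not of DataStage itself. The JSON descriptor, the `segment_N.txt` names, and the functions `write_partitioned` and `read_partitioned` are all hypothetical; a real .ds control file uses its own format. The sketch only shows a small descriptor that holds no data and records which piece lives where.

```python
import json
import tempfile
from pathlib import Path


def write_partitioned(rows, node_dirs, descriptor_path):
    """Spread rows round-robin over node directories and record the layout."""
    segments = []
    for i, node_dir in enumerate(node_dirs):
        segment = Path(node_dir) / f"segment_{i}.txt"
        # Every len(node_dirs)-th row, starting at offset i, goes to node i.
        part = rows[i::len(node_dirs)]
        segment.write_text("".join(f"{row}\n" for row in part))
        segments.append(str(segment))
    # The descriptor (hypothetical format) holds no data, only locations.
    Path(descriptor_path).write_text(json.dumps({"segments": segments}))


def read_partitioned(descriptor_path):
    """Follow the descriptor to every segment and collect the rows."""
    layout = json.loads(Path(descriptor_path).read_text())
    rows = []
    for segment in layout["segments"]:
        rows.extend(Path(segment).read_text().splitlines())
    return rows


def demo():
    """Write five rows over two stand-in 'nodes' and read them back."""
    with tempfile.TemporaryDirectory() as root:
        nodes = [Path(root) / "node0", Path(root) / "node1"]
        for node in nodes:
            node.mkdir()
        descriptor = Path(root) / "customers.ds"
        write_partitioned(["a", "b", "c", "d", "e"], nodes, descriptor)
        return read_partitioned(descriptor)
```

Note that reading the pieces back segment by segment does not restore the original row order, which is one reason a partitioned data set is not simply a split-up text file.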

richdhan · Post by **richdhan** » Mon Oct 25, 2004 3:19 am

Hi,

What Ray has mentioned is partly right. The data in the Dataset gets distributed across the processing nodes, and the control file records where it is distributed. But a Dataset is more than a text file. In a text file everything is represented as a String or a Number, whereas in a Dataset the data is represented according to its datatype, and a representation for Null data is also available.
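
A small Python sketch may help show the difference. It does not reproduce the actual Dataset storage format; the record layout (a 1-byte null flag followed by a 4-byte integer) and the function names are assumptions made purely for illustration.

```python
import struct

# Hypothetical layout: 1-byte null flag + 4-byte signed integer, little-endian.
RECORD = struct.Struct("<Bi")


def encode_typed(value):
    """Store an integer column in binary form with an explicit null flag."""
    if value is None:
        return RECORD.pack(1, 0)  # flag set, payload ignored
    return RECORD.pack(0, value)


def decode_typed(blob):
    """Recover the integer, or None when the null flag is set."""
    is_null, value = RECORD.unpack(blob)
    return None if is_null else value


def encode_text(value):
    """Text-file style: everything becomes a string, null becomes empty."""
    return "" if value is None else str(value)
```

In the typed form, `decode_typed(encode_typed(None))` gives back `None` and a stored `0` stays `0`. In the text form, `encode_text(None)` and `encode_text("")` both produce an empty string, so a null can no longer be told apart from an empty value without an extra convention.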

HTH
--Rich

Pride comes before a fall
Humility comes before honour

dsxdev · Post by **dsxdev** » Tue Oct 26, 2004 8:43 am

Hi,
With Datasets you have the advantage of parallelism in reading and writing. Although a sequential file gives you the option of multiple readers, you cannot use variable-length columns there. You have to use fixed-length columns.

By using datasets, you get the advantage of multiple nodes reading the data while still being able to use variable-length columns.
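
One way to see why fixed-length columns make multiple readers on a single file easy is the following Python sketch. It is only an illustration under my own assumptions, not how DataStage is implemented: `RECORD_SIZE` and `reader_ranges` are hypothetical names. With fixed-width records, each reader can work out its byte range by arithmetic alone, without scanning for record boundaries.

```python
RECORD_SIZE = 8  # hypothetical fixed record width in bytes


def reader_ranges(total_bytes, readers, record_size=RECORD_SIZE):
    """Split a fixed-width file into one (start, end) byte range per reader."""
    records = total_bytes // record_size
    base, extra = divmod(records, readers)
    ranges = []
    start = 0
    for r in range(readers):
        # The first `extra` readers take one additional record each.
        count = base + (1 if r < extra else 0)
        end = start + count * record_size
        ranges.append((start, end))
        start = end
    return ranges
```

For an 80-byte file and three readers, `reader_ranges(80, 3)` returns `[(0, 32), (32, 56), (56, 80)]`. With variable-length records there is no such formula, because a boundary can only be found by reading the data. A dataset avoids the problem since it is already stored as separate pieces, one per partition.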