Dataset Compression

Abee · Post by **Abee** » Thu Apr 02, 2009 2:27 am

Hi all,

Is it possible to compress datasets in DataStage 7.x? Also, kindly help me understand the advantages and disadvantages of compressing datasets.

My questions extend further below.

Once the DS is compressed, can we read it without uncompressing it?
Is it possible to overwrite a compressed DS?
Is it possible to create the dataset in compressed mode, rather than creating it and then compressing it?
Will there be any space saving when we compress? Also, is there a typical compression ratio available?
Will performance be affected when we compress a dataset?

Thanks in Advance

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Apr 02, 2009 2:43 am

You can compress datasets.

But before that, can you provide more info on what you are trying to achieve?

Abee · Post by **Abee** » Thu Apr 02, 2009 2:47 am

In a DWH system, we have more than 300 raw files for loading, plus many star models and interims. We are trying to reduce the space occupied by the datasets that are created, since we are facing a space crunch. As in Oracle, if we are able to compress the data and still access it without hindering performance, we would like to do so. We would also like to know the compression ratio, in order to identify the optimal solution.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Thu Apr 02, 2009 3:08 am

You can compress using normal OS commands.

Compression ratio depends on the content - just like the text files.
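
To illustrate how much the ratio depends on content, here is a minimal Python sketch using the standard `gzip` module. The sample byte strings are made up for illustration and are not DataStage data; the helper name `compression_ratio` is my own.

```python
import gzip
import os


def compression_ratio(data: bytes) -> float:
    """Return original size divided by gzip-compressed size."""
    compressed = gzip.compress(data)
    return len(data) / len(compressed)


# Highly repetitive, text-like rows (invented sample) compress very well.
repetitive = b"2009-04-02,ACTIVE,0000000000\n" * 10_000

# Random bytes stand in for content with almost no redundancy.
random_like = os.urandom(len(repetitive))

print(f"repetitive rows: {compression_ratio(repetitive):.1f}x")
print(f"random bytes:    {compression_ratio(random_like):.2f}x")
```

Real data will fall somewhere between these two extremes, so the only reliable figure is one measured on a sample of your own files.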

Before going down that route, did you try clearing unwanted and "expired" datasets?
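
As a rough aid for spotting candidates, the sketch below lists files in a directory that have not been modified for a given number of days. The function name `stale_files`, the 30-day threshold, and the use of the current directory are assumptions for illustration; it only reports files and deletes nothing, since datasets should be removed with DataStage's own dataset management tooling rather than by deleting files by hand.

```python
import os
import time
from typing import Iterator, Optional, Tuple


def stale_files(
    directory: str, max_age_days: float, now: Optional[float] = None
) -> Iterator[Tuple[str, int, float]]:
    """Yield (path, size_bytes, age_days) for regular files in `directory`
    whose last modification is older than `max_age_days`."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    for entry in os.scandir(directory):
        if entry.is_file(follow_symlinks=False):
            st = entry.stat(follow_symlinks=False)
            if st.st_mtime < cutoff:
                yield entry.path, st.st_size, (now - st.st_mtime) / 86400


if __name__ == "__main__":
    # "." is a placeholder; point this at your own resource disk directory.
    for path, size, age in stale_files(".", max_age_days=30):
        print(f"{age:6.0f} days  {size:>12} bytes  {path}")
```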

Scope · Post by **Scope** » Thu Apr 02, 2009 3:33 am

You cannot compress the dataset. Only the descriptor file is created in the directory where you create the dataset. The data is stored on the resource disk (the path mentioned in the configuration file) in an internal format.

ray.wurlod · Post by **ray.wurlod** » Thu Apr 02, 2009 5:21 am

You can compress a Data Set (you have to work out where all its data files are, of course), but doing so renders it unusable by DataStage. The gains are negligible because data are already in binary form within a Data Set.

Sainath.Srinivasan · Post by **Sainath.Srinivasan** » Mon Apr 06, 2009 4:39 am

Although I differ from Ray in holding that Datasets can be compressed, I agree that the benefits are not huge.

As mentioned before, you may gain by organising it better.

sima79 · Post by **sima79** » Mon Apr 06, 2009 6:24 am

You can compress data using the "compress" stage without having to land the dataset to disk and then compress it. I suggest that the original poster create a couple of sample jobs, one with the compress stage and one without.

e.g. source stage -> compress stage -> dataset stage

Run the job and find where the dataset persists its data on disk, as defined in the configuration file or in the dataset descriptor file. Compare the sizes. I have managed to get some reasonable space saving (we are not talking huge) using the compress stage, in particular with the gzip setting. Note: nothing is free. You will be sacrificing performance for some space savings. You will also need to decompress the data with the "expand" stage before using it again.
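
For the size comparison, a small Python sketch like the following can total the data files of each sample dataset and report the saving. The glob patterns and directory are hypothetical placeholders; substitute whatever paths your configuration file or descriptor file actually points to.

```python
import glob
import os


def data_files_size(pattern: str) -> int:
    """Total size in bytes of all regular files matching a glob pattern."""
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))


def space_saving(uncompressed_bytes: int, compressed_bytes: int) -> float:
    """Fraction of space saved; 0.25 means the compressed data is 25% smaller."""
    if uncompressed_bytes == 0:
        return 0.0
    return 1.0 - compressed_bytes / uncompressed_bytes


# Hypothetical patterns for the two sample jobs' data files.
plain = data_files_size("/path/to/resource_disk/sample_plain*")
packed = data_files_size("/path/to/resource_disk/sample_compressed*")

print(f"plain: {plain} bytes, compressed: {packed} bytes")
print(f"space saving: {space_saving(plain, packed):.1%}")
```

Timing both jobs alongside the size check gives the other half of the trade-off, since the saving comes at the cost of extra work in the compress and expand stages.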