Dataset Compression

Abee
Participant
Posts: 2
Joined: Thu Apr 02, 2009 2:21 am
Location: Singapore

Dataset Compression

Post by Abee »

Hi all,

Is it possible to compress datasets in DataStage 7.x? Also, kindly help me understand the advantages and disadvantages of compressing datasets.

My questions continue below.

Once a dataset is compressed, can we read it without uncompressing it?
Is it possible to overwrite a compressed dataset?
Is it possible to create a dataset in compressed mode, rather than creating it and then compressing it?
Will there be any space saving when we compress? Is there a typical compression ratio available?
Will performance be affected when we compress a dataset?

Thanks in advance.
Regards Abee.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can compress datasets.

But before that, can you provide more info on what you are trying to achieve?
Abee
Participant
Posts: 2
Joined: Thu Apr 02, 2009 2:21 am
Location: Singapore

Post by Abee »

In a DWH system, we have more than 300 raw files for loading, plus many star models and interims. We are trying to reduce the space occupied by the datasets that are created, since we are facing a space crunch. As with Oracle, if we are able to compress the data and still access it without hindering performance, we would like to do so. We would also like to know the compression ratio, so that we can identify the optimal solution.
Regards Abee.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

You can compress using normal OS commands.

Compression ratio depends on the content - just like the text files.

Before going down that route, did you try clearing unwanted and "expired" datasets?
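
To illustrate the point that compression ratio depends on content, here is a minimal Python sketch. It uses made-up sample bytes, not real Data Set files: highly repetitive text stands in for a delimited text file, and random bytes stand in for data with no redundancy left. Real Data Set files will fall somewhere between these two extremes, so treat the numbers as an illustration only.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Return compressed size divided by original size (lower means better compression)."""
    if not data:
        raise ValueError("data must not be empty")
    return len(gzip.compress(data)) / len(data)


# Hypothetical sample inputs, invented for this illustration.
repetitive_text = b"2009-04-02,SG,ACTIVE,0000123\n" * 10_000
random_bytes = os.urandom(len(repetitive_text))  # no redundancy for gzip to exploit

if __name__ == "__main__":
    print(f"repetitive text ratio: {gzip_ratio(repetitive_text):.3f}")
    print(f"random bytes ratio:    {gzip_ratio(random_bytes):.3f}")
```

The repetitive input shrinks to a small fraction of its size, while the random input does not shrink at all, which is why no single ratio can be quoted in advance.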
Scope
Premium Member
Posts: 63
Joined: Wed Jun 06, 2007 6:38 am
Location: Chennai

Post by Scope »

You cannot compress the dataset. Only the descriptor file is created in the directory where you are trying to create the dataset. The data is stored on the resource disk (the path mentioned in the configuration file) in an internal format.
Kumarez
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You can compress a Data Set (you have to work out where all its data files are, of course), but doing so renders it unusable by DataStage. The gains are negligible because the data are already in binary form within a Data Set.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Sainath.Srinivasan
Participant
Posts: 3337
Joined: Mon Jan 17, 2005 4:49 am
Location: United Kingdom

Post by Sainath.Srinivasan »

I differ from Ray in that Datasets can be compressed, but I agree that the benefits are not huge.

As mentioned before, you may gain by organising it better.
sima79
Premium Member
Posts: 38
Joined: Mon Jul 16, 2007 8:12 am
Location: Melbourne, Australia

Post by sima79 »

You can compress data using the "compress" stage without having to land the dataset to disk and then compress it. I suggest that the original poster create a couple of sample jobs, one with the compress stage and one without.

e.g. source stage -> compress stage -> dataset stage

Run the job and find where the dataset persists its data on disk, as defined in the configuration file or in the dataset descriptor file. Compare the sizes. I have managed to get some reasonable space saving (we are not talking huge) using the compress stage, in particular with the gzip setting. Note: nothing is free. You will be sacrificing performance for some space savings. You will also need to decompress the data with the "expand" stage before using it again.
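
For the size comparison, a small script can add up the data files of each test dataset. This is only a sketch in Python: the glob patterns in the usage comment are hypothetical placeholders, and you would substitute whatever matches the data files on your own resource disks.

```python
import glob
import os


def total_bytes(pattern: str) -> int:
    """Sum the sizes of all regular files matching a glob pattern."""
    return sum(os.path.getsize(p) for p in glob.glob(pattern) if os.path.isfile(p))


def saving_percent(plain_bytes: int, compressed_bytes: int) -> float:
    """Percentage of space saved by the compressed variant relative to the plain one."""
    if plain_bytes <= 0:
        raise ValueError("plain_bytes must be positive")
    return 100.0 * (plain_bytes - compressed_bytes) / plain_bytes


# Example usage (both patterns are hypothetical; adjust them to your environment):
#   plain = total_bytes("/path/to/resource_disk/sample_plain.ds.*")
#   packed = total_bytes("/path/to/resource_disk/sample_compressed.ds.*")
#   print(f"space saved: {saving_percent(plain, packed):.1f}%")
```

Run it against the output of both sample jobs on the same input data, so that the only difference between the two totals is the compress stage.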