location of 'datasets'

PeteM2
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

Post by PeteM2 »

Does having the dataset descriptor file on a different file system than the data files cause any problems?

The reason I am asking is that the descriptor and data files are currently on the same file system. To reduce I/O queuing, I would like to keep the descriptor files for new datasets on the current file system and move the data files onto a number of different file systems.
Thanks
qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

It should not cause any problems as long as all the file systems involved are available when needed. The *.apt config file(s) are flexible in allowing you to point to various paths across nodes.
Choose a job you love, and you will never have to work a day in your life. - Confucius
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

No, that scenario is quite common, especially in shared cluster/grid environments.

However, do not simply move the existing data segment files to new storage. You will need to either copy the existing datasets, using a DataStage job or the orchadmin command, or recreate them.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
PeteM2
Premium Member
Posts: 44
Joined: Thu Dec 15, 2011 9:17 am
Location: uk

Post by PeteM2 »

Does the 'dataset' file system defined for a node in the config file only determine where new data files are located?

If so, is the location of existing data files derived from the descriptor file, without any reference to the config file?
Thanks
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm
Contact:

Post by jwiles »

Dataset descriptor files contain the locations of all the data segment files that hold the data for that dataset. This is why you cannot simply move the data segments to another location. And no, you cannot modify the dataset descriptor file; it is a proprietary binary format.

Yes, the disk resource entries in the configuration file determine where data segment files will be written for each logical node, as documented in the Parallel Job Developer's Guide and mentioned many times in the forum.
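
For illustration, here is a minimal sketch of a two-node configuration file in which each logical node writes its data segments to a different file system. The host name and every path shown are hypothetical placeholders, not taken from this thread; substitute the values for your own environment. The descriptor (.ds) file itself is written wherever the job's Data Set stage points, independently of these entries.

```
/* Sketch only: "etlhost" and all paths below are assumed placeholders. */
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    /* data segment files for node1 are written here */
    resource disk "/data_fs1/datasets" {pools ""}
    resource scratchdisk "/scratch_fs1" {pools ""}
  }
  node "node2"
  {
    fastname "etlhost"
    pools ""
    /* data segment files for node2 go to a different file system */
    resource disk "/data_fs2/datasets" {pools ""}
    resource scratchdisk "/scratch_fs2" {pools ""}
  }
}
```

With a layout like this, new datasets created under this configuration spread their segment files across the two file systems, while datasets created earlier continue to reference the segment locations recorded in their own descriptor files.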

Regards,
- james wiles


qt_ky
Premium Member
Posts: 2895
Joined: Wed Aug 03, 2011 6:16 am
Location: USA

Post by qt_ky »

Data set descriptor files (or control files) contain a copy of the config file that was used at the time of data set creation.