Remove DataSets

Post questions here relative to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

ecclesr
Premium Member
Posts: 260
Joined: Sat Apr 05, 2003 7:12 pm
Location: Australia

Post by ecclesr »

I am trying to free up disk space in the file system set up as a resource disk.

Say the file system is ../dstage_DataSets.

Using the Data Set Management utility, I have checked the ../dstage_DataSets/src and ../dstage_DataSets/tgt directories that my jobs use. The jobs are designed to truncate their Data Sets after each run, and those Data Sets are fine. I have also deleted a few test Data Sets in those folders that had not been truncated and are no longer required.

When I go up a level to ./dstage_DataSets, there are a large number of Data Sets with names like:

XXX_ABC_EFG.dsadm.[servername].0000.0000.dia....
Example_SPJOb.dsadm.[servername].0000.0000.dia....
countH9HiWk
countKSeh7F
failed.ds.dsadm.[servername].0000.0000.0000.adb....

When I double-click on some of these files, I get the message:
'Cannot read '/..../dstage_DataSets/Example_SPJOb.dsadm.[servername].0000.0000.dia....'

This is not a valid dataset file, or its format is not currently supported'


Can these Data Sets be safely deleted?

What is the best advice for freeing up some of the disk space used by Data Sets?

Thanking you in advance
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

You don't seem to understand the structure of Data Sets, Ross.

The descriptor file (the file whose name ends in ".ds") contains three things:
- the record schema for the data
- the configuration file that was used when the data were written
- pathnames of the "segment files" in which the data are actually stored

The Data Set Management tool in Designer will let you see this more clearly; when you click on any node you will see, in the lower grid, the segment file(s) associated with that node. They have very long names, including the descriptor file name, the user who created them, and some counters.

The best way to delete a Data Set is either through the Data Set Management utility or from the command line. The command you need is orchadmin rm, but before you use it you need to set values for APT_CONFIG_FILE and APT_ORCHHOME in your shell. Specify the pathname of the descriptor file as the object that you want to delete.
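A minimal shell sketch of the command-line route, assuming $DSHOME/dsenv has already been sourced and that APT_CONFIG_FILE and APT_ORCHHOME are set. The function name remove_dataset is hypothetical. It refuses anything that does not end in .ds, so a segment file can never be handed to orchadmin by mistake.

```shell
#!/usr/bin/env bash
# Sketch only: remove a Data Set via "orchadmin rm" so the descriptor and its
# segment files are removed together. Assumes $DSHOME/dsenv has been sourced
# and APT_CONFIG_FILE / APT_ORCHHOME are set; remove_dataset is hypothetical.

remove_dataset() {
  local descriptor="$1"

  # Only ever pass orchadmin a descriptor file, never a segment file.
  case "$descriptor" in
    *.ds) ;;
    *) echo "not a descriptor (.ds) file: $descriptor" >&2; return 2 ;;
  esac

  if [ ! -f "$descriptor" ]; then
    echo "descriptor not found: $descriptor" >&2
    return 3
  fi

  # Both variables must be set in the shell before orchadmin is run.
  if [ -z "${APT_CONFIG_FILE:-}" ] || [ -z "${APT_ORCHHOME:-}" ]; then
    echo "APT_CONFIG_FILE and APT_ORCHHOME must both be set" >&2
    return 4
  fi

  "$APT_ORCHHOME/bin/orchadmin" rm "$descriptor"
}
```

Typical use would be to source $DSHOME/dsenv, export APT_CONFIG_FILE for your project, then call remove_dataset with the full pathname of the descriptor file.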

For any segment file, work out its descriptor file name from the segment file name, find that descriptor file on the file system (use the find command), and then use one of the correct methods above to delete the Data Set.
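A sketch of the "descriptor name from segment name" step, assuming segment names have the form <descriptor name>.<owner>.<host>.<counters>, as the names quoted in this thread suggest (e.g. ABC_FailedData.ds.dsadm.host...). You must supply the owning user (dsadm here). Both helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: derive a descriptor file name from a segment file name, then look
# for that descriptor under one or more search roots.
# Assumed segment name layout: <descriptor>.<owner>.<host>.<counters...>

descriptor_name_from_segment() {
  local segment="${1##*/}"   # drop any leading directory
  local owner="$2"
  # Everything before the first ".<owner>." is taken as the descriptor name.
  printf '%s\n' "${segment%%".${owner}."*}"
}

find_descriptor() {
  local name="$1"
  shift
  # Exact-name search under each root; permission errors are discarded.
  find "$@" -type f -name "$name" -print 2>/dev/null
}
```

For example, find_descriptor "$(descriptor_name_from_segment "$seg" dsadm)" / searches the whole system for the descriptor that owns segment $seg.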

It is NOT safe just to delete the segment files. You will leave orphaned descriptor files, and probably orphaned segment files relating to other nodes, particularly if more than one file system is used for Data Set (segment file) storage.

It may be safe to delete segment files if you can positively ascertain that the corresponding descriptor file no longer exists. And only then.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
ecclesr

Post by ecclesr »

Ray,

I have followed your advice.

For example:

find . -name "ABC_FailedData.ds*" -print

./dstage_DataSets/ABC_FailedData.ds.dsadm.host......163fa824
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......6e468fb8
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......09220e5e
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......42b9c47e

Then, via orchadmin:

. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/dstage_DataSets/ABC_FailedData

I get

WARNING: Could not dump /xxx/dddd/dstage_DataSets/ABC_FailedData because it does not exist

What next steps should I consider to safely delete these files and recover the space?

I have a number of files like this.

Thanking you all in advance
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/dstage_DataSets/ABC_FailedData.ds
(you neglected to add the ".ds" to the filepath)
ray.wurlod

Post by ray.wurlod »

What Arnd said.
ecclesr

Post by ecclesr »

I only made the typo in my post; I already knew not to include the suffix:

. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/ABC_FailedData

WARNING: Could not dump /xxx/dddd/ABC_FailedData because it does not exist
Mike
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

It looks like you've assumed that the dataset descriptor file is in the same location as the dataset segment files. That could be a bad assumption.

find / -name "ABC_FailedData.ds" -print 2>/dev/null
Find the location of the descriptor file (the one whose name ends in .ds).

Mike
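Extending Mike's find to a whole directory of segment files, here is a sketch that reports, for each distinct descriptor name, whether the descriptor was found anywhere under the given roots. It assumes the <descriptor>.<owner>.<host>.<counters> naming seen in this thread; report_descriptors is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: for each distinct descriptor name implied by the segment files in
# one resource directory, search the given roots and print
#   FOUND  <descriptor path>   or   ORPHAN <descriptor name>
# Assumed segment name layout: <descriptor>.<owner>.<host>.<counters...>

report_descriptors() {
  local resource_dir="$1" owner="$2"
  shift 2
  local seg name hits

  for seg in "$resource_dir"/*".${owner}."*; do
    [ -f "$seg" ] || continue              # glob matched nothing
    seg="${seg##*/}"
    printf '%s\n' "${seg%%".${owner}."*}"
  done | sort -u | while IFS= read -r name; do
    # Exact-name match, so segment files themselves are never counted as hits.
    hits=$(find "$@" -type f -name "$name" -print 2>/dev/null || true)
    if [ -n "$hits" ]; then
      printf '%s\n' "$hits" | sed 's/^/FOUND  /'
    else
      printf 'ORPHAN %s\n' "$name"
    fi
  done
}
```

Calling it as report_descriptors ../dstage_DataSets dsadm / mirrors the find / search above, so expect it to be slow on large file systems.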
ecclesr

Post by ecclesr »

Thanks Mike, I was finally able to find the location of the descriptor file and delete the Data Set using the Data Set Management utility.
UCDI
Premium Member
Posts: 383
Joined: Mon Mar 21, 2016 2:00 pm

Post by UCDI »

FWIW our team decided to create a routine that can delete datasets. We put it at the end of all our sequence jobs and name our datasets in a consistent manner that allows us to blanket delete them safely at the end of the jobs. This is working well for us. We also put in a bypass to leave the datasets when working in development/debugging mode.

Such a routine can also be dropped into a one- or two-stage sequence and used to clean up strays, or used for a hands-on approach to cleanup.
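UCDI's routine itself is not posted here, so this is only a hypothetical shell sketch of the same idea: a script that the last step of a sequence (for example an Execute Command activity) could call. It deletes every Data Set whose descriptor matches a naming prefix, using orchadmin rm rather than a bare rm, and honours a development bypass. The prefix convention, the KEEP_DATASETS flag, and cleanup_datasets are all assumptions.

```shell
#!/usr/bin/env bash
# Hypothetical end-of-sequence cleanup: remove every Data Set whose descriptor
# in descriptor_dir starts with prefix, via orchadmin rm.
# KEEP_DATASETS=1 is an assumed development/debugging bypass.

cleanup_datasets() {
  local descriptor_dir="$1" prefix="$2"
  local ds

  # Bypass: leave the Data Sets in place for inspection.
  if [ "${KEEP_DATASETS:-0}" = "1" ]; then
    echo "KEEP_DATASETS=1, skipping cleanup" >&2
    return 0
  fi

  for ds in "$descriptor_dir"/"$prefix"*.ds; do
    [ -f "$ds" ] || continue          # pattern matched nothing
    # orchadmin removes the descriptor and its segment files together.
    "$APT_ORCHHOME/bin/orchadmin" rm "$ds" || return 1
  done
}
```

Keeping the naming convention strict (a prefix used only for transient Data Sets) is what makes a blanket delete like this safe.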
dls
Premium Member
Posts: 96
Joined: Tue Sep 09, 2003 5:15 pm

Post by dls »

Are you willing to share your routine?
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

As an admin, I create hidden directories for the data segments. That way it hides the noise from the developers since they should only care about the descriptor file. If someone cares enough to look for the segments then they should be smart enough to track down the path via the APT file and find them.
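To track down the segment paths from the APT configuration file, here is a small sketch that lists the "resource disk" directories. It assumes the usual layout where each entry sits on one line as resource disk "/path" {pools ""}. Scratch disks are deliberately excluded, and resource_disks is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: list the resource disk directories (where Data Set segment files
# live) declared in a parallel configuration file. Defaults to
# $APT_CONFIG_FILE when no file is given. Assumes one entry per line.

resource_disks() {
  local config="${1:-${APT_CONFIG_FILE:-}}"
  # "resource disk" only; "resource scratchdisk" does not match the pattern.
  sed -n 's/.*resource[[:space:]]\{1,\}disk[[:space:]]\{1,\}"\([^"]*\)".*/\1/p' "$config" | sort -u
}
```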
ray.wurlod

Post by ray.wurlod »

Every project I create has a sub-directory created in which its Data Set, File Set and Lookup File Set descriptor files are placed. No problem finding them when the need arises.

I don't go to the extreme of making the resource disk hidden directories, but can understand Paul's motivation. Most of my users are not "UNIX literate" and therefore are not given access to UNIX.
UCDI

Post by UCDI »

dls wrote: Are you willing to share your routine?
I will see if I can post the whole thing (have to find out if allowed to do so)...

All it does is go to the path where the file pieces are stored and use the UNIX rm command to remove the pieces along with the visible .ds file, so it's just a fancy rm command with the correct folders and file names cooked up. I didn't write it, but I think it probably parses the root Data Set (descriptor) file for the segment file names; I know they are contained in it.
ecclesr

Post by ecclesr »

After much running of the UNIX find command, I managed to find a couple of the rogue Data Set descriptors and delete them using the Data Set Management tool.

I am still left with a number of very large files, consuming 44% of the disk space assigned for Data Sets, for which I am unable to find descriptor files, so they have not yet been deleted. All current jobs and job sequences are designed to manage their own Data Sets, either by deleting or by truncating them.
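When the goal is recovering space fast, it can help to see which descriptor names account for the most bytes before chasing them. This sketch totals segment file sizes per descriptor name in one resource directory, largest first. It assumes the <descriptor>.<owner>.<host>.<counters> naming from this thread and names without spaces; segment_usage is a hypothetical name.

```shell
#!/usr/bin/env bash
# Sketch: total bytes of segment files in one resource directory, grouped by
# the descriptor name embedded in each segment name, largest group first.
# Assumes <descriptor>.<owner>.<host>.<counters...> names with no spaces.

segment_usage() {
  local resource_dir="$1" owner="$2"
  local seg base bytes
  for seg in "$resource_dir"/*".${owner}."*; do
    [ -f "$seg" ] || continue            # glob matched nothing
    base="${seg##*/}"
    bytes=$(wc -c < "$seg")              # exact size in bytes
    printf '%s %s\n' "${base%%".${owner}."*}" "$bytes"
  done | awk '{ total[$1] += $2 } END { for (n in total) print total[n], n }' | sort -k1,1nr
}
```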

The historical files exist for various reasons, including inexperienced developers and development standards not being followed. What I want to do is recover as much disk space as possible, ASAP.

Any suggestions for removing these outstanding files?

Thanking you all in advance.
Mike
Premium Member
Premium Member
Posts: 1021
Joined: Sun Mar 03, 2002 6:01 pm
Location: Tampa, FL

Post by Mike »

If they do not have a dataset descriptor file, then one of your inexperienced developers probably deleted the descriptor file improperly with an rm command.

You can do the same (remove them with rm) for the orphaned dataset segment files.

It sounds like you've done your due diligence in attempting to find the descriptor files.

Mike
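Following Ray's rule that segment files may be deleted only once you have positively confirmed their descriptor no longer exists, here is a cautious sketch. It skips any segment whose descriptor is found under the given roots and only prints what it would remove unless DELETE_ORPHANS=1 is set. The naming assumption, the DELETE_ORPHANS switch, and remove_orphan_segments are hypothetical.

```shell
#!/usr/bin/env bash
# Sketch: remove segment files whose descriptor cannot be found under any of
# the search roots. Dry run by default; DELETE_ORPHANS=1 (an assumed switch)
# performs the rm. Assumed naming: <descriptor>.<owner>.<host>.<counters...>

remove_orphan_segments() {
  local resource_dir="$1" owner="$2"
  shift 2
  local seg name
  for seg in "$resource_dir"/*".${owner}."*; do
    [ -f "$seg" ] || continue            # glob matched nothing
    name="${seg##*/}"
    name="${name%%".${owner}."*}"
    # Keep the segment if its descriptor exists anywhere under the roots.
    if [ -n "$(find "$@" -type f -name "$name" -print 2>/dev/null)" ]; then
      continue
    fi
    if [ "${DELETE_ORPHANS:-0}" = "1" ]; then
      rm -f -- "$seg"
    else
      printf 'would remove %s\n' "$seg"
    fi
  done
}
```

Run it once per resource disk listed in the configuration file, since one Data Set's segments can be spread across several file systems. It calls find once per segment, which is simple but slow; caching the lookups per descriptor name would be the obvious refinement.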