Remove DataSets

ecclesr · Post by **ecclesr** » Wed Aug 10, 2016 8:31 pm

I am trying to free up disk space in the file system set up for resource disk.

File system is say ../dstage_DataSets

Via the Data Set Management utility I have checked the ../dstage_DataSetssrc and ../dstage_DataSetstgt directories that I use in the programs. The programs are designed to truncate the Data Set files after each run, and these Data Sets are fine. I have deleted a few test Data Sets in the folders that were not truncated (and are no longer required).

When I go up a level to ./dstage_DataSets there are a large number of Data Sets with names like

XXX_ABC_EFG.dsadm.[servername].0000.0000.dia....
Example_SPJOb.dsadm.[servername].0000.0000.dia....
countH9HiWk
countKSeh7F
failed.ds.dsadm.[servername].0000.0000.0000.adb....

When I double click on some of these files I get the message
'Cannot read '/..../dstage_DataSets/Example_SPJOb.dsadm.[servername].0000.0000.dia....'

This is not a valid dataset file, or its format is not currently supported'

Can these DataSets be safely deleted?

What is the best advice to free up some disk space used for DataSets?

Thanking you in advance

ray.wurlod · Post by **ray.wurlod** » Wed Aug 10, 2016 10:16 pm

You don't seem to understand the structure of Data Sets, Ross.

The descriptor file (the file whose name ends in ".ds") contains three things:
- the record schema for the data
- the configuration file that was used when the data were written
- pathnames of the "segment files" in which the data are actually stored

The Data Set Management tool in Designer will let you see this more clearly; when you click on any node you will see, in the lower grid, the segment file(s) associated with that node. They have very long names, including the descriptor file name, the user who created them, and some counters.

The best way to delete a Data Set is either through the Data Set Management utility or from the command line. The command you need is orchadmin rm but, before you use that, you need to specify values for APT_CONFIG_FILE and APT_ORCHHOME in your shell. Specify the pathname of the descriptor file as the object that you want to delete.
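A minimal sketch of the command-line route, assuming a Bourne-style shell. `remove_data_set` and `DRY_RUN` are hypothetical names introduced here for illustration; only `orchadmin rm` and the two environment variables come from the advice above. By default the helper only prints the command it would run.

```shell
# remove_data_set DESCRIPTOR
# Deletes a Data Set (descriptor plus its segment files) with "orchadmin rm".
# DRY_RUN is a hypothetical switch: 1 (the default) prints the command only.
remove_data_set() {
    descriptor=$1

    # Both variables must be set before orchadmin is used.
    if [ -z "${APT_ORCHHOME:-}" ] || [ -z "${APT_CONFIG_FILE:-}" ]; then
        echo "set APT_ORCHHOME and APT_CONFIG_FILE first" >&2
        return 1
    fi
    export APT_ORCHHOME APT_CONFIG_FILE

    # Refuse anything that is not a descriptor file name ending in .ds.
    case $descriptor in
        *.ds) ;;
        *) echo "not a descriptor file (.ds): $descriptor" >&2
           return 1 ;;
    esac

    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$APT_ORCHHOME/bin/orchadmin rm $descriptor"
    else
        "$APT_ORCHHOME/bin/orchadmin" rm "$descriptor"
    fi
}
```

Once the printed command looks right, something like `DRY_RUN=0 remove_data_set /path/to/MyData.ds` (an illustrative path) would run it for real.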

For any segment file, ascertain its descriptor file name from the segment file name, and find that descriptor file on the file system (use the find command), then use the correct method for deleting the Data Set.
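As a rough sketch of that lookup, assuming the descriptor was named with a ".ds" suffix so that the segment file name begins with "name.ds." followed by the user, host and counters. Both function names are hypothetical, and the search roots are whatever directories you choose to pass in.

```shell
# descriptor_name_from_segment SEGMENT_PATH
# Prints the descriptor file name embedded at the start of a segment file
# name. Fails if the name contains no ".ds." component.
descriptor_name_from_segment() {
    base=${1##*/}                        # drop any leading directories
    case $base in
        *.ds.*) printf '%s.ds\n' "${base%%.ds.*}" ;;
        *) return 1 ;;
    esac
}

# find_descriptor SEGMENT_PATH SEARCH_ROOT...
# Searches the given roots for the descriptor belonging to a segment file.
# Note: find treats the name as a pattern, so names containing characters
# such as [ or * would need escaping.
find_descriptor() {
    name=$(descriptor_name_from_segment "$1") || return 1
    shift
    find "$@" -type f -name "$name" -print 2>/dev/null
}
```

Called as `find_descriptor <segment file> <root> [<root> ...]`, it prints the path of every matching descriptor; no output means nothing was found under those particular roots, not that the descriptor is gone.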

It is NOT safe just to delete the segment files. You will leave orphaned descriptor files, and probably orphaned segment files relating to other nodes, particularly if more than one file system is used for Data Set (segment file) storage.

It may be safe to delete segment files if you can positively ascertain that the corresponding descriptor file no longer exists. And only then.
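One hedged way to build a list of candidates for that check is sketched below. `list_orphan_segments` is a hypothetical helper; it assumes segment file names start with "name.ds." and only reports segment files for which no descriptor of that name turns up under the directories you ask it to search. It deletes nothing.

```shell
# list_orphan_segments SEGMENT_DIR SEARCH_ROOT...
# Prints each segment file in SEGMENT_DIR whose descriptor cannot be found
# under any SEARCH_ROOT. One find per segment file, so it is slow on large
# trees; this is a sketch, not a tuned tool.
list_orphan_segments() {
    segdir=$1
    shift
    for seg in "$segdir"/*.ds.*; do
        [ -f "$seg" ] || continue
        base=${seg##*/}
        name=${base%%.ds.*}.ds           # descriptor name embedded in the segment name
        hit=$(find "$@" -type f -name "$name" -print 2>/dev/null | head -n 1)
        [ -n "$hit" ] || printf '%s\n' "$seg"
    done
}
```

The result is only as good as the search roots: a descriptor stored somewhere you did not search will make a live segment file look orphaned, so treat the output as a list to investigate rather than a list to delete.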

ecclesr · Post by **ecclesr** » Mon Aug 15, 2016 2:15 am

Ray,

I have followed your advice.

For example

find . -name "ABC_FailedData.ds*" -print

./dstage_DataSets/ABC_FailedData.ds.dsadm.host......163fa824
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......6e468fb8
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......09220e5e
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......42b9c47e

Then via orchadmin

. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/dstage_DataSets/ABC_FailedData

I get

WARNING: Could not dump /xxx/dddd/dstage_DataSets/ABC_FailedData because it does not exist

What next steps should I consider to safely delete these files and recover the space?

I have a number of files like this.

Thanking you all in advance

ArndW · Post by **ArndW** » Mon Aug 15, 2016 4:26 am

. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/dstage_DataSets/ABC_FailedData.ds

(you neglected to add the ".ds" to the filepath)

ray.wurlod · Post by **ray.wurlod** » Mon Aug 15, 2016 3:20 pm

What Arnd said.

ecclesr · Post by **ecclesr** » Mon Aug 15, 2016 6:04 pm

I only made the typo in my posting as I already knew not to include the suffix

. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/ABC_FailedData

WARNING: Could not dump /xxx/dddd/ABC_FailedData because it does not exist

Mike · Post by **Mike** » Mon Aug 15, 2016 8:49 pm

It looks like you've assumed that the dataset descriptor file is in the same location as the dataset segment files. That could be a bad assumption.

find / -name "ABC_FailedData.ds" -print 2>/dev/null

Find the location of the descriptor file (ending in dot ds).

Mike

ecclesr · Post by **ecclesr** » Tue Aug 16, 2016 2:32 am

Thanks Mike, I was finally able to find the location of the descriptor file and delete it using the Data Set Management utility.

UCDI · Post by **UCDI** » Tue Aug 16, 2016 11:25 am

FWIW our team decided to create a routine that can delete datasets. We put it at the end of all our sequence jobs and name our datasets in a consistent manner that allows us to blanket delete them safely at the end of the jobs. This is working well for us. We also put in a bypass to leave the datasets when working in development/debugging mode.

Such a routine can also be dropped into a one- or two-stage sequence and used to clean up strays, or for a hands-on approach to cleanup.
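For illustration only, here is a shell sketch of the same idea. It is not our routine: it uses `orchadmin rm` on each descriptor rather than removing the pieces directly, and the names `cleanup_job_data_sets`, `KEEP_DATA_SETS` and `ORCHADMIN` are made up for this example. It assumes every Data Set a job creates has a descriptor in one directory with a common name prefix.

```shell
# cleanup_job_data_sets DESCRIPTOR_DIR NAME_PREFIX
# Removes every Data Set whose descriptor matches NAME_PREFIX*.ds.
cleanup_job_data_sets() {
    dir=$1
    prefix=$2

    # Bypass for development/debugging: leave the Data Sets in place.
    if [ "${KEEP_DATA_SETS:-0}" = "1" ]; then
        echo "KEEP_DATA_SETS=1, leaving Data Sets in place"
        return 0
    fi

    for descriptor in "$dir"/"$prefix"*.ds; do
        [ -f "$descriptor" ] || continue
        # ORCHADMIN defaults to the real tool; set it to "echo orchadmin"
        # to see what would be removed without removing anything.
        ${ORCHADMIN:-"$APT_ORCHHOME/bin/orchadmin"} rm "$descriptor"
    done
}
```

A script like this could be called at the end of a sequence, with the bypass variable set in development projects.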

dls · Post by **dls** » Wed Aug 17, 2016 1:30 pm

Are you willing to share your routine?

PaulVL · Post by **PaulVL** » Wed Aug 17, 2016 5:51 pm

As an admin, I create hidden directories for the data segments. That way it hides the noise from the developers since they should only care about the descriptor file. If someone cares enough to look for the segments then they should be smart enough to track down the path via the APT file and find them.

ray.wurlod · Post by **ray.wurlod** » Wed Aug 17, 2016 9:06 pm

Every project I create has a sub-directory created in which its Data Set, File Set and Lookup File Set descriptor files are placed. No problem finding them when the need arises.

I don't go to the extreme of making the resource disk hidden directories, but can understand Paul's motivation. Most of my users are not "UNIX literate" and therefore are not given access to UNIX.

UCDI · Post by **UCDI** » Thu Aug 18, 2016 10:39 am

dls wrote: Are you willing to share your routine?

I will see if I can post the whole thing (have to find out if allowed to do so)...

All it does is go to the path where the file pieces are stored and remove the pieces (with Unix rm) along with the visible .ds file, so it's just a fancy rm command with the correct folders and file names cooked up. I didn't write it, but I think it probably parses the root dataset file for the file names -- I know they are contained in it.

ecclesr · Post by **ecclesr** » Fri Aug 19, 2016 12:37 am

After much running of the Unix find command, I managed to find a couple of the rogue Data Set descriptors and delete them using the Data Set Management tool.

I am still left with a number of very large files consuming 44% of the disk space assigned for DataSets, for which I am unable to find descriptor files, so they are yet to be deleted. All current jobs/job sequences are designed to self-manage their DataSets, either by deleting or by truncating them.
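A small sketch of how the largest files can be listed, assuming standard find, ls, sort and head. `largest_files` is a hypothetical helper name; it sorts on the byte size column of `ls -ln` (apparent size, not blocks used) and only reports, it does not delete.

```shell
# largest_files DIR [COUNT]
# Lists the COUNT largest regular files under DIR (default 20), biggest
# first, in "ls -ln" format so size and modification date are visible.
largest_files() {
    dir=$1
    count=${2:-20}
    # -n prints numeric owner/group so the size is always the fifth column.
    find "$dir" -type f -exec ls -ln {} + 2>/dev/null |
        sort -k5,5nr |
        head -n "$count"
}
```

For example, `largest_files ./dstage_DataSets 10` would show the ten biggest files in the directory discussed here.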

The reasons the historical files exist are varied, including inexperienced developers and development standards not being followed. Whatever the reason they exist, what I want to do is recover as much disk space as possible ASAP.

Any suggestions for removing these outstanding files?

Thanking you all in advance.

Mike · Post by **Mike** » Fri Aug 19, 2016 7:35 am

If they do not have a dataset descriptor file, then one of your inexperienced developers probably deleted the descriptor file improperly with an rm command.

You can do the same for the orphaned dataset segment files: remove them with rm.

It sounds like you've done your due diligence in attempting to find the descriptor files.

Mike