Remove DataSets
I am trying to free up disk space in the file system set up as the resource disk.
Say the file system is ../dstage_DataSets.
Using the Data Set Management utility, I have checked the ../dstage_DataSetssrc and ../dstage_DataSetstgt directories that I use in my programs. The programs are designed to truncate the Data Set files after each run, and those Data Sets are fine. I have deleted a few test Data Sets in those folders that had not been truncated (and were no longer required).
When I go up a level to ./dstage_DataSets, there are a large number of Data Sets with names like:
XXX_ABC_EFG.dsadm.[servername].0000.0000.dia....
Example_SPJOb.dsadm.[servername].0000.0000.dia....
countH9HiWk
countKSeh7F
failed.ds.dsadm.[servername].0000.0000.0000.adb....
When I double-click some of these files, I get the message:
'Cannot read '/..../dstage_DataSets/Example_SPJOb.dsadm.[servername].0000.0000.dia....'
This is not a valid dataset file, or its format is not currently supported'
Can these Data Sets be safely deleted?
What is the best way to free up some of the disk space used by Data Sets?
Thanking you in advance
- Participant
- Posts: 54607
- Joined: Wed Oct 23, 2002 10:52 pm
- Location: Sydney, Australia
You don't seem to understand the structure of Data Sets, Ross.
The descriptor file (the file whose name ends in ".ds") contains three things:
- the record schema for the data
- the configuration file that was used when the data were written
- pathnames of the "segment files" in which the data are actually stored
The Data Set Management tool in Designer will let you see this more clearly; when you click on any node you will see, in the lower grid, the segment file(s) associated with that node. They have very long names, including the descriptor file name, the user who created them, and some counters.
The best way to delete a Data Set is either through the Data Set Management utility or from the command line. The command you need is orchadmin rm but, before you use that, you need to specify values for APT_CONFIG_FILE and APT_ORCHHOME in your shell. Specify the pathname of the descriptor file as the object that you want to delete.
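A minimal shell sketch of the command-line route, assuming APT_ORCHHOME and APT_CONFIG_FILE are already exported in your shell (for example by sourcing $DSHOME/dsenv and then setting APT_CONFIG_FILE to a valid configuration file). The function name remove_dataset is hypothetical; it only wraps orchadmin rm with guard checks so that the command is always given a .ds descriptor path, never a segment file.

```shell
#!/usr/bin/env bash
# Sketch: delete a Data Set by pointing orchadmin rm at its descriptor file.
# Assumption: APT_ORCHHOME and APT_CONFIG_FILE are already exported.

remove_dataset() {
  local descriptor="$1"

  # orchadmin needs both variables; refuse to run without them.
  if [ -z "${APT_ORCHHOME:-}" ] || [ -z "${APT_CONFIG_FILE:-}" ]; then
    echo "APT_ORCHHOME and APT_CONFIG_FILE must be set" >&2
    return 2
  fi

  # Only accept a descriptor (.ds) path, never a segment file.
  case "$descriptor" in
    *.ds) ;;
    *) echo "not a descriptor file: $descriptor" >&2; return 2 ;;
  esac

  if [ ! -f "$descriptor" ]; then
    echo "descriptor not found: $descriptor" >&2
    return 1
  fi

  # orchadmin rm removes the descriptor together with its segment files.
  "$APT_ORCHHOME/bin/orchadmin" rm "$descriptor"
}
```

Usage would look like remove_dataset /path/to/ABC_FailedData.ds, where the path is a placeholder for wherever the descriptor actually lives.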
For any segment file, ascertain its descriptor file name from the segment file name, find that descriptor file on the file system (use the find command), and then use the correct method for deleting the Data Set.
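A sketch of the first two steps, assuming segment files are named "<descriptor>.ds.<user>.<host>..." like the ABC_FailedData.ds.dsadm.host... files in this thread. Both function names are hypothetical. Names that do not contain ".ds." (such as countH9HiWk) are not recognised and have to be inspected by hand.

```shell
#!/usr/bin/env bash
# Sketch: derive a descriptor file name from a segment file name and search for it.
# Hypothetical naming assumption: "<descriptor>.ds.<user>.<host>...".

descriptor_name_for_segment() {
  local base="${1##*/}"            # strip any directory part
  case "$base" in
    *.ds.*) printf '%s.ds\n' "${base%%.ds.*}" ;;  # keep text before the first ".ds."
    *) return 1 ;;                 # pattern not recognised; inspect by hand
  esac
}

find_descriptor() {
  # $1 = segment file, $2 = directory tree to search (e.g. / or a project directory)
  local name
  name=$(descriptor_name_for_segment "$1") || return 1
  find "$2" -type f -name "$name" -print 2>/dev/null
}
```

Searching from / (as suggested later in the thread) can be slow; narrowing the search root to the project directories is faster when you know where descriptors are normally kept.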
It is NOT safe just to delete the segment files. You will leave orphaned descriptor files, and probably orphaned segment files relating to other nodes, particularly if more than one file system is used for Data Set (segment file) storage.
It may be safe to delete segment files if you can positively ascertain that the corresponding descriptor file no longer exists. And only then.
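One way to build a shortlist of segment files whose descriptor cannot be found, sketched under the same hypothetical naming assumption ("<name>.ds.<user>.<host>..."). A name search only shows that no descriptor turned up under the roots you searched; it cannot prove that none exists elsewhere (another file system or server), so treat the output as candidates to review, not as files that are safe to delete.

```shell
#!/usr/bin/env bash
# Sketch: print segment files in a resource directory whose descriptor
# cannot be found under a search root. Output = CANDIDATES for review only.
# Hypothetical naming assumption: "<name>.ds.<user>.<host>...".

list_orphan_candidates() {
  local resource_dir="$1" search_root="$2" known seg base name

  # Collect base names of all descriptor files once (one find, not one per segment).
  known=$(find "$search_root" -type f -name '*.ds' -print 2>/dev/null | sed 's#.*/##' | sort -u)

  for seg in "$resource_dir"/*; do
    [ -f "$seg" ] || continue
    base="${seg##*/}"
    case "$base" in
      *.ds.*) name="${base%%.ds.*}.ds" ;;
      *) echo "skipped (unrecognised name): $seg" >&2; continue ;;
    esac
    # Any descriptor with this name under search_root means "keep".
    if ! printf '%s\n' "$known" | grep -Fxq -- "$name"; then
      printf '%s\n' "$seg"
    fi
  done
}
```

Matching on the descriptor name alone is deliberately conservative: a same-named descriptor anywhere under the search root keeps the segment off the list, even if it belongs to a different Data Set.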
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Ray,
I have followed your advice.
For example:
find . -name "ABC_FailedData.ds*" -print
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......163fa824
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......6e468fb8
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......09220e5e
./dstage_DataSets/ABC_FailedData.ds.dsadm.host......42b9c47e
Then, via orchadmin:
. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/dstage_DataSets/ABC_FailedData
I get:
WARNING: Could not dump /xxx/dddd/dstage_DataSets/ABC_FailedData because it does not exist
What next steps should I consider toward safely deleting these files to recover the space?
I have a number of files like this.
Thanking you all in advance
Code:
. $DSHOME/dsenv;$APT_ORCHHOME/bin/orchadmin dump /xxx/dddd/dstage_DataSets/ABC_FailedData.ds
It looks like you've assumed that the dataset descriptor file is in the same location as the dataset segment files. That could be a bad assumption.
Find the location of the descriptor file (the one ending in .ds).
Code:
find / -name "ABC_FailedData.ds" -print 2>/dev/null
Mike
FWIW our team decided to create a routine that can delete datasets. We put it at the end of all our sequence jobs and name our datasets in a consistent manner that allows us to blanket delete them safely at the end of the jobs. This is working well for us. We also put in a bypass to leave the datasets when working in development/debugging mode.
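A minimal sketch of such an end-of-sequence routine as a shell function a sequence could call as its last step. The single descriptor directory, the "<prefix>_*.ds" naming convention and the KEEP_DATASETS bypass variable are hypothetical stand-ins for a team's own conventions. Unlike the routine described later in the thread, which removes files with plain rm, this sketch deletes through orchadmin rm, and it assumes APT_ORCHHOME and APT_CONFIG_FILE are already set.

```shell
#!/usr/bin/env bash
# Sketch of an end-of-sequence cleanup routine. Hypothetical conventions:
#   - descriptors live in one directory and are named "<prefix>_*.ds"
#   - KEEP_DATASETS=1 is the development/debugging bypass
#   - APT_ORCHHOME and APT_CONFIG_FILE are already set

cleanup_datasets() {
  local descriptor_dir="$1" prefix="$2" ds rc=0

  if [ "${KEEP_DATASETS:-0}" = "1" ]; then
    echo "KEEP_DATASETS=1: leaving ${prefix}_*.ds in place"
    return 0
  fi

  for ds in "$descriptor_dir/${prefix}"_*.ds; do
    [ -f "$ds" ] || continue       # no match: the glob stays literal, so skip it
    # orchadmin rm removes the descriptor together with its segment files.
    "$APT_ORCHHOME/bin/orchadmin" rm "$ds" || rc=1
  done
  return "$rc"
}
```

Because the glob is anchored on a consistent prefix, a blanket delete only touches that sequence's Data Sets, which is what makes the approach safe.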
Such a routine can also be dropped into a one- or two-stage sequence and used to clean up strays, or for a hands-on approach to cleanup.
As an admin, I create hidden directories for the data segments. That hides the noise from the developers, since they should only care about the descriptor file. If someone cares enough to look for the segments, then they should be smart enough to track down the path via the APT configuration file and find them.
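For illustration only, a hypothetical single-node APT configuration file in which the resource disk is a hidden directory (leading dot), so segment files stay out of a casual ls while descriptors go wherever the jobs write them. The node name, fastname and paths are placeholders, not values from this thread.

```
{
  node "node1"
  {
    fastname "etlhost"
    pools ""
    resource disk "/data/dstage_DataSets/.segments" {pools ""}
    resource scratchdisk "/data/scratch" {pools ""}
  }
}
```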
Every project I create has a sub-directory created in which its Data Set, File Set and Lookup File Set descriptor files are placed. No problem finding them when the need arises.
I don't go to the extreme of making the resource disk hidden directories, but can understand Paul's motivation. Most of my users are not "UNIX literate" and therefore are not given access to UNIX.
dls wrote:
Are you willing to share your routine?
I will see if I can post the whole thing (I have to find out whether I am allowed to do so).
All it does is go to the path where the file pieces are stored and remove the pieces, along with the visible .ds file, using UNIX rm; so it's just a fancy rm command with the correct folders and file names worked out. I didn't write it, but I think it probably parses the root dataset file for the file names; I know they are contained in it.
After much running of the UNIX find command, I managed to find a couple of the rogue Data Set descriptors and delete them using the Data Set Management tool.
I am still left with a number of very large files, consuming 44% of the disk space assigned to Data Sets, for which I cannot find descriptor files, so they have not yet been deleted. All current jobs and job sequences are designed to manage their own Data Sets, either by deleting or by truncating them.
The historical files exist for various reasons, including inexperienced developers and development standards not being followed. What I want to do is recover as much disk space as possible, ASAP.
Any suggestions for removing these outstanding files?
Thanking you all in advance.
If they do not have a dataset descriptor file, then one of your inexperienced developers probably deleted the descriptor file improperly with an rm command.
You can do the same for the orphaned dataset segment files.
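A cautious way to do that, sketched under the same hypothetical naming assumption ("<name>.ds.<user>.<host>..."): feed in a reviewed list of orphaned segment paths (one per line), re-check each one for a descriptor just before acting, and run in dry-run mode unless the hypothetical DO_DELETE=1 is set. The function name is also hypothetical. The byte total shows how much space the run would recover.

```shell
#!/usr/bin/env bash
# Sketch: remove segment files from a reviewed list (one path per line).
# Dry run unless the hypothetical DO_DELETE=1 is set. Each file is
# re-checked so that one whose descriptor has turned up is kept.

remove_reviewed_orphans() {
  local list="$1" search_root="$2" seg base name size total=0
  while IFS= read -r seg; do
    [ -f "$seg" ] || continue
    base="${seg##*/}"
    case "$base" in
      *.ds.*) name="${base%%.ds.*}.ds" ;;
      *) echo "keep (unrecognised name): $seg" >&2; continue ;;
    esac
    # Last-minute check: if a descriptor with this name exists, do not touch the file.
    if [ -n "$(find "$search_root" -type f -name "$name" -print 2>/dev/null | head -n 1)" ]; then
      echo "keep (descriptor found): $seg" >&2
      continue
    fi
    size=$(wc -c < "$seg" | tr -d ' ')
    total=$((total + size))
    if [ "${DO_DELETE:-0}" = "1" ]; then
      rm -f -- "$seg"
    else
      echo "would remove: $seg"
    fi
  done < "$list"
  echo "total bytes: $total"
}
```

Running it once without DO_DELETE and reading the "would remove" lines is the point of the design; only after that review would you rerun it with DO_DELETE=1.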
It sounds like you've done your due diligence in attempting to find the descriptor files.
Mike