Why use orchadmin to delete datasets?

Posted: Tue Apr 16, 2013 11:15 am
by chula
There are numerous discussions about using orchadmin to delete datasets, but I am curious why it's so important to use this tool rather than a clever find -mtime -exec rm {} command running through the *.ds files and all of the dataset folders under each node.

What is orchadmin doing for me that I cannot do myself from the file system? Is it updating the XMETA repository in some way? Is Datastage maintaining information about these datasets in some sort of repository somewhere other than the descriptor file itself?

I'm looking for the consequences of taking on the deletion myself, rather than dealing with the slow and occasionally unreliable orchadmin tool, other than the possibility that I orphan a descriptor file from one or more of its supporting data files.

Posted: Tue Apr 16, 2013 12:49 pm
by Nagaraj
The normal rm command will only remove the .ds physical files; it will not remove the descriptor files placed in other locations, which hold the metadata. Ideally, if you delete both sets of files manually, then you don't need the orchadmin utility at all. It's all about convenience: if you use orchadmin, it deletes the files automatically on both sides.

Posted: Tue Apr 16, 2013 12:57 pm
by Nagaraj
Adding more to that.....

Dataset is multiple files. They are
a) Descriptor File
b) Data File
c) Control file
d) Header Files

Posted: Tue Apr 16, 2013 2:02 pm
by chula
I'm aware of the descriptor file (.ds) and the data files under the node# folders. Where would you find the control and header files, and how are they named?

Thanks

Posted: Tue Apr 16, 2013 2:45 pm
by Nagaraj
Okay, if you write to a file path like /a/b/datsetname.ds

the descriptor files usually exist in

/opt/IBM/InformationServer/Server/Datasets/datsetname.ds.userid.hostname..0000.0002.0000.1334.d1ec1d4e.0002.ef79b53e

You will have to delete the file in the Datasets directory too, to get rid of the dataset completely on the server.

Cheers

Posted: Tue Apr 16, 2013 4:20 pm
by ray.wurlod
Open the Data Set Management tool in DataStage Designer.

Open any of your valid Data Sets.

The tool will show you the contents of the descriptor file (the *.ds file). It also shows the location and names of the segment files on each node (as you select each node) that contain the data.

Nagaraj has it wrong. A Data Set consists only of the descriptor file and its associated segment (data) files. There are no "control" files (this term is sometimes used to refer to the descriptor file). There are no separate header files. Each segment (data) file has its own internal header.

Using rm to delete the *.ds files leaves all the Data Set segment files orphaned, and also destroys the "map" that contains their locations.

And THAT is why you should use orchadmin rm to delete Data Sets. You can also delete them from within the Data Set Management tool, but that uses orchadmin rm under the covers.
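
For the age-based cleanup asked about at the top of the thread, a minimal sketch that keeps find for selecting old descriptors but hands each one to orchadmin rm instead of plain rm. It assumes orchadmin is on the PATH, APT_CONFIG_FILE points at a valid configuration file, and descriptor files end in .ds. The function name purge_old_datasets and the DRY_RUN switch are made up for illustration. It defaults to a dry run that only prints what it would do.

```shell
#!/bin/sh
# Sketch only: delete Data Sets older than N days via orchadmin rm.
# Assumptions: orchadmin is on PATH, APT_CONFIG_FILE is set,
# and descriptor files are named *.ds.

purge_old_datasets() {
    dir=$1     # directory tree holding the descriptor files
    days=$2    # select descriptors last modified more than this many days ago
    find "$dir" -type f -name '*.ds' -mtime +"$days" -print |
    while IFS= read -r ds; do
        if [ "${DRY_RUN:-1}" = 1 ]; then
            # Default: report only, touch nothing.
            echo "would run: orchadmin rm $ds"
        else
            # orchadmin removes the descriptor and its segment files together.
            orchadmin rm "$ds" || echo "orchadmin rm failed for $ds" >&2
        fi
    done
}
```

Review the dry-run output first, e.g. `purge_old_datasets /a/b 30`, then rerun with `DRY_RUN=0` in the environment to actually delete.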

Posted: Tue Apr 16, 2013 4:47 pm
by PaulVL
An orchadmin rm command also does a flush of your cache unless you have a certain environment variable set (can't remember which off the top of my head).

Some folks would want that user id to flush its cache upon the request to do the orchadmin rm. Some would not want that, because if you script a bunch of "orchadmin rm file*" commands, you could have a cache flushing nightmare on your hands and actually soak up your IO bandwidth writing out to disk.


But yes, you can craft your own method to delete your descriptor file and the data segment files. Just ensure that you delete the correct corresponding segment files. A mass delete of everything older than day X is not the same as crafting your own "orchadmin rm" command.
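
If you do go the manual route, a report-only sketch for spotting leftover segment files afterwards. It rests on one assumption that you should verify on your own system: that each segment file name starts with the descriptor's base name, as in the datsetname.ds.userid.hostname... example earlier in the thread. The function name list_orphan_segments and the two-directory layout are hypothetical. It looks in a single descriptor directory (no recursion), so two Data Sets with the same base name in different directories will confuse it.

```shell
#!/bin/sh
# Report-only sketch: list segment files that have no matching descriptor.
# Assumption: segment file names begin with "<descriptor name>." (e.g.
# datsetname.ds.userid.hostname....). Nothing is deleted here.

list_orphan_segments() {
    ds_dir=$1     # directory holding the *.ds descriptor files
    seg_dir=$2    # one resource disk directory holding segment files
    for seg in "$seg_dir"/*.ds.*; do
        [ -e "$seg" ] || continue        # glob matched nothing
        name=${seg##*/}                  # strip the directory part
        base=${name%%.ds.*}.ds           # keep everything up to the first ".ds"
        # Print the segment if no descriptor of that name exists.
        [ -e "$ds_dir/$base" ] || echo "$seg"
    done
}
```

Run it once per resource disk directory, e.g. `list_orphan_segments /a/b /path/to/node1/datasets`, and check the list by hand before removing anything.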

Posted: Tue Apr 16, 2013 5:05 pm
by jwiles
To determine the appropriate dataset segments to delete, you would need to pull that info from the descriptor file, which is a binary file, not text, with a non-published format that is subject to change.

While you may not trust the orchadmin command 100%, I expect that rolling your own would be much more problematic.

Regards,

Posted: Tue Apr 16, 2013 5:56 pm
by chulett
Nagaraj wrote:Dataset is multiple files. They are
a) Descriptor File
b) Data File
c) Control file
d) Header Files
I've seen this posted in another DataStage forum and it confused me there and here... four files? I was fairly certain it was as Ray described but it's good to see it spelled out explicitly. Thanks. :wink:

Posted: Tue Apr 16, 2013 6:26 pm
by ray.wurlod
ray.wurlod wrote:Nagaraj has it wrong. A Data Set consists only of the descriptor file and its associated segment (data) files. There are no "control" files (this term is sometimes used to refer to the descriptor file). There are no separate header files. Each segment (data) file has its own internal header.

Posted: Tue Apr 16, 2013 7:01 pm
by chulett
Hoping you didn't quote that for me... was trying to say that I appreciated the fact that you had already done that in your original post.