Why use orchadmin to delete datasets?

chula
Participant
Posts: 2
Joined: Tue Apr 16, 2013 11:12 am

Post by chula »

There are numerous discussions about using orchadmin to delete datasets, but I am curious why it's so important to use this tool rather than a clever find -mtime -exec rm {} command running through the *.ds files and all of the dataset folders under each node.

What is orchadmin doing for me that I cannot do myself from the file system? Is it updating the XMETA repository in some way? Is DataStage maintaining information about these datasets in some sort of repository somewhere other than the descriptor file itself?

I'm looking for consequences of taking on the deletion myself and not dealing with the slow and occasionally unreliable orchadmin tool, other than the possibility that I orphan a descriptor file from one or more of its supporting data files.
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

The normal rm command will only remove the physical .ds files; it will not remove the descriptor files placed in other locations, which hold the metadata. Ideally, if you delete both sets of files manually, then you don't need the orchadmin utility at all. It's all about convenience: if you use orchadmin, it deletes the files automatically on both sides.
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Adding more to that.....

A dataset is made up of multiple files. They are:
a) Descriptor File
b) Data File
c) Control file
d) Header Files
chula
Participant
Posts: 2
Joined: Tue Apr 16, 2013 11:12 am

Post by chula »

I'm aware of the descriptor file (.ds) and the data files under the node# folders. Where would you find the control and header files, and how are they named?

Thanks
Nagaraj
Premium Member
Posts: 383
Joined: Thu Nov 08, 2007 12:32 am
Location: Bangalore

Post by Nagaraj »

Okay, if you write to a file path like /a/b/datsetname.ds

the descriptor file usually exists in

/opt/IBM/InformationServer/Server/Datasets/datsetname.ds.userid.hostname..0000.0002.0000.1334.d1ec1d4e.0002.ef79b53e

You will have to delete the file in the Datasets directory too to get rid of the dataset completely on the server.

Cheers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Open the Data Set Management tool in DataStage Designer.

Open any of your valid Data Sets.

The tool will show you the contents of the descriptor file (the *.ds file). It also shows the location and names of the segment files on each node (as you select each node) that contain the data.

Nagaraj has it wrong. A Data Set consists only of the descriptor file and its associated segment (data) files. There are no "control" files (this term is sometimes used to refer to the descriptor file). There are no separate header files. Each segment (data) file has its own internal header.

Using rm to delete the *.ds files leaves all the Data Set segment files orphaned, and also destroys the "map" that contains their locations.

And THAT is why you should use orchadmin rm to delete Data Sets. You can also delete them from within the Data Set Management tool, but that uses orchadmin rm under the covers.
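
As a rough illustration of that advice, an age-based cleanup can still hand each descriptor to orchadmin rm rather than calling rm on it. The Python sketch below is only that, a sketch: the function names, the *.ds glob and the idea of selecting by descriptor age are assumptions for illustration, not anything orchadmin prescribes. It also assumes orchadmin is on the PATH and that APT_CONFIG_FILE is set in the environment, and by default it only builds the command lines without running them.

```python
import subprocess
import time
from pathlib import Path


def old_descriptors(root, max_age_days, now=None):
    """Return *.ds descriptor files under root not modified in max_age_days."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    return sorted(p for p in Path(root).rglob("*.ds")
                  if p.is_file() and p.stat().st_mtime < cutoff)


def orchadmin_rm_commands(descriptors):
    """Build one 'orchadmin rm <descriptor>' command per descriptor file."""
    return [["orchadmin", "rm", str(p)] for p in descriptors]


def delete_datasets(root, max_age_days, dry_run=True):
    """Delete old Data Sets through orchadmin; with dry_run, only return the commands."""
    commands = orchadmin_rm_commands(old_descriptors(root, max_age_days))
    if not dry_run:
        for cmd in commands:
            # Per the explanation above, orchadmin uses the descriptor to locate
            # the segment files, so nothing is left orphaned on the nodes.
            subprocess.run(cmd, check=True)
    return commands
```

Calling delete_datasets("/a/b", 30) would return the commands for review; passing dry_run=False actually runs them, one orchadmin call per descriptor.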
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
PaulVL
Premium Member
Posts: 1315
Joined: Fri Dec 17, 2010 4:36 pm

Post by PaulVL »

An orchadmin rm command also flushes your cache unless you have a certain environment variable set (can't remember which one off the top of my head).

Some folks would want that user id to flush its cache upon the request to do the orchadmin rm. Some would not, because if you script a bunch of "orchadmin rm file*" commands, you could have a cache-flushing nightmare on your hands and actually soak up your I/O bandwidth writing out to disk.

But yes, you can craft your own method to delete your descriptor file and the data segment files. Just ensure that you delete the correct corresponding segment files. A mass delete of everything older than day X is not the same as crafting your own "orchadmin rm" command.
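
To make that last point concrete, here is a small self-contained Python simulation. It never touches a real Data Set: the file names, directory layout and file ages are all invented, and the "descriptor" is just an empty placeholder file. It only demonstrates that a sweep keyed on modification time judges each file separately, so files belonging to one Data Set can land on different sides of the cutoff.

```python
import os
import tempfile
import time
from pathlib import Path

DAY = 86400


def sweep_older_than(root, max_age_days, now):
    """Mimic an age-based find/rm sweep: delete every file older than the cutoff."""
    cutoff = now - max_age_days * DAY
    removed = []
    for path in sorted(Path(root).rglob("*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            removed.append(path.relative_to(root).as_posix())
    return removed


def simulate():
    """Build a fake Data Set whose files have different ages, then sweep it."""
    now = time.time()
    with tempfile.TemporaryDirectory() as root:
        # Hypothetical names and ages: one placeholder descriptor, two segments.
        ages_in_days = {"sales.ds": 3, "node1/segment.0": 10, "node2/segment.0": 3}
        for name, age in ages_in_days.items():
            path = Path(root) / name
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(b"")
            os.utime(path, (now - age * DAY, now - age * DAY))
        removed = sweep_older_than(root, 7, now)
        survivors = sorted(p.relative_to(root).as_posix()
                           for p in Path(root).rglob("*") if p.is_file())
    return removed, survivors


if __name__ == "__main__":
    removed, survivors = simulate()
    print("removed:", removed)
    print("left behind:", survivors)
```

In this made-up layout the seven-day sweep removes only the older segment file, while the placeholder descriptor and the newer segment survive. The sweep has no notion of which files belong together, and that is the gap a hand-built "orchadmin rm" replacement would have to close.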
jwiles
Premium Member
Posts: 1274
Joined: Sun Nov 14, 2004 8:50 pm

Post by jwiles »

To determine the appropriate dataset segments to delete, you would need to pull that info from the descriptor file, which is a binary file, not text, with a non-published format that is subject to change.

While you may not trust the orchadmin command 100%, I expect that rolling your own would be much more problematic.

Regards,
- james wiles


All generalizations are false, including this one - Mark Twain.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Nagaraj wrote: Dataset is multiple files. They are
a) Descriptor File
b) Data File
c) Control file
d) Header Files
I've seen this posted in another DataStage forum and it confused me there and here... four files? I was fairly certain it was as Ray described but it's good to see it spelled out explicitly. Tanks. :wink:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

ray.wurlod wrote: Nagaraj has it wrong. A Data Set consists only of the descriptor file and its associated segment (data) files. There are no "control" files (this term is sometimes used to refer to the descriptor file). There are no separate header files. Each segment (data) file has its own internal header.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Hoping you didn't quote that for me... was trying to say that I appreciated the fact that you had already done that in your original post.