Page 1 of 2

Delete specific records from hashfile / hashed-file

Posted: Tue Apr 01, 2008 3:22 am
by kausmone
Hi there,

I have a flat-file A and a hashed file B. Records in A are a subset of records in B. I need to delete those records in the hashfile B that are present in flat-file A. Can anyone help me on how to go about it in a server job/routine?

Thanks,
Kaus

Posted: Tue Apr 01, 2008 3:45 am
by ray.wurlod
Use a UV stage with user-defined SQL containing a DELETE statement such as

Code: Select all

DELETE FROM tablename WHERE key = ?
Provide the key value from your text file.

If the hashed file has been created in a directory you will need a VOC pointer to it. Search the forum for the SETFILE command.

Posted: Tue Apr 01, 2008 6:06 am
by kausmone
Did you mean the UniVerse stage? I tried using the stage but am not sure what is to be filled in the Data Source Name etc. I tried filling in the name of the hashed-file (images_metadata_hashed) but it didn't work. I did a view data by putting in a SELECT SQL and got the following error:

Code: Select all

JB_SVTT_Delete_HashedFile_UV..UniVerse_1: DSD.BCIConnect connecting to images_metadata_hashed, call to SQLConnect failed.
SQLSTATE=IM002, DBMS.CODE=0
[DataStage][SQL Client][ODBC][DataDirect][ODBC lib] Data source name not found and no default driver specified

Posted: Tue Apr 01, 2008 6:09 am
by ray.wurlod
The data source name needs to be localuv which is pre-defined in your uvodbc.config file. User name and password will be disabled, because localuv is a pre-established connection to the DataStage project itself.

Posted: Tue Apr 01, 2008 6:24 am
by kausmone
Thanks Ray. I tried with localuv after confirming that it is defined in the uvodbc.config file. I gave the hashed-file name in the Table Name field of the Outputs-> General tab. Now I'm getting error "table does not exist"

Code: Select all

JB_SVTT_Delete_HashedFile_UV..UniVerse_1.DSLink2: DSD.BCIOpenR call to SQLExecDirect failed.
Statement was:SELECT "ship_code", "doc_key", "piece_id", "event_key", "data_source_type_id", "coda_timestamp", "etl_timestamp", "pcdoc_sent_timestamp" FROM "images_metadata_hashed" 
SQLSTATE=S0002, DBMS.CODE=950390
[DataStage][SQL Client][UNIVERSE]DataStage/SQL: Table "images_metadata_hashed" does not exist.

Posted: Tue Apr 01, 2008 6:42 am
by chulett
Your hashed file must be pathed, i.e. created in a directory. In that case you'd need to create a VOC record pointer to it before you can use a UV stage over the hashed file.

Posted: Tue Apr 01, 2008 6:44 am
by kausmone
Ah, ok, gotcha. I tried creating the original hashed file by using account name and when I accessed this using UV stage (with a SELECT query), it worked without any problems.

Thanks for your help Ray and Craig!

-kaus

Posted: Tue Apr 01, 2008 6:47 am
by chulett
Another thought - why not logically delete the records? Rather than futz with all this, add a column and update it for the 'deleted' records. Check for the flag whenever you access the hashed file, treat it like a hashed miss when you find that it is set.

That will keep your account based hashed file out of your 'account' aka Project and put it back in the original directory without any extra shenanigans.

Posted: Tue Apr 01, 2008 7:13 am
by kausmone
I had to rule that out because the size of my hashed-file would keep on increasing with logical deletes. My process might encounter about a million records a day.

Now, when I try deleting, I am giving the query as below (user-defined) but it is not working. What is the correct way of specifying the input column? I am getting event_key and doc_key from my flat-file

Code: Select all

DELETE FROM "images_metadata_hashed" WHERE "event_key" = :event_key AND "doc_key" = :doc_key;

Posted: Tue Apr 01, 2008 7:24 am
by chulett
First generate sql to get the syntax right, then tweak from there. For example 'Replace existing rows completely' includes a delete. I'll let others chime in on the space issue but I don't believe deletes like this will actually get you any space back. :?

Not sure because I've never had a need to do this, I'll rebuild a hashed file before I'd ever try deleting anything from it.

Posted: Tue Apr 01, 2008 8:06 am
by kausmone
Thanks for the hint :)

For the record, the query needed to be:

Code: Select all

DELETE FROM "images_metadata_hashed" WHERE ("doc_key" = ? AND "event_key" = ?);
Can someone throw more light on whether or not I will be able to reclaim space this way?

Posted: Tue Apr 01, 2008 8:47 am
by DSguru2B
kausmone wrote: Can someone throw more light on whether or not I will be able to reclaim space this way?
Why dont you check it out yourself by running 'ANALYZE.FILE images_metadata_hashed' command on your hashed file before and after your delete and check out the size.
This command can be run from the administrator.

Posted: Tue Apr 01, 2008 8:50 am
by kausmone
Cool, thanks. Will do

Posted: Tue Apr 01, 2008 4:56 pm
by ray.wurlod
You do not reclaim space through DELETE operations. Free space is preserved in hashed files against the likelihood that it will be needed again in the future.

The only command that can reclaim space (apart from deleting and re-creating the hashed file) is RESIZE.

Posted: Wed Apr 02, 2008 1:34 am
by kausmone
If the freed space is going to be reclaimed by the hashed-file, it still serves the purpose, since my hashed-file is going to get appended to every hour or so. I think I should stick to this approach of deleting records 'physically' instead of logical deletes, where there will be no reclaimable space?