Multi-valued Column in Hash File


Someswar Barkataki
Participant
Posts: 6
Joined: Tue Jul 13, 2004 7:40 am

Multi-valued Column in Hash File

Post by Someswar Barkataki »

Hi,

Can anyone suggest how to create a multi-valued column in a hash file?
The column should display all the values for a particular key, separated by commas.

For example, an input file with three columns c1, c2 and c3:

c1  c2  c3
NE  20  1
EX  30  2
NE  40  3
EX  45  4
EX  50  5

And the hash file should show this result (c1 is my key):

c1  c2          c3
NE  20, 40      1, 3
EX  30, 45, 50  2, 4, 5

Thanks in advance,

Somesh
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Hashed file writes are destructive, so with that in mind I have approached this kind of issue in a variety of ways. One method I have employed is the same type of logic used for SCDs, with a twist.

You would use your hashed file as both a target and a reference. Create stage variables to hold the contents of the columns you are building. Perform a lookup on each incoming row of data and use logic within the stage variables either to append the incoming values to the already existing ones, or to set the stage variables to the incoming values when the lookup fails.

This still performs destructive writes, but each write will be building the record you want.
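
A minimal sketch of what those stage variables might look like, assuming an input link In, a reference link Lkp that looks up the same hashed file on c1, and comma-separated output (all link, column and stage-variable names are illustrative):

Code: Select all

svC2:  If IsNull(Lkp.c2) Then In.c2 Else Lkp.c2 : ", " : In.c2
svC3:  If IsNull(Lkp.c3) Then In.c3 Else Lkp.c3 : ", " : In.c3

Output columns: c1 = In.c1 (the key), c2 = svC2, c3 = svC3

Each row overwrites the hashed file record for its key, but because the stage variables start from whatever is already in the hashed file, the record keeps growing until the last row for that key has been written.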

Another way to solve this without a reference lookup would be to use a DS routine in the transformer that takes the key and whatever fields you are building and returns strings suitable for writing to the hashed file as value mark (CHAR(253)) delimited strings.
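
A minimal sketch of that kind of routine in DS BASIC, assuming the input is sorted on the key so that the last write per key carries the full list (the argument names and the COMMON block name are illustrative):

Code: Select all

      * Accumulator transform routine (illustrative sketch only).
      * Arg1 = key of the current row, Arg2 = value to accumulate.
      * Because hashed file writes are destructive, the last (fullest)
      * write for each key is the one that survives.
      COMMON /MvAccum/ LastKey, ValueList

      If UnAssigned(LastKey) Then LastKey = ""

      If Arg1 = LastKey Then
         ValueList = ValueList : @VM : Arg2   ;* same key: add another value
      End Else
         LastKey = Arg1                       ;* new key: start a fresh list
         ValueList = Arg2
      End

      Ans = ValueList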

Good luck

Regards,
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Welcome aboard! :D

How you create multi-valued columns depends on how you create the hashed file.
If you're using a Hashed File or UV stage to create the hashed file, specify M in the Type column in the Columns grid.
If you're using a CREATE TABLE statement, you specify that the column is MULTIVALUED within its column definition.
If you're using a CREATE.FILE or mkdbfile command, you specify it when building the file dictionary.

Populate the hashed file normalized on the multi-valued column.
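
For the CREATE TABLE route, a hedged sketch in UniVerse SQL using the column names from the original example (the table name and data types are illustrative, and grouping c2 and c3 into a named association would need an additional association clause, omitted here):

Code: Select all

CREATE TABLE MYHASH
(
    C1  VARCHAR  NOT NULL PRIMARY KEY,
    C2  INT      MULTIVALUED,
    C3  INT      MULTIVALUED
);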
Last edited by ray.wurlod on Tue Aug 31, 2004 9:35 pm, edited 1 time in total.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I don't think the OP meant "multi-valued" in the UniVerse sense, did they? It seems like they were just looking for a column where the values are separated by commas, which sounds like simple concatenation to me. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

My original post was slanted more toward their request for comma-separated values, hence the reply I gave.

Regards,
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

If you're using a Hashed File or UV stage to create the hashed file, specify M in the Type column in the Columns grid.
I want to create a Hashed File stage with multivalued columns, but I am not able to see the option for specifying "M" or MULTIVALUED.

In the Hashed File stage there is no Type column; there is only an SQL Type column in the Columns grid for specifying the type of the column.

Please suggest how to create a multi-valued hashed file using the Hashed File stage in DataStage version 7.x.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ

Post by mhester »

Anupam,

I do not believe DS has the ability (natively) to write multi-valued fields to a UV file; that's why I gave you my suggestion. I believe that in order to do this, DS would have to read and rewrite every output record so that a destructive write does not take place.

If the ability is there I have never used it nor am I aware of it.

Regards,
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Michael

It can be done. I just wrote a job that takes a source, writes to a hashed file, and does a lookup on the same hashed file. If the lookup finds the key, the output column is lookup.col1 : @VM : In.col1. You can aggregate to multivalues in a hashed file.

Code: Select all


        JobsByCatHash
             | 
             v
UvStage -> Trans -> JobsByCatHash

UvStage
SELECT DS_JOBS.NAME, DS_JOBS.CATEGORY FROM DS_JOBS WHERE NAME NOT LIKE '\%' ORDER BY CATEGORY, NAME;

Trans
if IsNull(LkpEtlJobCatHash.JOB_NAME) then DsJobLinks.JOB_NAME else LkpEtlJobCatHash.JOB_NAME:@VM:DsJobLinks.JOB_NAME


Try it. It works great. This would be awesome on dates.

Now you need to create new dictionary items to associate the multivalues on the hash file just created.
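
For the dictionary side, a sketch of the kind of D-type record you could add to the dictionary of JobsByCatHash so that the job names resolve as a multivalued field (the record name, column heading and format are illustrative; the M in field 6 is what marks the field as multivalued):

Code: Select all

JOB_NAME
001 D
002 1
003
004 Job Name
005 35L
006 M

You can add the record with ED DICT JobsByCatHash JOB_NAME (or however you prefer to edit dictionaries); LIST and the UV stage will then treat the field as multivalued.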
Mamu Kim
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

That's what I've been doing, too. Works great!

Tony
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

Well, Tony, they all said it did not work. You could have said something sooner.
Mamu Kim
tonystark622
Premium Member
Posts: 483
Joined: Thu Jun 12, 2003 4:47 pm
Location: St. Louis, Missouri USA

Post by tonystark622 »

Sorry Kim,

I figured that you folks are way smarter than me about this and that I must not have understood the whole situation. When you described what you were doing, building the multi-valued field in a hashed file, it just clicked with me.

Tony
anupam
Participant
Posts: 172
Joined: Fri Apr 04, 2003 10:51 pm
Location: India

Post by anupam »

Code: Select all

     JobsByCatHash 
             | 
             v 
UvStage -> Trans -> JobsByCatHash 
It can be done using the method Kim suggested, and it is an acceptable way as well.

Code: Select all

If you're using a Hashed File or UV stage to create the hashed file, specify M in the Type column in the Columns grid. 
I was not able to see the Type column in the Columns grid for marking a column as multivalued in the Hashed File stage, so I was just curious to know how it is possible to specify it there.
----------------
Rgds,
Anupam
----------------
The future is not something we enter. The future is something we create.
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

That is the trick. In the hash file you do not treat this field as multivalued. You do so only in a UV stage. This allows one job to create the multivalues and another job to use them.

Most of the time I would only use this in a lookup, and I would write a routine to do it. Look at Ken Bland's SCD Type 2 routine that locates a value by date: you no longer need the rollup routine, because you can do that rollup in a job, and the conversion of '|' to @VM goes away in the other routine since the @VMs are already there. His routine is what I would model a routine on to use this new hashed file.
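
A minimal sketch of the shell of such a lookup routine in DS BASIC (this is not Ken Bland's actual routine; the file variable, COMMON block and argument names are illustrative): it opens the hashed file once, caches the handle in named COMMON, and returns the whole @VM-delimited field for the key, which further routine logic could then search by date.

Code: Select all

      * Transform routine sketch. Arg1 = hashed file name, Arg2 = key value.
      * Returns the multivalued field for the key, still @VM-delimited.
      COMMON /MvLookup/ FileIsOpen, MvFile

      If UnAssigned(FileIsOpen) Then FileIsOpen = @FALSE

      If NOT(FileIsOpen) Then
         Open Arg1 To MvFile Then
            FileIsOpen = @TRUE
         End Else
            Call DSLogWarn("Cannot open hashed file " : Arg1, "MvLookup")
         End
      End

      Ans = ""
      If FileIsOpen Then
         Read Rec From MvFile, Arg2 Else Rec = ""
         Ans = Rec<1>
      End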
Mamu Kim
ryoung011
Participant
Posts: 10
Joined: Mon May 03, 2004 2:02 pm
Location: Atlanta, GA

Post by ryoung011 »

Thanks for all of the great info. We are just starting the construction stage of an Enterprise DW, and these posts are invaluable.

I am testing this as a solution for Type II Slowly Changing Dimension surrogate key lookups (in conjunction with Mr. Bland's routine) and have one question...

Is there a more efficient way of doing this? My test seems to be quite slow. I have the lookup hash configured with caching "Enabled, Lock For Updates".

My dimension has about 3M distinct natural key values, with about 9M records overall (average of three Type II changes per entity).

I am getting about 180 rows/sec. As you can imagine, this would take several hours to complete for an initial load.

One option I've researched is creating the multivalue strings in the database. Using PL/SQL, I created a table in the dimension table's database that accomplishes the same result and is MUCH faster (3,000 rows/sec); I subsequently dumped it to a hashed file for the lookup (using the routine Mr. Bland supplied).

I'd like to keep it all in DataStage if possible.

Any ideas or configuration suggestions to speed this up?

Thanks!
kduke
Charter Member
Posts: 5227
Joined: Thu May 29, 2003 9:47 am
Location: Dallas, TX

Post by kduke »

I think it is valid to do some of this in PL/SQL if it speeds things up. Doing it in the stage's before-SQL may even make it look like it is all done in DataStage. I think most of us have used temp tables created with SQL to get around some performance issue. Maybe even Ray has used this trick.

I do not like to use Aggregator stages. I usually do this in SQL or in one of those sorting tools like CoSort. Sometimes going around DataStage is worth it for performance, but I try to limit that sort of thing.
Mamu Kim