Ideas while handling Fact & Dimension Tables in Star Sch

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

Post Reply
asvictor
Participant
Posts: 31
Joined: Tue Sep 02, 2003 3:06 am
Location: Singapore
Contact:

Ideas while handling Fact & Dimension Tables in Star Sch

Post by asvictor »

Hi,

I will be designing ETL for Fact & Dimension tables. Is there a document that I can refer when I need to Design ETL Jobs to handle Fact & Dim tables?

Cheers
Victor Auxilium
1stpoint
Participant
Posts: 165
Joined: Thu Nov 13, 2003 2:10 pm
Contact:

Post by 1stpoint »

uh,
How about Kimball Data Warehouse Life cycle toolkit for starters..

There are many approaches to this problem, first you need to decide things like: slowly changing dimensions, Type 1 vs. Type 2 vs. Type 3 fact tables, the possibility of using factless facts, the complexity of your model, operational dependencies, etc etc etc.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I have a copy of an Ascential 'Tech Tip' document from the 'Tips and Tricks' compilation on how to handle SCDs in DataStage. Be glad to send that on to you if you like.

Or you can just wait for Ken to post something. :lol:
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

Since you've got version 7, you might also like to investigate the job "templates" (generators - can't remember the exact terminology) supplied with it for handling slowly changing dimensions. They're not perfect, but can save some work.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

Well, I waited long enough for others to kick in, so here I go:

Star schemas, what the the characteristics:

1. Surrogate keys
2. Star schema data model, not 3NF
3. Aggregate tables
4. Slowly changing dimensions
5. Hub and spoke warehouse architecture
6. Heavily batch-oriented

So, anyone designing a warehouse based on Kimball methodology has to model accordingly, and therefore write ETL to support loading that type of model. So, if I may be so bold, please read page 652 of said Data Warehouse Lifecycle Toolkit. It describes an ETL batch process whereby you will extract and stage all of your source data. You will then prepare your dimension table inserts/updates in your sandbox. You will then transform your facts, doing your surrogate foreign key substitution while in the sandbox. You will then incrementally compute the update to aggregates of your fact tables, again within your sandbox. If satisfied with results, meaning exception counts are within tolerances, you will extract from your sandbox your inserts and updates and prepare your load ready files. Finally, you will apply your changes to the target in order of dimensions, facts, then aggregates. Lastly, since you are following a hub and spoke architecture, you will then cascade downstream to your datamarts.

Because you have surrogate keys to assign, you will have to reference the target tables during ETL processing. This is because you have to determine if the row exists in the target table and then either use or assign a surrogate key. Because you most likely will be doing changed data detection to determine if your transformed row is different than the row already assigned a surrogate key in the warehouse, you will have to fully inspect using some mechanism whether the row is different. If different, you have to decide if you're doing a type 1, 2, or 3 slowly changing dimension. Read Kimball's Data Warehouse Toolkit page 100 for descriptions of the types. You may even do a combination of type 1 and 2, what some people call a 1 1/2. This is where if something materially unimportant is different then you simply update the most current variant, and if something vital is different then you insert a new variant.

Anyway, I'm not going to stay up late. This is why I am a consultant, I come into companies to teach architecture, modeling, and process design. You can't learn by reading, you have to do it. The best way is to work under someone and see them do it. Read books and columns on warehousing. IMO, any team that doesn't bring in an experienced and seasoned architect is NUTS. It's like living in a house built by a team who NEVER DID IT BEFORE. It isn't about moving data, or just sticking bytes in tables.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vsi
Premium Member
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

Hi chulett,

Hope you can provide with the Ascential 'Tech Tip' document. Would be very helpful.

Thanks
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

ray.wurlod wrote:Since you've got version 7, you might also like to investigate the job "templates" (generators - can't remember the exact terminology) supplied with it for handling slowly changing dimensions. They're not perfect, but can save some work.
Ray- Can you tell more about where I can find these templates(or generators)?Is it in manual?
Craig- Can you send me the Tips and Tricks document .
Thanks in advance
Kris

Where's the "Any" key?-Homer Simpson
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Yikes, we just set the Wayback Machine to 2003 it seems. :shock:

Those 'Version 7 job templates' never materialized for Server jobs. They made a brief appearance and then were yanked from what I recall. Something similar branded as 'Wizards' is in the product, I do believe, but only for PX jobs.

I'll look and see about the T&T document, but I'd really rather upload it someplace than start mailing it out to the Universe. You all have access to ADN?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kris007
Charter Member
Charter Member
Posts: 1102
Joined: Tue Jan 24, 2006 5:38 pm
Location: Riverside, RI

Post by kris007 »

chulett wrote:Yikes, we just set the Wayback Machine to 2003 it seems. :shock:

Those 'Version 7 job templates' never materialized for Server jobs. They made a brief appearance and then were yanked from what I recall. Something similar branded as 'Wizards' is in the product, I do believe, but only for PX jobs.

I'll look and see about the T&T document, but I'd really rather upload it someplace than start mailing it out to the Universe. You all have access to ADN?
Wizards I could not find them even for PX jobs can you tell more where I can find them. Uploading it to ADN should be fine for me but not really sure how many ppl here have access to it.
Thanks
Kris

Where's the "Any" key?-Homer Simpson
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I guess 'Wizards' may be the wrong term, although they are fairly Wizard-like from what I recall. Look under your Client installation directory for a sub-directory called 'Assistants'... that be them, as far as I know. Don't have access to PX right now to verify. I'm not even sure they're documented. :?
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And y'all will need to either provide an email address here or in your profile before I can send anyone anything. FYI. :roll: :lol:

Edited to add: please don't. As noted later, it is uploaded at ADN. This was a reaction to my surprise at being asked to send something to people without being given any way to actually do so.
Last edited by chulett on Sun Jun 25, 2006 7:57 am, edited 1 time in total.
-craig

"You can never have too many knives" -- Logan Nine Fingers
vsi
Premium Member
Premium Member
Posts: 507
Joined: Wed Mar 15, 2006 1:44 pm

Post by vsi »

toengineer@rediffmail.com. Do inform after sending an email.
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Ok... as mentioned earlier, I'm not really excited about starting a mailing service. At some point this site will have a File Library, but in the meantime since ADN does, I've uploaded a zip file there.

It has two official Ascential Tech Tips in it from 2000 for Server jobs - one on how to handle all three types of SCDs and another bonus one on generating test data with a job that sources from a Transformer. The basic concept in the latter can come in very handy in several other circumstances as well.

Here's a direct link to the Category there. The file is called 'TwoTechTips.zip' and is in the Documentation or Instructions category. If you don't have an account there, you really should get one.
-craig

"You can never have too many knives" -- Logan Nine Fingers
chulett
Charter Member
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

And for those of you without access to ADN, our illustrious Kim Duke has graciously hosted the file at his website as well:

http://www.duke-consulting.com/Download ... chTips.zip

Make sure you checkout his site while you're there, don't just download and run.

Thanks Kim! :D
-craig

"You can never have too many knives" -- Logan Nine Fingers
Post Reply