
Choosing the right ETL tool to build a new DW environment

Posted: Fri Jun 06, 2014 1:18 am
by cherry
Dear All,

Good Day,

We are building a data warehouse from scratch; we don't have one at the moment.

We have approximately 100 million daily transaction records to transform and load into our new DW environment.
We need to apply a few transformations, such as trimming each field and aggregate functions, before we load the data into the data warehouse.
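To make the two transformations concrete, here is a minimal sketch in plain Python of trimming every string field and then summing an amount per key. It only illustrates the logic, not how DataStage or any other ETL tool implements it; the field names `account` and `amount` and the helper names are hypothetical.

```python
from collections import defaultdict


def trim_fields(record):
    """Strip leading/trailing whitespace from every string field."""
    return {k: v.strip() if isinstance(v, str) else v for k, v in record.items()}


def aggregate_amounts(records, key_field, amount_field):
    """Trim each record, then sum amount_field grouped by key_field."""
    totals = defaultdict(float)
    for record in records:
        clean = trim_fields(record)
        totals[clean[key_field]] += clean[amount_field]
    return dict(totals)


# Hypothetical sample rows; real rows would come from the Oracle source.
rows = [
    {"account": " A1 ", "amount": 10.0},
    {"account": "A1", "amount": 5.0},
    {"account": "B2  ", "amount": 7.5},
]
totals = aggregate_amounts(rows, "account", "amount")
print(totals)
```

Trimming before grouping matters here: without it, " A1 " and "A1" would be aggregated as two different keys.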


I would like to know how feasible it is for the DataStage ETL tool to process approximately 100 million records daily using costlier transformations like trimming and aggregating.

It would be a great help if I could get some documents to read on hardware requirements and performance stats for the data volume we will be dealing with.

Our sources and targets are Oracle databases.

Any help in picking the right ETL tool is appreciated. I hope I am looking at the right tool and am in the right forum.

Thanks
Cherry

Posted: Fri Jun 06, 2014 5:52 am
by ray.wurlod
We here are naturally biased towards DataStage, since this is a forum for that and related tools. And, yes, 100 million rows per day is certainly feasible using this tool.
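As a rough sanity check on that volume, the sustained throughput needed depends on how long the daily load window is. The sketch below is simple arithmetic only; the 24, 8 and 4 hour windows are assumed values for illustration, not figures from this thread or from IBM.

```python
def required_rows_per_second(rows_per_day, window_hours):
    """Average rows/second needed to finish rows_per_day within window_hours."""
    return rows_per_day / (window_hours * 3600)


# Assumed load windows in hours; substitute your own batch window.
for hours in (24, 8, 4):
    rate = required_rows_per_second(100_000_000, hours)
    print(f"{hours:>2} h window: about {rate:,.0f} rows/second")
```

Even with a 4 hour window the average rate stays under 7,000 rows per second, which is why the batch window, not the daily total alone, is the number to bring to any sizing discussion.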

Information on planning and configuration can be found in the IBM Information Server Planning, Installation and Configuration Guide (a manual downloadable from the IBM website); the same information is available online at the IBM Information Center.

There is some knowledge about competing products among this community, and some members may choose to post about them. Otherwise you can make use of the internet; there are also websites that purport to compare these products.

Posted: Fri Jun 06, 2014 6:53 am
by chulett
Interesting... you must be a new incarnation of Cherry as you've posted here 107 other times since 2005 asking questions about a tool you don't have. :wink:

As Ray notes, it certainly is one of the tools out there that could handle your volume. If you don't mind paying someone to have done the research / comparison work for you, a quick search turned up this site:

http://www.etltool.com/

Posted: Fri Jun 06, 2014 7:15 am
by qt_ky
Feasible indeed. DataStage is highly scalable and works very well with Oracle. Here are a few overviews you may find interesting:

IBM Knowledge Center - Parallel processing in InfoSphere Information Server

IBM Knowledge Center - Overview of InfoSphere Data Click

You may also contact an IBM sales rep to request an IBM InfoSphere Information Server sizing estimate. It's a pre-sales estimate document, based on your information, that gives a good idea of the hardware required to meet your needs. IBM uses your details along with their various lab benchmarks to produce this document for you.

Posted: Fri Jun 06, 2014 1:23 pm
by chulett
But isn't ODI more of an ELT tool, built around PL/SQL, or am I thinking of something else?

Posted: Fri Jun 06, 2014 2:55 pm
by ray.wurlod
Yes, you do have to create a lot of PL/SQL scripts to use ODI.