The Product Formerly Known as Hawk

A forum for discussing DataStage® basics. If you're not sure where your question goes, start here.

Moderators: chulett, rschirm, roy

ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

The Product Formerly Known as Hawk

Post by ray.wurlod »

There is/will be a White Paper called "IBM Information Server: What's New in IBM WebSphere DataStage 8.0". Thus far it does not appear in the IBM Library, though there are some papers there dated as recently as October 15, 2006 describing other aspects of IBM Information Server.

I have a copy of that white paper (in hard copy only), which I shall summarize in bullet points here. Personal comments are my own, not IBM's official position!

The first thing to get used to is that DataStage is just one optional component in IBM Information Server. QualityStage is another. Information Analyzer (a hybrid of ProfileStage, AuditStage and some of the investigation capabilities of QualityStage) is another. Federation Server is yet another. Get used to the new name - what you will need to buy from IBM is now called IBM Information Server at the top level, then you specify individual components.

Underneath the covers of IBM Information Server is a unified data (and metadata) management layer. This is a horribly complex structure, as it combines the repositories from MetaStage, DataStage, QualityStage, ProfileStage and AuditStage. Everything is all in one place, and can therefore be instantly accessed by all products. No more need to export metadata from one tool and/or import into another (within the suite).

This uses your choice of DB2 UDB version 9 (only), Oracle version 10g (only) or SQL Server 2003 (only). You can use your own instance if you have one; otherwise a copy of DB2 that works only with IBM Information Server is installed. I believe this is enforced only at the licensing level and by the threat of visits from IBM licensing police.

Supporting the common metadata repository is a range of common services, including a metadata delivery service, a security service (unified login, integrated with LDAP and Active Directory), a logging and reporting service, and more. It is through these that the products access metadata - no more hacking the repository, I'm afraid.

A new connection mechanism is part of IBM Information Server; these connection objects reside in the repository and gain access to external data through a connection access server. The objects can be incorporated into any component (such as DataStage or QualityStage) and are alleged to be faster than any of the current technologies. There was a separate presentation on this technology at IOD, but the paper has not yet been placed on the web site.

Substantial changes, and yet very few, are to be seen in DataStage. The Manager client is gone, as are the security management facilities in the Administrator client (security is now unified across all IBM Information Server products). All Manager functionality is now available through the Designer.

Parameter sets (the name says it all) now exist. You can load parameter sets, as well as individual parameters, into job designs and job sequence designs. You can reference a member of a parameter set using the naming convention setname.parametername (no big surprise there).
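As an illustration of the setname.parametername convention, here is a minimal Python sketch. It is not DataStage code: the set names, parameter names, values and the resolve() helper are all hypothetical, and it only shows how a dotted reference splits into a set and a member.

```python
# Hypothetical parameter sets; names and values are made up for illustration.
PARAMETER_SETS = {
    "dbconn": {"server": "dev01", "user": "etl"},
    "paths": {"landing": "/data/landing"},
}


def resolve(reference, parameter_sets=PARAMETER_SETS):
    """Resolve a 'setname.parametername' reference to its value."""
    set_name, sep, param_name = reference.partition(".")
    if not sep or not param_name:
        raise ValueError(f"expected setname.parametername, got {reference!r}")
    try:
        return parameter_sets[set_name][param_name]
    except KeyError:
        raise KeyError(f"unknown parameter reference {reference!r}") from None


print(resolve("dbconn.server"))
```

Grouping related values under one set name means a job only needs to load the set, rather than each parameter individually.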

The parallel Lookup stage supports range lookups. Yay!!!
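For anyone unfamiliar with the term, a range lookup matches a source value against the reference row whose lower and upper bounds enclose it, instead of matching on equality. The Python sketch below shows the idea only; the reference data and the range_lookup() helper are hypothetical, and nothing here is claimed about how the Lookup stage does it internally.

```python
from bisect import bisect_right

# Hypothetical reference data: (low, high, band), non-overlapping, sorted by low.
BANDS = [
    (0, 999, "small"),
    (1000, 9999, "medium"),
    (10000, 99999, "large"),
]
LOWS = [low for low, _high, _band in BANDS]


def range_lookup(value):
    """Return the band whose [low, high] range contains value, else None."""
    # Find the last reference row whose lower bound is <= value.
    i = bisect_right(LOWS, value) - 1
    if i < 0:
        return None
    low, high, band = BANDS[i]
    return band if low <= value <= high else None
```

Sorting the reference rows by lower bound lets each lookup use a binary search rather than a scan of every range.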

In parallel jobs there are several new stage types, including a Slowly Changing Dimension stage (supports Type 1 and Type 2), Netezza, iWay, IBM Information Server Federation and Classic Federation stages. There are enhancements to the Complex Flat File stage, support for SQL Server and Teradata in the Stored Procedure stage, and the ability to use the new connection objects in all stage types (including in server jobs). Connectivity support has been added for Informix v10, Oracle 10gR2, SQL Server 2005, Sybase ASE15, sftp, Teradata V2R6.1/TTU8.1 and more.
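The difference between the two slowly changing dimension types can be sketched in a few lines of Python. This follows the usual textbook definitions (Type 1 overwrites, Type 2 keeps history) and is not a description of the stage itself; the column names business_key, start_date, end_date and current are assumptions made for the example.

```python
from datetime import date


def apply_type1(dimension, key, attrs):
    """Type 1: overwrite the attributes in place; no history is kept."""
    for row in dimension:
        if row["business_key"] == key:
            row.update(attrs)
            return
    dimension.append({"business_key": key, **attrs})


def apply_type2(dimension, key, attrs, effective):
    """Type 2: expire the current row and append a new version."""
    for row in dimension:
        if row["business_key"] == key and row["current"]:
            if all(row.get(k) == v for k, v in attrs.items()):
                return  # nothing changed, keep the current version
            row["current"] = False
            row["end_date"] = effective
    dimension.append({
        "business_key": key, **attrs,
        "start_date": effective, "end_date": None, "current": True,
    })


dim = []
apply_type2(dim, "C1", {"city": "Sydney"}, date(2006, 1, 1))
apply_type2(dim, "C1", {"city": "Perth"}, date(2006, 10, 1))
print(len(dim))  # two versions of the same customer
```

With Type 2 each change costs a new row, which is why it is normally paired with a surrogate key rather than the business key.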

Surrogate key management across multiple runs is now handled; this entails use of a file or a DB2 table for persistent storage.
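To show what persisting surrogate keys across runs means in practice, here is a rough Python sketch of the file-based variant. The SurrogateKeySource class, the state file name and its one-number format are all invented for the example and say nothing about the format DataStage actually uses.

```python
import os
import tempfile


class SurrogateKeySource:
    """Hands out increasing keys and remembers the last one in a state file."""

    def __init__(self, state_path):
        self.state_path = state_path
        self.last_key = 0
        if os.path.exists(state_path):
            with open(state_path) as f:
                self.last_key = int(f.read().strip() or 0)

    def next_key(self):
        self.last_key += 1
        return self.last_key

    def save(self):
        # Write to a temporary file, then rename it over the state file,
        # so a crash cannot leave a half-written state file behind.
        tmp_path = self.state_path + ".tmp"
        with open(tmp_path, "w") as f:
            f.write(str(self.last_key))
        os.replace(tmp_path, self.state_path)


def demo():
    """Simulate two job runs sharing one state file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "keys.state")
        first = SurrogateKeySource(path)
        run1 = [first.next_key() for _ in range(3)]
        first.save()
        second = SurrogateKeySource(path)
        run2 = [second.next_key() for _ in range(2)]
        return run1, run2
```

The second run picks up where the first left off, so keys never collide between runs as long as the state is saved at the end of each one.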

A "fast path" mechanism that allows the developer to step through stage property tabs in correct order using arrow buttons - like a wizard - has been incorporated, at least in parallel job stages. I did not get to check for this in server job stages.

QualityStage Designer has disappeared, its functionality now in a combined DataStage and QualityStage Designer. QualityStage remains a separately licensed component. If you license it you get five more parallel stage types in the Designer: specific Investigation, Standardize, Match and Survive stages, as well as a Legacy QS stage resembling the old plug-in and allowing existing QS jobs to be executed.

On the tools front, the SQL Builder now supports more databases (didn't get a list) and can now also build INSERT, UPDATE and DELETE statements. Impact and Lineage analyses can now be invoked from inside the Designer, and are fully hyperlinked back to Designer where appropriate. An Object Difference tool can be used to compare jobs, routines and table definitions, even if they are in different projects. And the Designer and other tools have both Quick Find and Advanced Find capabilities; even descriptions can be included in the search, and results are hyperlinked back to the pertinent object(s).

I mentioned earlier the new "rich" connection objects - formerly known as "frictionless connectors". At first release these will be available (that is, have been through QA) for ODBC, MQ Series (including client-only), Oracle 10g, DB2 UDB (including both DPF and non-DPF environments) and Teradata. The existing connection methodologies will still be supported. There is a new GUI for managing connections, and there is large object (LOB) support.

Performance issues have not been neglected. Though I am not going to quote figures, parallel job startup time has been markedly reduced (yay!) and they have done some work with buffering optimization and process combination. It is now possible to use time-based instead of row-based monitoring, and the job monitor can now adapt to system load, throttling back when CPU usage exceeds a particular threshold (80%). This last is called adaptive monitoring for you buzzword collectors.
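The throttling idea can be sketched in Python as below. Only the 80% threshold comes from the description above; the doubling back-off, the 5 and 60 second limits and the next_interval() helper are assumptions made for the example, not the job monitor's actual algorithm.

```python
CPU_THRESHOLD = 80.0  # percent, the figure quoted above


def next_interval(current, cpu_percent, base=5.0, maximum=60.0):
    """Return the next monitoring interval in seconds.

    Above the threshold the interval doubles (capped at maximum);
    otherwise it drops back to base.
    """
    if cpu_percent > CPU_THRESHOLD:
        return min(current * 2, maximum)
    return base


interval = 5.0
for cpu in (50.0, 85.0, 92.0, 60.0):  # hypothetical CPU readings
    interval = next_interval(interval, cpu)
    print(cpu, interval)
```

The point is simply that the monitor samples less often while the machine is busy, so the monitoring itself does not add to the load.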

For parallel jobs there is a resource estimation tool that works in two modes: static simply estimates resources based on the job design, while dynamic gives a more accurate result by processing a sample of data. A performance analysis tool, which generates reports for the Reporting Console (another part of IBM Information Server), completes the picture.

There is a new licensing scheme as I mentioned in an earlier post. No more auth codes to be typed in. Instead, the installer (a Java-based GUI that can drive Windows, X-Windows and other UNIX GUIs) reads an XML file that IBM sends you in exchange for wads of cash. There is also a command-line-based installer.

Users, groups and roles are now managed through the Administration client for IBM Information Server (not that for DataStage). There is an additional DataStage role called "Super Operator" who can inspect, but not alter, job designs, and run all jobs, not just released jobs. The Omit button disappears, along with LAN Manager authentication. On the up side, the new authentication model integrates with LDAP and Active Directory, so that a user has a single identity (and login) enabling use of all tools to which entitled by roles.

As you might expect, a huge set of manuals for your bedtime reading. A couple that we've been waiting too long for are a Parallel Job Tutorial and the Errors manual, which includes not only the meanings of messages but also, in most cases, suggested investigative and/or remedial action.

That's enough for now, the White Paper shouldn't be too long; it's 29 pages long and I don't propose to type it all in here!

They've done a good job with this product, and that includes the brave decision not to release it till it passed QA. Of course there will still be minor bugs, but they're delivering on most of their promises.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
sanjay
Premium Member
Posts: 203
Joined: Fri Apr 23, 2004 2:22 am

Post by sanjay »

Hi Ray

What is Federation Server? Is it for accessing multiple databases at a time?


Thanks
Sanjay
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Re: The Product Formerly Known as Hawk

Post by Kirtikumar »

ray.wurlod wrote:parallel job startup time has been markedly reduced
Hopefully this will reduce the impact of the total number of rows to be processed on the decision to design a job as PX or server.
Regards,
S. Kirtikumar.
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Kirtikumar - I'm not sure what you mean by that, since the design phases don't get affected by the data volumes.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

sanjay wrote:Hi Ray

What is Federation Server? Is it for accessing multiple databases at a time?


Thanks
Sanjay
Searching on the IBM web site will give you full facts, and save me from having to type them in. It's not part of DataStage - it's another service supported on IBM Information Server.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

With the level of changes involved here, any idea how 'upgradable' a 7.x Server installation will be? Or is the preferred methodology to create a new 8.x installation and then import everything in?
-craig

"You can never have too many knives" -- Logan Nine Fingers
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

In-place upgrade is possible. Have backups.
bmadhav
Charter Member
Posts: 50
Joined: Wed May 12, 2004 1:16 pm

Post by bmadhav »

Excellent post on HAWK, Ray! With so much confusion around all the different product names floating about, and products renamed several times, it is all very clear now after reading your post.
One note I would like to make on the HAWK posting:
the current HAWK beta product we have installed on AIX has a DB2 UDB version 8 install (the xmeta database repository).
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

ArndW wrote:Kirtikumar - I'm not sure what you mean by that, since the design phases don't get affected by the data volumes. ...
Arnd, normally when we decide whether a project should be developed in PX or server, we consider the total number of rows that will be processed in the warehouse. A few of the projects in my org are in server and a few are in PX. For upcoming projects, if the data volume is low, we go for server because of the 4 to 5 seconds of startup time in PX.
So if the startup time in the new release is shorter, then maybe people can go directly for PX without thinking about the startup time.
Regards,
S. Kirtikumar.
rameshrr3
Premium Member
Posts: 609
Joined: Mon May 10, 2004 3:32 am
Location: BRENTWOOD, TN

Post by rameshrr3 »

Does it mean the existing UV metadata engine is going to be done away with?


Thanks
Ramesh
ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

Kirtikumar,

that makes sense and is a logical way to progress. Many jobs (particularly smaller ones) are currently still faster in Server than in PX, and since it is never a good idea to mix large numbers of server and PX jobs on the same DataStage machine, splitting by project and DS Server is a good idea.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

bmadhav wrote:Excellent post on HAWK, Ray! With so much confusion around all the different product names floating about, and products renamed several times, it is all very clear now after reading your post.
One note I would like to make on the HAWK posting:
the current HAWK beta product we have installed on AIX has a DB2 UDB version 8 install (the xmeta database repository).
The beta shipped with DB2 v8. The GA release works with DB2 v9 only. Or Oracle 10g only. Or SQL Server 2003 (and maybe 2005).
Kirtikumar
Participant
Posts: 437
Joined: Fri Oct 15, 2004 6:13 am
Location: Pune, India

Post by Kirtikumar »

Performance analysis in HAWK is just amazing: it shows information such as heap utilisation, memory and CPU usage, and disk utilisation.

It can help a lot in identifying the bottlenecks in a job. For each active process it gives all the details.
Regards,
S. Kirtikumar.