"canonical-like" objects for ETL ?

NigelThompson
Participant
Posts: 1
Joined: Mon Apr 21, 2008 7:36 am


Post by NigelThompson »

Background:
We would like to introduce "canonical-like" objects into our ETL space, for the same reasons that we implement them in our EAI space.
Note: within our EAI space we have selected the OAGi canonical standard.

Currently, in our ETL, as in our EAI message flows, we implement each interface as two half-interfaces.
For example: sourceA-to-commonObject and commonObject-to-targetB.

Today, these common objects are very simple comma-delimited files, and we do not consider them canonicals because their metadata is not "rich enough".
The two key areas we would like to improve are: a) reusability at a common sub-component level (for example, address) and b) extensibility of those same sub-components.

Our current thinking is to use the OAGIS canonicals as a "logical" standard for our ETL common objects, but to implement them as a non-XML flat file, using some sort of compound/numbered delimiter to allow navigation of a minimally nested structure.
In practice we may need a certain amount of XML just to build a file header that defines the format of the body. The body itself, however, would be XML-free, with only the compound/numbered delimiter.
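
To make the idea concrete, here is a minimal sketch in Python of what such a file might look like and how it could be read. Everything in it is an assumption made for illustration: the header element names (format, field, group), the "|1|" / "|2|" style of numbered delimiter, and the sample customer record are hypothetical and do not come from OAGIS or from DataStage.

```python
import xml.etree.ElementTree as ET

# Hypothetical XML header describing the layout of each body line.
# A <group> is a reusable sub-component (for example, address).
HEADER = """<format name="Customer">
  <field name="id"/>
  <field name="name"/>
  <group name="address">
    <field name="street"/>
    <field name="city"/>
  </group>
</format>"""


def delimiter(level):
    """Assumed numbered delimiter: "|1|" separates top-level items,
    "|2|" separates items one nesting level down, and so on."""
    return f"|{level}|"


def parse_body(layout, text, level=1):
    """Split one body line according to the header layout, recursing into groups."""
    parts = text.split(delimiter(level))
    children = list(layout)
    if len(parts) != len(children):
        raise ValueError(
            f"expected {len(children)} items at level {level}, got {len(parts)}"
        )
    record = {}
    for child, part in zip(children, parts):
        name = child.get("name")
        if child.tag == "group":
            # Nested sub-component: parse it with the next delimiter level.
            record[name] = parse_body(child, part, level + 1)
        else:
            record[name] = part
    return record


layout = ET.fromstring(HEADER)
record = parse_body(layout, "42|1|Smith|1|1 Main St|2|Springfield")
print(record)
```

In this sketch, extending the address sub-component means adding a field to the group in the header and one more "|2|"-separated value in the body; the top-level layout is untouched.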


Question:

Does anyone else have experience doing something similar in the ETL space, especially with DataStage?


Thanking you in advance.

Nigel Thompson
ray.wurlod
Participant
Posts: 54595
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

DataStage itself does that within the Repository, though it uses its own idiosyncratic standards (the "UniVerse" database standards) for storage of "collections" and, where necessary, collections of collections and so on.

Moving these objects should be no big deal, so "E" and "L" are taken care of. The "T" step will probably require some parsing, since DataStage - at least in present versions - is very much geared to row-and-column-based processing.
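
As a rough illustration of the parsing the "T" step would need, the sketch below flattens a nested record into a single row with one column per leaf value. It is plain Python, not a DataStage stage or API, and the dotted column names and the sample record are assumptions chosen only for the example.

```python
def flatten(record, prefix=""):
    """Turn a nested dict into one flat row, naming columns by their dotted path."""
    row = {}
    for key, value in record.items():
        column = f"{prefix}{key}"
        if isinstance(value, dict):
            # A nested sub-component contributes one column per leaf field.
            row.update(flatten(value, prefix=f"{column}."))
        else:
            row[column] = value
    return row


# Hypothetical nested object, such as a customer with an address sub-component.
nested = {
    "id": "42",
    "name": "Smith",
    "address": {"street": "1 Main St", "city": "Springfield"},
}
flat = flatten(nested)
print(flat)
```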
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.