Making ETL refer a rule based engine

Post questions here relative to DataStage Server Edition for such areas as Server job design, DS Basic, Routines, Job Sequences, etc.

Moderators: chulett, rschirm, roy

srinivasb
Participant
Posts: 42
Joined: Mon Mar 10, 2003 10:52 pm
Location: UK
Contact:

Making ETL refer a rule based engine

Post by srinivasb »

Hi All:

Just wanted to share a real complex issue which has been dogging me for some time now:

Has anyone got DataStage jobs to refer business rules for data in a Decision engine:

Let me give you the situation:

A leading finance giant wants a huge data transformation process done with DataStage, all rules have been developed in complex Java code and they wanted us to have DataStage call rules from these calculation engines.

While we tried duplicating code in PL/SQL and tried calling procedures , OUT parameters were the first block and second was the code complexity by itself.
Is there anyone with prior experience in handling this sort of thing successfully?

Regards
Srini
Srinivasan Balasubramanian
Data Architect
London,
Phone :07840668861
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

I have created rule-based ETL using DataStage, however it's an entirely different model from yours in that mine was table-based rather than object-based.
That said, I'm sure it could be done, but not via stored procedures. Instead, you would need a C-callable wrapper to the objects which you would call using the GCI. And this is not territory for the faint hearted!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
roy
Participant
Posts: 2598
Joined: Wed Jul 30, 2003 2:05 am
Location: Israel

Post by roy »

Hi,
since you mentioned java, maybe the java pack could be of some help ?

Good Luck,
Roy R.
Time is money but when you don't have money time is all you can afford.

Search before posting:)

Join the DataStagers team effort at:
http://www.worldcommunitygrid.org
Image
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

The introduction of a rules based engine for transformation removes the T from the ETL formula. In my opinion, the T is the strength of DataStage. I suggest that your company probably bought the wrong tool if all they need DataStage to handle is the E and the L.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Re: Making ETL refer a rule based engine

Post by ray.wurlod »

srinivasb wrote:... we tried duplicating code in PL/SQL ...
Since duplicating the Java code is acceptable, what's the difficulty in duplicating it in DataStage?
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

kcbland wrote:The introduction of a rules based engine for transformation removes the T from the ETL formula. In my opinion, the T is the strength of DataStage. I suggest that your company probably bought the wrong tool if all they need DataStage to handle is the E and the L.
The amount of business logic without an organization is limitless. T is the strength of DataStage, but not everything needs to be done there. I think you can reach out to an external source to due some rules processing and still have PLENTY to do with T inside of DataStage. Using some one Ascential's data quality/profiling suite will reduce the T, but there will still be plenty of cleansing done in the ETL in an organization that does implemnent those correctly.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

This won't apply to Srini's problem of rules in java code, but with that type of question in general I'm intrigued with what Razza Dimension Server can do for an organization. There may be some intregration with Ascential products in the future and there is also a Master Data Management forum on this site with the early beginning of discussion on Razza
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

I have a rules/metadata based data cleansing module/bolt-on for DataStage to enforce L0 data quality with some cross-column verification and enforcement. However, the question is transformation, not cleansing. That being said, T is mostly about enforcing business rules, from my experience. The cleansing is the oft-missed portion, but one that I believe should be metadata/rules driven. But transformation by rules takes the performance tweaking/tuning ability completely out of the hands of the product and the product is specifically not designed that way. That's why I made the comment that the wrong tool may have been chosen. If you're looking for a rules-driven transformation tool, it ain't DataStage.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

kcbland wrote: If you're looking for a rules-driven transformation tool, it ain't DataStage.
True. DataStage isn't strong in rules-driven transformation. Yet at the same time don't you feel quite powerful with it at your hands? It can certainly implement complex rules during transformation.

Any thoughts on wheter it needs to gain strength as a product in this area? In a recent point about BI I made the point that I don't see a fit to incorporate this into DS and the ASCL suite of products...I still don't think there is probably a strong need to make it a rules-driven ETL tool.

Also, will some of the Web Services capabilities (I'm still a neophyte in Web Services) make external calls to a ruled based engine easier? If so, it may be possible to make this an easier task for DS without changing its style.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

off topic story

Post by datastage »

datastage wrote: Yet at the same time don't you feel quite powerful with it at your hands?
That reminds me of a story of one of my companies favorite managers at a client. One of the IT managers was quite a character, and once said, "I love DataStage, I can write a DataStage program to wipe my ass!"

You know you have a satisfied client and believer in the product when you hear that.
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL
Contact:

Post by kcbland »

As much as I enjoy this thread, we're discussing last years model. The Server product is not the future, it's the Parallel edition. The flexibility and ease of use of the Server product will allow you to do anything. Every custom solution has to be done with an eye towards open metadata and sharing.

I don't think Ascential wants to move into an amorphous "anything goes" rules based arena, but that's my opinion and I don't have any specific insider knowledge. You lose all ability to high-performance tune, because the rules will vary from row to row. The trade-off is flexibility versus performance, and right now performance is a bigger driving sales factor than flexibility.

Personally, I LOVE the Server product and believe that Parallel technology should have blended into it for the E and L and some T. Right now, my ETL solutions are mixed bag combinations of Server and Parallel jobs. Introducing a Rules Based Transformation stage would pretty much chuck the whole way I'd use DS.
Kenneth Bland

Rank: Sempai
Belt: First degree black
Fight name: Captain Hook
Signature knockout: right upper cut followed by left hook
Signature submission: Crucifix combined with leg triangle
vmcburney
Participant
Posts: 3593
Joined: Thu Jan 23, 2003 5:25 pm
Location: Australia, Melbourne
Contact:

Post by vmcburney »

I have heard almost the same requirement from a bank, a lot of advanced calculation routines had ben built using C code and they wanted to know if it could be reused with an ETL tool. Worst case scenarios
- build it again using DataStage routines
- use DataStage to touch the data down to flat files, write a script that starts the calculation engine and outputs it to flat files, read them in using DataStage.
Best case scenario:
- Find some way to deliver data directly to the calculation engine from DataStage such as API (which has a short use by date) or web services.

I would avoid PL/SQL as the layer between DataStage and the calculation engine as it is too clumsy.

I agree that a centralised rules engine is probably out of the reach of Ascential and may distract them from the core objectives of scalability and easy integration. Products offering centralised rules services carry price tags several times larger then a full Ascential suite and they are a very complex technology.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia
Contact:

Post by ray.wurlod »

INTEGRITY began life as a flat-file to flat-file tool. When it became Quality Stage the input and output flat files could be replaced (effectively) by named pipes. This approach might be able to be used for your API too - the Sequential File stage has been able to used named pipes for quite a few versions now.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
mhester
Participant
Posts: 622
Joined: Tue Mar 04, 2003 5:26 am
Location: Phoenix, AZ
Contact:

Post by mhester »

Srinivas,

I understand what you want to do and we (Ascential) accomplished this very task @ BellSouth in 2001 and 2002. The requirement was to process data and apply very complex business rules (not only complex, but dozens of rules had to be applied to each incoming row of data), these business rules that would certainly slow the output of rows to the DW. We chose to use ILOG Jrules (Java) rules engine and to create our own plug-in to call this interface. It did not take much time and worked very well.

A developer would translate the business rule into a Java class coded to the ILOG API and the plug-in would allow for a choice of what class to use. The bonus here was that you could choose to operate on one row of data or 1000's of rows of data so you can tune the process to meet your needs.

Writing sequential or procedural business rules within DS is not the same as calling an inference or rules engine. The rules engine (because of the algorithm used) is much faster and can act on multiple rules and rows of data simultaneously.

If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.

Regards,

Michael Hester
datastage
Participant
Posts: 229
Joined: Wed Oct 23, 2002 10:10 am
Location: Omaha

Post by datastage »

mhester wrote:If you would like more information then please drop me a private line and I can send you some information on what we did and how we accomplished it.
Mike, you want to delver deeper with Srinivas, but I'd still like to see a post here going a step deeper into the methodology you used. Or at least something to help us gain a better understanding on how to implement this in general without detailed specifics to Srinivas's development or the work at BellSouth.

Thanks,

Byron
Byron Paul
WARNING: DO NOT OPERATE DATASTAGE WITHOUT ADULT SUPERVISION.

"Strange things are afoot in the reject links" - from Bill & Ted's DataStage Adventure
Post Reply