Using C++ custom applications in DataStage: Tips needed

srinivasb
Participant
Posts: 42
Joined: Mon Mar 10, 2003 10:52 pm
Location: UK

Post by srinivasb »

Hi Ray and Ken,

I have been hearing of projects where C++ custom applications were written and used. I would like expert input on:

1. In what cases is C++ used? Is it because such functions cannot be written in any other way?
2. How does Perl interact with DataStage?
3. What are the typical steps for doing the above?

If you have any document, or if any information is available on the web, please respond with a link. I have browsed through the documentation and have not found much information.

Thanks in advance for your inputs.

Regards
Srinivas.B
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The main area where I have heard of C++ being used is in creating custom GUIs for plug-in stages. For more information on this, see the Plug-In Stage Writer's Guide.

Perl doesn't actually interact with DataStage in any way of which you're probably thinking. There is a Perl module in DataStage that you need when constructing a "resource file" to accompany a plug-in stage. (Actually I haven't checked that this is still used in release 6.0.)

The other place where Perl can be used is with the Click PACK, which is primarily used to read and parse Web server logs. There is a technical bulletin on the Click PACK, which includes a chapter on using Perl in DataStage jobs (Perl scripts, Perl in user-written Transforms, Perl in user-written Routines, regular expression Transforms, and using Perl from the TCL (UniVerse) prompt).

To find this manual, open the Click PACK folder on your DataStage CD and find clickpack_techbull.pdf therein.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Ray's covered pretty much everything. You do have the ability to write an external function in any language and register it into DataStage. I personally have never seen anyone do this, as it's vastly easier to just write DataStage BASIC equivalents. It would allow you to use existing C/Perl/whatever programs as functions inside DataStage derivations, should you need to reuse existing tried-and-true functions.

Outside of building custom stages for current non-supported databases, what is your motivation for asking about C++?

Thanks,
-Ken
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Although slightly off-topic, I've heard that there's a Java stage in 7.0.
srinivasb
Participant
Posts: 42
Joined: Mon Mar 10, 2003 10:52 pm
Location: UK

Re C++ usage in DS

Post by srinivasb »

Hi Ken and Ray,

Thanks at the outset for your time and attention. We have been informed that we must re-pipe (redesign) a whole host of DS designs, and we were also asked to sharpen our C++ skills for this assignment, which is actually a migration from Oracle to DB2.

In this context, we had a C++ resource (programmer) assigned, hence I was keen to know how to utilize his expertise, as I have not done anything in C++ so far!

Thanks and Regards
Srinivas.B
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Would the C++ requirement be because you are upgrading to the Parallel Extender version of DataStage? Or are you tasked with rewriting slow-performing DS jobs as hand-written C++ code?

In every situation where customers have shown me slow DataStage jobs, it's because they tried to do everything in a single job. They did not design their jobs in such a way as to take advantage of hash files, job instantiation, and bulk-loading.

Thanks,
-Ken
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I agree with Ken's post.
While PX gives you the ability to create a custom stage type, this doesn't mean that you must do so. "Custom" is for situations where DataStage can't do exactly the job you require. If that job is ETL, chances are that there's very good capability using existing DataStage components. And you get the performance by following best practice, as Ken suggests. There is a team of exceptionally good engineers creating DataStage, and they try at all times to ensure that their products are best of breed.
srinivasb
Participant
Posts: 42
Joined: Mon Mar 10, 2003 10:52 pm
Location: UK

The replies

Post by srinivasb »

Yes Ken and Ray,

It is a PX situation: the customer is developing the designs on PX, and we (the offshore/remote development team) are supporting our developers who are onsite. As of now, we have not started any concrete work on that front.

The plug-in stages are displayed, but I am not able to find the Plug-In Stage Writer's Guide in the set of PDFs listed.

Regards
Srinivas.B
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Unless you are doing very customized work that requires developing your own custom plugin, the C++ requirement for using Parallel Extender is not a strong requirement.

Parallel Extender jobs are based on the Torrent Orchestrate technology. The C++ requirement pertains to writing very specific, highly localized transformation rules. DataStage Server jobs have a similar capability using the underlying BASIC language. Since Parallel Extender jobs are compiled C code, it follows that a knowledge of C is helpful were you to go outside the inherent functionality in the tool.

I've never been in a situation where the inherent stages within DataStage Server edition were not sufficient for what I was doing and required a custom plugin to be designed. For example, if I don't like the Aggregator stage in DataStage, I'd use a temporary table in a database and aggregate using SQL. If volume was a concern then I'd buy a package like CoSort or SyncSort and use those. I'd not be writing a plugin to handle what it is I'm trying to do. Likewise, if I need custom database connectivity, I wouldn't write a plugin to do that, I'd try to get Ascential to write one.

Good luck!
-Ken
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

I have been involved in only one such case, for a company in Japan (Hitachi) who had their own corporate database product, HiRDB. They wanted to be able to sell DataStage to users of their database, and provide stage types to access it. Fairly obviously a highly specialized requirement.
Let me reiterate my belief that you will not need to create a custom stage.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

kcbland wrote: I've never been in a situation where the inherent stages within DataStage Server edition were not sufficient for what I was doing and required a custom plugin to be designed. For example, if I don't like the Aggregator stage in DataStage, I'd use a temporary table in a database and aggregate using SQL. If volume was a concern then I'd buy a package like CoSort or SyncSort and use those. I'd not be writing a plugin to handle what it is I'm trying to do. Likewise, if I need custom database connectivity, I wouldn't write a plugin to do that, I'd try to get Ascential to write one.
Well, I have. There have been situations where we, using Parallel Extender, were forced to create a simple custom stage to do the equivalent of "[fieldname].NEXTVAL", which is possible in a Server job. We also had to build the ability to search through all fields that are dynamic, compare them with the fields of an equivalent table in Oracle, and write a formatted record with the old and new field values to a single link. We are also looking into integrating a third-party solution to convert/correct address fields automatically within DataStage.

A lot of situations come to mind right now that require us to build either BuildOPS or CustomOPS for Parallel Extender. Heck, we have had to write a few self-handled aggregation BuildOPS due to apparent limitations of the Aggregator stage in Parallel Extender.

Naturally, those growing pains are normal for the first iteration of Parallel Extender (formerly Orchestrate), and we are eagerly looking forward to 7.0, which promises to be much more robust for our needs.

I am really enjoying this forum, which I discovered today after asking a Support person whether there is a forum for stuff like this. It is quite frustrating to feel at a loss with solutions that are dead simple in C++ and yet unwieldy in DataStage, at least at first glance.

A perfect example: reading a decimal field from Oracle and writing it as a fixed-length character field to a flat file in a specific format (x.xx at a minimum). Doing this in PX is a challenge, especially with one nagging problem: stripping leading zeros. You would think that they would include a Trim() capability for this, but it is limited to trimming spaces.
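
For comparison, here is a minimal sketch of that formatting in plain C++. It assumes the decimal arrives as text with leading zeros (for example "000012.5"); the function names are hypothetical, and none of this is PX or BuildOP API code, only the logic that would sit inside such a stage.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Strip leading zeros from the integer part, keeping one digit before the point.
// Hypothetical helper: assumes input such as "000012.5", "-007.1" or "000".
std::string trim_leading_zeros(const std::string& text) {
    const bool negative = !text.empty() && text[0] == '-';
    std::size_t i = negative ? 1 : 0;
    // Skip a zero only if another character follows it and that character is not '.'.
    while (i + 1 < text.size() && text[i] == '0' && text[i + 1] != '.') {
        ++i;
    }
    std::string digits = text.substr(i);
    if (digits.empty() || digits[0] == '.') {
        digits = "0" + digits;  // guarantee at least one integer digit
    }
    return negative ? "-" + digits : digits;
}

// Pad the fraction with zeros so there are at least two decimal places (x.xx).
std::string ensure_two_decimals(std::string text) {
    const std::size_t dot = text.find('.');
    if (dot == std::string::npos) {
        return text + ".00";
    }
    const std::size_t fraction_digits = text.size() - dot - 1;
    if (fraction_digits < 2) {
        text.append(2 - fraction_digits, '0');
    }
    return text;
}

// Right-align the formatted value in a fixed-length, space-padded field.
std::string to_fixed_field(const std::string& decimal_text, std::size_t width) {
    const std::string body = ensure_two_decimals(trim_leading_zeros(decimal_text));
    if (body.size() > width) {
        throw std::length_error("value does not fit the fixed-length field");
    }
    return std::string(width - body.size(), ' ') + body;
}
```

With these helpers, to_fixed_field("000012.5", 8) yields "   12.50". Right alignment and the exception on overflow are choices made for this sketch; a real file layout may call for left alignment or truncation instead.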

As noted: Eagerly anticipating Twister...

-T.J.
kcbland
Participant
Posts: 5208
Joined: Wed Jan 15, 2003 8:56 am
Location: Lutz, FL

Post by kcbland »

Welcome aboard!

You quoted my comment specific to DataStage Server, but then proceeded to describe PX limitations forcing you to write custom transformers.

My point still stands: C is not required for Server, but it is for PX. The C requirement for PX is because of the underlying technology. C is required for Server ONLY if you are writing custom stages, for which I have not seen the requirement in my 5+ years of DataStage usage.

Thanks
vdreddy
Participant
Posts: 5
Joined: Fri Oct 10, 2003 11:32 am

Post by vdreddy »

Maybe this gives some insight into using C++ in DS (Orchestrate).

I used Orchestrate before it was merged into DataStage as PX. If you have to write a custom Operator (they are called PX stages now), Torrent provided an API with Orchestrate. It is built on C++ OO concepts, and you can inherit from its classes to write your own custom Operators. We have used at least one for TS conversion. You just have to include the API header files and do your coding in C++.
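
To illustrate the inheritance pattern only, here is a self-contained sketch. OperatorBase is a hypothetical stand-in written for this example, not the real Orchestrate API class, and reading "TS conversion" as timestamp reformatting is an assumption; in a real custom Operator you would include the vendor header files and derive from the class they provide.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a vendor-supplied operator base class.
// The real API has its own class names and entry points; these are invented.
class OperatorBase {
public:
    virtual ~OperatorBase() = default;

    // Derived operators implement the per-record transformation here.
    virtual std::string processRecord(const std::string& record) = 0;

    // Drives the derived operator over a batch of input records.
    std::vector<std::string> run(const std::vector<std::string>& input) {
        std::vector<std::string> output;
        output.reserve(input.size());
        for (const std::string& record : input) {
            output.push_back(processRecord(record));
        }
        return output;
    }
};

// Custom operator: rewrites "YYYYMMDDhhmmss" as "YYYY-MM-DD hh:mm:ss".
class TimestampReformatOperator : public OperatorBase {
public:
    std::string processRecord(const std::string& record) override {
        if (record.size() != 14) {
            return record;  // pass unexpected input through unchanged
        }
        return record.substr(0, 4) + "-" + record.substr(4, 2) + "-" +
               record.substr(6, 2) + " " + record.substr(8, 2) + ":" +
               record.substr(10, 2) + ":" + record.substr(12, 2);
    }
};
```

The point of the sketch is the division of labour: the base class owns the record loop, and the custom code only overrides the transformation.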
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

The place to look is the chapter on Buildops in the Parallel Job Developer's Guide.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

On a side note, I love using the terms "CustomOPS" and "BuildOPS" and watching everyone else sputter and stumble over the difference between the two.

Co-worker: "So you can compile the CustomOPS in DataStage?"

Me: "No, but you can generate BuildOPS."

Co-worker: "But you have the source code for BuildOPS outside DataStage?"

Me: "Sorta, yes -- only available after you generate it. But we do build our own source for CustomOPS..."

Co-worker: *sputtering nonsense*

:lol:

-T.J.
Developer of DataStage Parallel Engine (Orchestrate).