Java Transformer

Post questions here related to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

kuldeep165146
Participant
Posts: 10
Joined: Wed May 01, 2013 11:21 pm

Java Transformer

Post by kuldeep165146 »

Hi All,
I am new to Data Stage. I have to insert record of missed call in DB. The problem is record getting generate 2 million rows in 1 min. The file is getting back log with date and time appending in file name and new file is getting generated in every min.

I have to write a Java Code to achieve the bulk read/write after some data filtration and modification and have to put in my Java Transformer.
So that when ever file is getting generated it should call my ETL and do the bulk insert in DB for further Data Mining and report generation.

Can anybody help me achieve this (bulk read/write through the Java Transformer)?

Any help would be appreciated!


Regards,
Kuldeep
prasson_ibm
Premium Member
Posts: 536
Joined: Thu Oct 11, 2007 1:48 am
Location: Bangalore

Post by prasson_ibm »

Hi Kuldeep,
I don't think you can trigger your job by writing a Java transformation; you can only do transformations using Java code. For triggering a job you need to consider another design.

Even for the transformation, why don't you build the logic in a Transformer stage instead of Java code?
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

I'm a little lost with your requirement and why anything you are doing would require Java. What's your database? Why do you think you need Java?

DataStage can load several files at once, so you don't need to individually load each new file that is being generated "every minute". If you really need to do that, you'll need a multi-instance job to keep up, and now we're well beyond "new to DataStage" territory.
-craig

"You can never have too many knives" -- Logan Nine Fingers
kuldeep165146
Participant
Posts: 10
Joined: Wed May 01, 2013 11:21 pm

Post by kuldeep165146 »

To prasson_ibm,
Thanks for your reply.

Is there any way to achieve batch read/write through the Java Transformer? The Stage class only has a method to read a single row; I didn't find anything related to batch read/write. I am also not sure whether there is a way to filter out the unwanted data and modify each row appropriately so it is ready for insertion into the database.


Regards,
Kuldeep
kuldeep165146
Participant
Posts: 10
Joined: Wed May 01, 2013 11:21 pm

Post by kuldeep165146 »

To chulett

I think I have already specified my problem and what I need to achieve through Java code.
Why am I using Java?
Since I have to modify the data a lot, I need to use Java; there is no other way. By the way, it's only a minor part of the larger requirement, and that's all I can say.

Help me if there's a way to achieve batch read/write in Java code. I am using Couchbase and Vertica as my databases; the data needs to be inserted into the Vertica database.


Regards,
Kuldeep
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

kuldeep165146 wrote: I need to use Java; there is no other way.
I doubt that very much.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

ray.wurlod wrote:
kuldeep165146 wrote: I need to use Java; there is no other way.
I doubt that very much.
As do I.

We're more than happy to help with DataStage issues here, but for "batch read/write in Java code" help you should be posting elsewhere. By that I mean a forum that specifically supports Java developers. Once it is developed, people here can help you integrate it into a DataStage job, if need be. However, from what you've posted so far, I honestly don't see what role DataStage would even play in this process.

Can you clarify for us what part of this process your DataStage job would do?
-craig

"You can never have too many knives" -- Logan Nine Fingers
kuldeep165146
Participant
Posts: 10
Joined: Wed May 01, 2013 11:21 pm

Post by kuldeep165146 »

I am trying to integrate my Java code into the Java Transformer provided by DataStage. The API exposed here is the Stage class (in tr4j.jar), which I am extending in my Java class. The problem I am facing is that the Stage class only has a readRow() method; I didn't find any method to read in batch. So I have to read rows one by one instead of in a batch, which is making my transformer slow.

Is there any other class available in DataStage to read/write in batch?

I am seeking help here to achieve this. You may all be right in one sense, but please help me if you can, instead of questioning a decision made at a higher level. I am just a beginner seeking a solution to the problem.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Regular DataStage Connector stages (and, indeed, some older stage types) do read in batch ("arrays of rows" is the terminology) and, under the right circumstances (such as from a partitioned table), can do so in parallel processes, allowing for very large numbers of rows to be processed in a comparatively short time, without any need for Java.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

The Java stage's API reads rows from a link using readRow() and writes output to DataStage links using writeRow(). Then, within your Java class, we assume you have your own code to perform I/O to those databases.

If you are having performance issues, determine where they are happening. Change your Java class so that it does nothing but count the rows it receives. Does that improve things? If so, the bottleneck you are experiencing may be in your database loading code, and that would be a Java issue; there is not much we can do to help you there.
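
For illustration, a minimal sketch of that counting test wrapped around the readRow()/writeRow() loop. Only Stage, tr4j.jar, readRow() and writeRow() are named in this thread; the package name, the process() contract, its status constants and createOutputRow() are assumptions that should be checked against the Java Pack documentation for your release.

Code:

    import com.ascentialsoftware.jds.Row;   // assumed package name
    import com.ascentialsoftware.jds.Stage; // assumed package name

    // Sketch only: the process() contract, the status constants and
    // createOutputRow() are assumptions about the tr4j.jar API, not
    // verified signatures. readRow()/writeRow() come from this thread.
    public class MissedCallTransformer extends Stage {
        // Flip to true for the diagnostic run: count rows, do nothing else.
        private static final boolean COUNT_ONLY = false;
        private long rowCount = 0;

        public int process() {
            Row in = readRow();                   // rows arrive one at a time
            if (in == null) {
                return OUTPUT_STATUS_END_OF_DATA; // assumed constant
            }
            rowCount++;
            if (COUNT_ONLY) {
                return OUTPUT_STATUS_NOT_READY;   // assumed constant
            }
            Row out = createOutputRow();          // assumed helper
            // ... filter and modify column values here ...
            writeRow(out);
            return OUTPUT_STATUS_READY;           // assumed constant
        }
    }

If the COUNT_ONLY run keeps pace with the input, the slowness is in the database code rather than the link interface.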

If the process still runs slowly, then perhaps you are doing too much work after readRow(). One thing to try is avoiding any detailed column work. Use one giant character column and concatenate all the columns together (upstream, before reaching the Java stage), fixed length if you can, to avoid any parsing. This way there is only one column coming in per readRow() call: don't parse out each individual column in a loop, and don't check datatypes; work that out in your code, preferably by fixed offsets, as in the sketch below.
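
As a concrete illustration of that offset approach, the plain-Java sketch below pulls three fields out of one fixed-length record by position, with no per-column loop and no datatype checks. The layout (caller, callee, timestamp) and the offsets are invented for the example; real offsets would come from your upstream column layout.

Code:

    // Plain-Java illustration of the single-column, fixed-offset idea.
    // The layout (caller 10 chars, callee 10 chars, timestamp 14 chars)
    // is hypothetical; substitute your own upstream record format.
    public final class FixedOffsetRecord {
        private static final int CALLER_START = 0,  CALLER_END = 10;
        private static final int CALLEE_START = 10, CALLEE_END = 20;
        private static final int TS_START     = 20, TS_END     = 34;

        public static String caller(String rec)    { return rec.substring(CALLER_START, CALLER_END).trim(); }
        public static String callee(String rec)    { return rec.substring(CALLEE_START, CALLEE_END).trim(); }
        public static String timestamp(String rec) { return rec.substring(TS_START, TS_END).trim(); }

        public static void main(String[] args) {
            // one 34-character record: no per-column loop, no datatype checks
            String rec = "98765432100123456789" + "20130501120000";
            System.out.println(caller(rec) + " | " + callee(rec) + " | " + timestamp(rec));
        }
    }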

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kuldeep165146 wrote: I am just a beginner seeking a solution to the problem.
You can roll your eyes all you like, but you must realize that people are trying to help you. The first thing any responsible mentor will do is point out when there are better solutions than the path a student is going down. And in your position I would push back: you are taking a high-speed tool and throttling it down with a Java choke point.

However, if you're stuck with this 'solution' where DataStage reads your source and sends records one by one to the Java Transformer, then you have no mechanism to do anything 'batch'. It seems to me that your entire solution would need to be in Java for that to be possible, and then, as I noted earlier, there's really no role for an ETL tool here.

As Ernie points out, the best you can do is optimize your Java processing so it is as efficient as possible, and minimize the amount of DataStage processing before the Java step. Have you tried isolating the Java target loading code? Meaning, if you write the final product to a flat file instead, does the processing speed up significantly?
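
To make that isolation test concrete, here is a small, self-contained sketch of a flat-file sink that could be swapped in for the database writer; the file name and record format are placeholders. If this version keeps up with the input, the bottleneck is the database load rather than the transformation.

Code:

    import java.io.BufferedWriter;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Paths;

    // Stand-in target for the isolation test: write each transformed row
    // to a local file instead of the database. The path is a placeholder.
    public class FlatFileSink implements AutoCloseable {
        private final BufferedWriter out;

        public FlatFileSink(String path) throws IOException {
            out = Files.newBufferedWriter(Paths.get(path));
        }

        public void write(String record) throws IOException {
            out.write(record);
            out.newLine();
        }

        @Override
        public void close() throws IOException {
            out.close(); // flushes any buffered records
        }

        public static void main(String[] args) throws IOException {
            try (FlatFileSink sink = new FlatFileSink("missed_calls_test.out")) {
                sink.write("9876543210|0123456789|20130501120000");
            }
        }
    }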
-craig

"You can never have too many knives" -- Logan Nine Fingers
kuldeep165146
Participant
Posts: 10
Joined: Wed May 01, 2013 11:21 pm

Post by kuldeep165146 »

That's correct, Ernie!

Is there a way to achieve batch read/write in the Java Transformer?
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

No... the Transformer reads rows and/or writes them; that is the interface to the link. Benchmark your results at the DataStage interface point and reduce the link to a single column, and it will be faster.

If you are on 9.1, the new Java Integration Stage achieves higher performance than the Java Transformer, but that is because it is simply more efficient at row handling, like a Connector. "Batch" loading is a database-specific term: if there is a batch API for the databases you are using, it would be something implemented inside your class. The interface to the links is the same either way.
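
For what such a batch API typically looks like inside the class, the sketch below uses standard JDBC batching, which Vertica's JDBC driver supports. The connection URL, credentials, table and column names are all placeholders for the example.

Code:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;
    import java.util.List;

    // Standard JDBC batching inside the Java class; the Vertica URL,
    // credentials, table and columns below are placeholders.
    public class VerticaBatchWriter {
        private static final int BATCH_SIZE = 10_000; // tune for your workload

        public static void load(List<String[]> rows) throws SQLException {
            String url = "jdbc:vertica://dbhost:5433/mydb"; // placeholder host/db
            try (Connection con = DriverManager.getConnection(url, "user", "password");
                 PreparedStatement ps = con.prepareStatement(
                         "INSERT INTO missed_calls (caller, callee, call_ts) VALUES (?, ?, ?)")) {
                con.setAutoCommit(false);
                int pending = 0;
                for (String[] r : rows) {
                    ps.setString(1, r[0]);
                    ps.setString(2, r[1]);
                    ps.setString(3, r[2]);
                    ps.addBatch();
                    if (++pending == BATCH_SIZE) {
                        ps.executeBatch(); // one round trip per batch, not per row
                        pending = 0;
                    }
                }
                if (pending > 0) {
                    ps.executeBatch();
                }
                con.commit();
            }
        }
    }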

Ernie
Ernie Ostic

blogit!
Open IGC is Here! (https://dsrealtime.wordpress.com/2015/0 ... ere/)
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

kuldeep165146 wrote: Is there a way to achieve batch read/write in the Java Transformer?
Did you even read my post?
-craig

"You can never have too many knives" -- Logan Nine Fingers