Runtime Column Propagation memory usage

ArndW
Participant
Posts: 16318
Joined: Tue Nov 16, 2004 9:08 am
Location: Germany

Post by ArndW »

I cannot check this at the moment, but I doubt that two otherwise identical jobs, one using RCP and the other explicitly defining the columns, will differ appreciably in size. The same applies to runtime performance.

The columns and associated datatypes are passed along dynamically to the OSH scripts at runtime when using RCP, whereas they are explicitly named when not using RCP. The actual code executed by the stages is no different; but the jobs look quite different in the designer.
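To illustrate the point that the stage code is the same either way, here is a minimal Python sketch. It is a conceptual model only, not DataStage or OSH code; `copy_stage` and `discover_schema` are hypothetical names invented for the illustration.

```python
# Conceptual model only; this is not DataStage or OSH code.

def copy_stage(rows, schema):
    """Hypothetical stage body: the same code runs whether the schema
    was declared at design time or resolved at runtime."""
    return [{col: row[col] for col in schema} for row in rows]


def discover_schema(rows):
    """Stand-in for the engine resolving the column list at runtime (RCP)."""
    return list(rows[0].keys())


rows = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bob"}]

# Columns named explicitly in the job design.
explicit = copy_stage(rows, ["id", "name"])

# Columns supplied dynamically, as with RCP.
propagated = copy_stage(rows, discover_schema(rows))

print(explicit == propagated)
```

The only difference between the two calls is where the column list comes from; the work done per row is identical.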

Your assumption about the column widths isn't correct; you used varchar() as an example, but in reality the data types aren't known either when RCP is used.
csphere
Premium Member
Posts: 4
Joined: Sat Feb 11, 2012 1:05 pm

Post by csphere »

Hi ArndW,

Thanks for clearing that up. I had overlooked that, when using RCP, "nothing" is known in advance (including any datatype),
and that the "injection" of the schema definition into the OSH script is why RCP works in the first place :oops:

Thanks again!
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

I already answered this in your ticket to IBM Support.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Teej wrote: I already answered this in your ticket to IBM Support.
And now the answer has been shared with the DSXchange community. Yay!
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
Teej
Participant
Posts: 677
Joined: Fri Aug 08, 2003 9:26 am
Location: USA

Post by Teej »

Haha. Okay, fine -- here's my answer:

Regarding the RCP question - RCP only allows columns to be carried through a job without being explicitly defined. It does not by itself increase memory usage; any increase depends on the number of columns being propagated. It is actually the default behavior of the Orchestrate engine (Parallel Engine), and explicitly defining columns does affect startup time because of the increased code size.

RCP does not change the metadata definition of the column itself. Of course, if you redefine a column from String[max=4000] to String[max=10] and rename the field while doing so, RCP will propagate the original field along with the renamed field in some scenarios.
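A rough Python sketch of that rename scenario, as a conceptual model only: `rename_with_rcp` and the column names are hypothetical, and the slice merely stands in for the narrower String[max=10] definition. Whether the original field actually survives in a real job depends on the job design.

```python
# Conceptual model only; this is not DataStage or OSH code.

def rename_with_rcp(record, old_name, new_name, max_len):
    """Hypothetical stage that redefines and renames one column while
    RCP carries every incoming column through unchanged."""
    out = dict(record)  # RCP: all input columns are propagated
    out[new_name] = record[old_name][:max_len]  # redefined, renamed column
    return out


row = {"id": 7, "comment": "x" * 4000}
result = rename_with_rcp(row, "comment", "short_comment", 10)

# Both the original wide field and the renamed narrow field are present.
print(sorted(result))
print(len(result["comment"]), len(result["short_comment"]))
```

In this model the wide original column still travels with every row, which is why the renamed definition alone does not describe what is actually being moved.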

Note that we do not make any distinction between "Varchar", "Varchar2" or anything like that. We have Strings and Ustrings (Unicode Strings).


Regarding truncation warnings - This behavior has actually been corrected via an internal fix for 11.3.1.2 and is included in 11.5. We now warn you if a string is longer than the defined length and will be truncated, but this is specific to the Sequential File stage.
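The warn-and-truncate behavior described above can be pictured with a small Python sketch. This is an illustration under assumptions, not the engine's implementation: `fit_to_length` and the wording of the warning are invented for the example.

```python
# Conceptual model only; the real warning text and logic live in the engine.

def fit_to_length(value, max_len):
    """Return (value, warning). The warning is None when nothing is cut."""
    if len(value) <= max_len:
        return value, None
    warning = f"truncating {len(value)}-character string to {max_len}"
    return value[:max_len], warning


kept, kept_warning = fit_to_length("short", 10)
cut, cut_warning = fit_to_length("abcdefghijkl", 10)

print(kept, kept_warning)
print(cut, cut_warning)
```

A value that fits passes through silently; an over-length value is cut to the defined maximum and a warning is produced alongside it.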

You may need to open a Request for Enhancement with the Oracle team if this is not being done for the Oracle stage (inserting a longer field than what is defined for Oracle itself.)