Heap Allocation Error with XML Output stage

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hello Everyone,

Any inputs?
pjsimon20
Premium Member
Posts: 9
Joined: Mon Aug 31, 2009 8:44 pm

Post by pjsimon20 »

Did you try setting the APT_PHYSICAL_DATASET_BLOCK_SIZE to more than 597169415 when you were tweaking it?
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi,

I tried setting APT_PHYSICAL_DATASET_BLOCK_SIZE to more than 597169415, but I got this warning: "Invalid environment specification for APT_PHYSICAL_DATASET_BLOCK_SIZE; the valid range is 8192 to 268435456; setting the block size to 131072."

Thanks
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi Everyone,

I have opened a PMR with IBM, and they noted that DataStage cannot support more than about 200MB per record. My requirement is to create the XML file with only one row holding all the repeating elements. Since I am getting 1.35 million rows, I assume the size of that one row would exceed 600MB.

So could anyone suggest how to resolve this issue in order to meet the requirement?

Below is my XSD:

<?xml version='1.0' encoding='utf-8'?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>for data Interface - Prod_Tbl_Data</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="Prod_Tbl_Data_Set">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Prod_Tbl_Data_Record" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:all>
              <xsd:element name="Prod_Type" type="xsd:string" />
              <xsd:element name="Prod_Table_ID" type="xsd:string" />
              <xsd:element name="Key_Values" type="xsd:string" />
              <xsd:element name="Start_Service_Date" type="xsd:string" default="19900101" />
              <xsd:element name="End_Service_Date" type="xsd:string" minOccurs="0" default="19900101" />
              <xsd:element name="Valid_From_Date" type="xsd:string" />
              <xsd:element name="Valid_To_Date" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column1" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column2" type="xsd:int" minOccurs="0" />
              <xsd:element name="Column3" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column4" type="xsd:decimal" minOccurs="0" />
              <xsd:element name="Column5" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column6" type="xsd:string" minOccurs="0" />
              <xsd:element name="Extraction_Time" type="xsd:string" minOccurs="0" />
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Thanks in Advance
ragasambath
Participant
Posts: 12
Joined: Wed Oct 03, 2007 9:11 am
Location: London

Post by ragasambath »

Hello pradkumar,

As you said, DataStage can't handle rows larger than about 200MB.

Instead of using the XML stage, write the data to a flat file.

Use Java or C code to convert it into the XML file, and invoke that code from the DataStage job using an Execute Command activity.

In the 8.x version you can use a Java-based stage.

We are handling 20 GB XML files using this approach.
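
For reference, here is a minimal sketch of the flat-file-to-XML conversion described above, using Java's built-in StAX writer (javax.xml.stream). It assumes a pipe-delimited input file whose columns match the XSD posted earlier; the class name, file paths, and delimiter are placeholders invented for illustration, not anything from the original job. Because the writer streams row by row, memory use stays flat no matter how many records land inside the single root element:

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Streams a pipe-delimited flat file into one large XML document
// without ever holding more than a single input row in memory.
public class FlatFileToXml {
    // Column names taken from the XSD posted above.
    private static final String[] COLS = {
        "Prod_Type", "Prod_Table_ID", "Key_Values", "Start_Service_Date",
        "End_Service_Date", "Valid_From_Date", "Valid_To_Date",
        "Column1", "Column2", "Column3", "Column4", "Column5", "Column6",
        "Extraction_Time"
    };

    public static void main(String[] args) throws Exception {
        // args[0] = input flat file, args[1] = output XML file (placeholders)
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             FileOutputStream out = new FileOutputStream(args[1])) {
            XMLStreamWriter w = XMLOutputFactory.newFactory()
                    .createXMLStreamWriter(out, "utf-8");
            w.writeStartDocument("utf-8", "1.0");
            w.writeStartElement("Prod_Tbl_Data_Set");
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\\|", -1); // pipe-delimited; keep empty fields
                w.writeStartElement("Prod_Tbl_Data_Record");
                for (int i = 0; i < COLS.length && i < fields.length; i++) {
                    w.writeStartElement(COLS[i]);
                    w.writeCharacters(fields[i]); // the writer escapes &, <, > for us
                    w.writeEndElement();
                }
                w.writeEndElement(); // Prod_Tbl_Data_Record
            }
            w.writeEndElement(); // Prod_Tbl_Data_Set
            w.writeEndDocument();
            w.close();
        }
    }
}

In 8.x the same logic could live inside a Java-based stage instead of being called through Execute Command.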

Thanks
Regards

Raga
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi All,

Thanks for your responses.

Other than writing custom code to create an XML file, could anyone let me know whether there is a way to create around 500K records as one XML file with the same structure (repeating elements, aggregate rows option), the next 500K records as a second XML file, and so on?

Currently my job design is as follows:

Dataset (more than a million rows) ---> XML Output stage (aggregate rows)

Thanks in Advance
Pradeep Kumar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Use the Trigger Column option with a column that changes value every .5M records.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Thanks for the response. I don't think I can use the columns coming from the source, so I need to introduce a new column in the Transformer and use that new column as the trigger option. But the question is: how do I create one XML file with the first 0.5M records, a second with 0.5M to 1M, a third with 1M to 1.5M, and so on? Could you help me build the logic?
Pradeep Kumar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Introducing a new column is fine, there's no need to include it in the output XML. And just make sure the value changes at the appropriate record count, everything else is automatic. Stage variables in a transformer would probably be the easiest approach, either by resetting a current file counter to zero and incrementing the trigger column value every 500,000 records or doing a Mod() on the total output record count.
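
To make the counting concrete, here is a small sketch of the logic described above. The stage-variable names in the comments (svCounter, svFileNum) are invented for illustration, and the Java below simply verifies the arithmetic; it is not DataStage code:

// Stage-variable approach (names invented), roughly:
//   svCounter: If svCounter = 499999 Then 0 Else svCounter + 1
//   svFileNum: If svCounter = 0 Then svFileNum + 1 Else svFileNum
// with the new trigger column derived from svFileNum.
// The Mod()/division alternative is the one-liner shown below.
public class TriggerColumnDemo {
    public static void main(String[] args) {
        final int ROWS_PER_FILE = 500_000;
        int counter = 0;                 // rows seen in the current file
        int fileNum = 1;                 // value of the trigger column
        final int totalRows = 1_350_000; // e.g. the 1.35M rows in the dataset
        for (int row = 1; row <= totalRows; row++) {
            // the trigger column for this row is fileNum
            int fileNumAlt = (row - 1) / ROWS_PER_FILE + 1; // division equivalent
            assert fileNum == fileNumAlt;
            counter++;
            if (counter == ROWS_PER_FILE) { // next row starts a new file
                counter = 0;
                fileNum++;
            }
        }
        System.out.println("last trigger value: " + fileNum); // 3 for 1.35M rows
    }
}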
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi,

I introduced a new column derived as @INROWNUM/10 (for test purposes) and ran the Transformer sequentially to make sure all the source records are processed on one node. In a later job I use the new column as the trigger option and create the multiple files. So if the source has 1.35 million records, I can use @INROWNUM/500000 to split it into three files.

I also tried the Mod() option, but for that I would need to introduce multiple XML Output stages and handle the constraints in the Transformer.

Could anyone help me build logic that avoids running the Transformer sequentially while still using only one XML Output stage? I tried but was not able to.
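
One possible direction, offered as an untested assumption rather than a confirmed solution: if the link into the Transformer is round-robin partitioned, @PARTITIONNUM and @NUMPARTITIONS can rebuild a global row number inside a parallel Transformer, so the trigger column can be derived without forcing sequential execution. The sketch below checks the arithmetic in Java; the candidate derivation appears in the comments:

// Assumption, not a verified DataStage solution: with round-robin
// partitioning, partition p of N receives original rows p, p+N, p+2N, ...
// (0-based), so a candidate trigger-column derivation would be:
//   Div(@PARTITIONNUM + (@INROWNUM - 1) * @NUMPARTITIONS, 500000) + 1
// The Java below confirms this assigns exactly 500K rows per file.
public class ParallelTriggerDemo {
    public static void main(String[] args) {
        final int numPartitions = 4;   // @NUMPARTITIONS
        final int rowsPerFile = 500_000;
        final long total = 1_350_000L; // total source rows
        long[] perFile = new long[3];  // expect 500K, 500K, 350K
        for (int p = 0; p < numPartitions; p++) {                    // @PARTITIONNUM
            for (long k = 1; p + (k - 1) * numPartitions < total; k++) { // @INROWNUM
                long globalRow = p + (k - 1) * numPartitions; // 0-based original position
                int fileNum = (int) (globalRow / rowsPerFile) + 1;
                perFile[fileNum - 1]++;
            }
        }
        System.out.printf("file1=%d file2=%d file3=%d%n",
                perFile[0], perFile[1], perFile[2]);
    }
}

This only holds if the partitioning really is round-robin; with hash or same partitioning the reconstruction breaks down, so it would need verifying on an actual job before relying on it.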

Thanks in advance
Pradeep Kumar