Heap Allocation Error with XML Output stage

Post questions here relating to DataStage Enterprise/PX Edition for such areas as Parallel job design, Parallel datasets, BuildOps, Wrappers, etc.

Moderators: chulett, rschirm, roy

pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hello Everyone,

Any inputs?
pjsimon20
Premium Member
Posts: 9
Joined: Mon Aug 31, 2009 8:44 pm

Post by pjsimon20 »

Did you try setting the APT_PHYSICAL_DATASET_BLOCK_SIZE to more than 597169415 when you were tweaking it?
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi,

I tried setting APT_PHYSICAL_DATASET_BLOCK_SIZE to more than 597169415, but I got this warning: "Invalid environment specification for APT_PHYSICAL_DATASET_BLOCK_SIZE; the valid range is 8192 to 268435456; setting the block size to 131072."

Thanks
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi Everyone,

I have opened a PMR with IBM, and they noted that DataStage cannot support more than about 200MB per record. My requirement is to create the XML file with only one row holding all the repeating elements. Since I am getting 1.35 million rows, I assume the size of that one row would exceed 600MB.

So could anyone suggest how to resolve this issue in order to meet the requirement?

Below is my XSD:

<?xml version='1.0' encoding='utf-8'?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>for data Interface - Prod_Tbl_Data</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="Prod_Tbl_Data_Set">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Prod_Tbl_Data_Record" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:all>
              <xsd:element name="Prod_Type" type="xsd:string" />
              <xsd:element name="Prod_Table_ID" type="xsd:string" />
              <xsd:element name="Key_Values" type="xsd:string" />
              <xsd:element name="Start_Service_Date" type="xsd:string" default="19900101" />
              <xsd:element name="End_Service_Date" type="xsd:string" minOccurs="0" default="19900101" />
              <xsd:element name="Valid_From_Date" type="xsd:string" />
              <xsd:element name="Valid_To_Date" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column1" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column2" type="xsd:int" minOccurs="0" />
              <xsd:element name="Column3" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column4" type="xsd:decimal" minOccurs="0" />
              <xsd:element name="Column5" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column6" type="xsd:string" minOccurs="0" />
              <xsd:element name="Extraction_Time" type="xsd:string" minOccurs="0" />
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>

Thanks in Advance
ragasambath
Participant
Posts: 12
Joined: Wed Oct 03, 2007 9:11 am
Location: London

Post by ragasambath »

Hello pradkumar,

As you said, DataStage can't handle rows larger than about 200MB.

Instead of using the XML stage, write the data to a flat file.

Use Java or C code to convert it into the XML file, and invoke that code from the DataStage job using an Execute Command activity.

In the 8.x version you can use a Java-based stage.

We are handling 20 GB XML files using this approach.
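
For reference, here is a minimal sketch of the flat-file-to-XML conversion described above, using Java's built-in StAX writer (javax.xml.stream). It assumes a pipe-delimited input file whose columns match the XSD posted earlier; the class name, file paths, and delimiter are placeholders invented for illustration, not anything from the original job. Because the writer streams row by row, memory use stays flat no matter how many records land inside the single root element:

import java.io.BufferedReader;
import java.io.FileOutputStream;
import java.io.FileReader;
import javax.xml.stream.XMLOutputFactory;
import javax.xml.stream.XMLStreamWriter;

// Streams a pipe-delimited flat file into one large XML document
// without ever holding more than a single input row in memory.
public class FlatFileToXml {
    // Column names taken from the XSD posted above.
    private static final String[] COLS = {
        "Prod_Type", "Prod_Table_ID", "Key_Values", "Start_Service_Date",
        "End_Service_Date", "Valid_From_Date", "Valid_To_Date",
        "Column1", "Column2", "Column3", "Column4", "Column5", "Column6",
        "Extraction_Time"
    };

    public static void main(String[] args) throws Exception {
        // args[0] = input flat file, args[1] = output XML file (placeholders)
        try (BufferedReader in = new BufferedReader(new FileReader(args[0]));
             FileOutputStream out = new FileOutputStream(args[1])) {
            XMLStreamWriter w = XMLOutputFactory.newFactory()
                    .createXMLStreamWriter(out, "utf-8");
            w.writeStartDocument("utf-8", "1.0");
            w.writeStartElement("Prod_Tbl_Data_Set");
            String line;
            while ((line = in.readLine()) != null) {
                String[] fields = line.split("\\|", -1); // pipe-delimited; keep empty fields
                w.writeStartElement("Prod_Tbl_Data_Record");
                for (int i = 0; i < COLS.length && i < fields.length; i++) {
                    w.writeStartElement(COLS[i]);
                    w.writeCharacters(fields[i]); // the writer escapes &, <, > for us
                    w.writeEndElement();
                }
                w.writeEndElement(); // Prod_Tbl_Data_Record
            }
            w.writeEndElement(); // Prod_Tbl_Data_Set
            w.writeEndDocument();
            w.close();
        }
    }
}

In 8.x the same logic could live inside a Java-based stage instead of being called through Execute Command.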

Thanks
Regards

Raga
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi All,

Thanks for your responses.

Other than writing custom code to create an XML file, could anyone let me know whether there is a way to create around 500K records as one XML file with the same structure (repeating elements, aggregate rows option), the next 500K records as a second XML file, and so on?

Currently my job design is as follows:

Dataset (more than a million rows) ---> XML Output stage (aggregate rows)

Thanks in Advance
Pradeep Kumar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Use the Trigger Column option with a column that changes value every .5M records.
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Thanks for the response. I don't think I can use the columns coming from the source, so I need to introduce a new column in the Transformer and use that new column as the trigger option. But the question is: how do I create one XML file with the first 0.5M records, a second with 0.5M to 1M, a third with 1M to 1.5M, and so on? Could you help me build the logic?
Pradeep Kumar
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Introducing a new column is fine, there's no need to include it in the output XML. And just make sure the value changes at the appropriate record count, everything else is automatic. Stage variables in a transformer would probably be the easiest approach, either by resetting a current file counter to zero and incrementing the trigger column value every 500,000 records or doing a Mod() on the total output record count.
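
To make the counting concrete, here is a small sketch of the logic described above. The stage-variable names in the comments (svCounter, svFileNum) are invented for illustration, and the Java below simply verifies the arithmetic; it is not DataStage code:

// Stage-variable approach (names invented), roughly:
//   svCounter: If svCounter = 499999 Then 0 Else svCounter + 1
//   svFileNum: If svCounter = 0 Then svFileNum + 1 Else svFileNum
// with the new trigger column derived from svFileNum.
// The Mod()/division alternative is the one-liner shown below.
public class TriggerColumnDemo {
    public static void main(String[] args) {
        final int ROWS_PER_FILE = 500_000;
        int counter = 0;                 // rows seen in the current file
        int fileNum = 1;                 // value of the trigger column
        final int totalRows = 1_350_000; // e.g. the 1.35M rows in the dataset
        for (int row = 1; row <= totalRows; row++) {
            // the trigger column for this row is fileNum
            int fileNumAlt = (row - 1) / ROWS_PER_FILE + 1; // division equivalent
            assert fileNum == fileNumAlt;
            counter++;
            if (counter == ROWS_PER_FILE) { // next row starts a new file
                counter = 0;
                fileNum++;
            }
        }
        System.out.println("last trigger value: " + fileNum); // 3 for 1.35M rows
    }
}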
-craig

"You can never have too many knives" -- Logan Nine Fingers
pradkumar
Charter Member
Posts: 393
Joined: Wed Oct 18, 2006 1:09 pm

Post by pradkumar »

Hi,

I introduced a new column derived as @INROWNUM/10 (for test purposes) and ran the Transformer sequentially to make sure all the source records are processed on one node. In a later job I use the new column as the trigger option and create the multiple files. So if the source has 1.35 million records, I can use @INROWNUM/500000 to split it into three files.

I also tried the Mod() option, but for that I would need to introduce multiple XML Output stages and handle the constraints in the Transformer.

Could anyone help me build logic that avoids running the Transformer sequentially while still using only one XML Output stage? I tried but was not able to.
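
One possible direction, offered as an untested assumption rather than a confirmed solution: if the link into the Transformer is round-robin partitioned, @PARTITIONNUM and @NUMPARTITIONS can rebuild a global row number inside a parallel Transformer, so the trigger column can be derived without forcing sequential execution. The sketch below checks the arithmetic in Java; the candidate derivation appears in the comments:

// Assumption, not a verified DataStage solution: with round-robin
// partitioning, partition p of N receives original rows p, p+N, p+2N, ...
// (0-based), so a candidate trigger-column derivation would be:
//   Div(@PARTITIONNUM + (@INROWNUM - 1) * @NUMPARTITIONS, 500000) + 1
// The Java below confirms this assigns exactly 500K rows per file.
public class ParallelTriggerDemo {
    public static void main(String[] args) {
        final int numPartitions = 4;   // @NUMPARTITIONS
        final int rowsPerFile = 500_000;
        final long total = 1_350_000L; // total source rows
        long[] perFile = new long[3];  // expect 500K, 500K, 350K
        for (int p = 0; p < numPartitions; p++) {                    // @PARTITIONNUM
            for (long k = 1; p + (k - 1) * numPartitions < total; k++) { // @INROWNUM
                long globalRow = p + (k - 1) * numPartitions; // 0-based original position
                int fileNum = (int) (globalRow / rowsPerFile) + 1;
                perFile[fileNum - 1]++;
            }
        }
        System.out.printf("file1=%d file2=%d file3=%d%n",
                perFile[0], perFile[1], perFile[2]);
    }
}

This only holds if the partitioning really is round-robin; with hash or same partitioning the reconstruction breaks down, so it would need verifying on an actual job before relying on it.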

Thanks in advance
Pradeep Kumar