Heap Allocation Error with XML output stage
Hi Everyone,
I have opened a PMR with IBM, and they noted that DataStage cannot support more than about 200MB per record. My requirement is to create the XML file with only one row containing all the repeating elements. Since I am getting 1.35 million rows, I estimate the size of that single row would exceed 600MB.
Could anyone suggest how to resolve this issue while still meeting the requirement?
Below is my XSD:
<?xml version='1.0' encoding='utf-8'?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:annotation>
    <xsd:documentation>for data Interface - Prod_Tbl_Data</xsd:documentation>
  </xsd:annotation>
  <xsd:element name="Prod_Tbl_Data_Set">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="Prod_Tbl_Data_Record" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:all>
              <xsd:element name="Prod_Type" type="xsd:string" />
              <xsd:element name="Prod_Table_ID" type="xsd:string" />
              <xsd:element name="Key_Values" type="xsd:string" />
              <xsd:element name="Start_Service_Date" type="xsd:string" default="19900101" />
              <xsd:element name="End_Service_Date" type="xsd:string" minOccurs="0" default="19900101" />
              <xsd:element name="Valid_From_Date" type="xsd:string" />
              <xsd:element name="Valid_To_Date" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column1" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column2" type="xsd:int" minOccurs="0" />
              <xsd:element name="Column3" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column4" type="xsd:decimal" minOccurs="0" />
              <xsd:element name="Column5" type="xsd:string" minOccurs="0" />
              <xsd:element name="Column6" type="xsd:string" minOccurs="0" />
              <xsd:element name="Extraction_Time" type="xsd:string" minOccurs="0" />
            </xsd:all>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>
Thanks in Advance
Hello pradkumar,
As you said, DataStage can't handle records larger than about 200MB.
Instead of using the XML stage, write the data to a flat file.
Then use Java or C code to convert the flat file into XML; you can invoke that code from the DataStage job with an Execute Command stage.
In version 8.x you can also use the Java-based stages.
We are handling 20GB XML files with this approach.
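A minimal sketch of that conversion step (in Python rather than Java/C, purely to illustrate the streaming idea; the element names follow the XSD above, while the pipe delimiter and the short field list are assumptions):

```python
import csv
import io
from xml.sax.saxutils import escape

def flat_file_to_xml(src, dst, fields):
    """Stream a delimited flat file into XML one record at a time,
    so memory use stays flat no matter how many rows there are."""
    dst.write("<?xml version='1.0' encoding='utf-8'?>\n")
    dst.write("<Prod_Tbl_Data_Set>\n")
    for row in csv.reader(src, delimiter='|'):
        dst.write("  <Prod_Tbl_Data_Record>\n")
        for name, value in zip(fields, row):
            # escape() handles &, < and > in the data
            dst.write("    <%s>%s</%s>\n" % (name, escape(value), name))
        dst.write("  </Prod_Tbl_Data_Record>\n")
    dst.write("</Prod_Tbl_Data_Set>\n")

# Hypothetical three-field sample; a real run would use the full XSD field list
fields = ["Prod_Type", "Prod_Table_ID", "Key_Values"]
src = io.StringIO("A|T1|K1\nB|T2|K2\n")
out = io.StringIO()
flat_file_to_xml(src, out, fields)
print(out.getvalue())
```

Because the writer never holds more than one record in memory, the 200MB-per-record limit never comes into play.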
Thanks
Regards
Raga
Hi All,
Thanks for your responses.
Other than writing custom code to create the XML file, could anyone tell me whether there is a way to create one XML file from the first ~500K records with the same structure (repeating elements, aggregate rows option), a second XML file from the next 500K records, and so on?
Currently my job design is as follows:
Dataset (more than a million rows) ---> XML Output stage (aggregate rows)
Thanks in Advance
Pradeep Kumar
Thanks for the response. I don't think I can use the columns coming from the source, so I need to introduce a new column in the transformer and use that new column as the trigger option. But the question is: how can I create one XML file with the first 0.5M records, a second with 0.5M to 1M, a third with 1M to 1.5M, and so on? Could you help me build that logic?
Pradeep Kumar
Introducing a new column is fine; there's no need to include it in the output XML. Just make sure its value changes at the appropriate record count and everything else is automatic. Stage variables in a transformer would probably be the easiest approach: either reset a current-file counter to zero and increment the trigger column value every 500,000 records, or do a Mod() on the total output record count.
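A hypothetical model of those two stage-variable approaches, sketched in Python just to show the arithmetic (the 500,000 chunk size comes from the thread; row numbers are 1-based, like @INROWNUM):

```python
CHUNK = 500_000  # records per output file, per the requirement

def trigger_by_division(rownum, chunk=CHUNK):
    """Trigger value that increments every `chunk` rows.
    rownum is 1-based, like @INROWNUM in a transformer."""
    return (rownum - 1) // chunk + 1

def starts_new_file(rownum, chunk=CHUNK):
    """Mod()-style check: True on the first row of each new file,
    i.e. the point where the file counter would be reset."""
    return (rownum - 1) % chunk == 0

# rows 1..500000 carry trigger 1, 500001..1000000 carry trigger 2, ...
print(trigger_by_division(1), trigger_by_division(500_000),
      trigger_by_division(500_001), trigger_by_division(1_350_000))  # → 1 1 2 3
```

Either function gives the XML Output stage a trigger column whose value is constant within a file and changes exactly at the file boundary.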
-craig
"You can never have too many knives" -- Logan Nine Fingers
"You can never have too many knives" -- Logan Nine Fingers
Hi,
I introduced a new column using the function @INROWNUM/10 (for testing) and am running the transformer sequentially so that all the source records are processed on one node. In a later job I use the new column as the trigger option to create the multiple files.
So if the source has 1.35 million records, I can use @INROWNUM/500000 to split it into 3 files.
I tried the Mod() option, but for that I would need to introduce multiple XML Output stages and handle the routing with constraints in the transformer.
Could anyone help me build logic that avoids running the transformer sequentially and uses only one XML Output stage?
I tried but couldn't manage it.
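For what it's worth, the grouping that @INROWNUM/500000 is intended to produce can be checked with a quick simulation (Python here, purely for verification; this assumes integer division, i.e. Div(@INROWNUM, 500000)):

```python
chunk = 500_000
total = 1_350_000  # rows in the source dataset

# Count rows per trigger value, assuming integer division
groups = {}
for rownum in range(1, total + 1):
    key = rownum // chunk
    groups[key] = groups.get(key, 0) + 1

print(sorted(groups))                        # distinct trigger values → number of files
print([groups[g] for g in sorted(groups)])   # rows per file
```

Three distinct values do come out, so three files, but the boundaries are slightly uneven: the first file ends at row 499,999. Using (rownum - 1) // chunk instead would give exact 500,000-row files.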
Thanks in advance
Pradeep Kumar