
JSON file of 717 MB unable to import in DataStage

Posted: Thu Aug 10, 2017 6:37 am
by Vrisha
Scenario: I have JSON files of 717 MB and above.
When I try to do the 'Import New Resource' using the steps below, it throws the error: 'TaskTime.json' UPLOAD FAILED. Internal Error.

Using Schema Library Manager ---> New Library ---> TaskTime ---> Import New Resource ---> select the file 'TaskTime.json' ---> throws the error.

I didn't find any problem importing the schema for files smaller than 717 MB. I checked with the application team and they confirmed that the file is in the correct format.

Is there a file size limit in DataStage for JSON files? Is there an option to increase or set the file size limit in DataStage?

Please let me know. Thanks.

Posted: Thu Aug 10, 2017 6:43 am
by eostic
"Import New Resource" is only a metadata import function. The Job itself, once you have it coded and developed, can handle massive JSON documents. Just cut down the JSON document that you have so that you have a sample with at least one "complete set of nodes" (with at least one instance of all the properties you want to parse or create). Do the import; it will dynamically create the schema that you need, and you can continue to develop your Assembly.

You don't need to "import" that large JSON document, and it is quite unlikely that the 700+ MB is entirely unique instances of the properties, with only one of each. More likely, your 700+ MB document is a full document, complete with many repeating sets of actual data, transactions, etc.

Ernie

Posted: Fri Aug 11, 2017 6:12 am
by Vrisha
Thanks for your prompt response, Ernie. I will get back to you on this.

Posted: Tue Aug 15, 2017 8:15 am
by Vrisha
Hi Ernie,

While still waiting for my membership upgrade to Premium member, I want to clarify the following:

Instead of using the Schema Library Manager to import the metadata for the JSON TaskBaseline (as it fails due to file size), I used the Assembly Editor in the Hierarchical Data stage to create the output columns by pointing to the same-named columns in another Task\root. (The Task library has the same column names except 'TimeByDay', which is required in the TaskBaseline module.)

I mapped TimeByDay (TaskBaseline) to the FinishDate column (Task) in the output mapping of the Assembly Editor.

But while running the job, all the records got dropped with a warning saying that the incoming TimeByDay is '0'. But Task.json does have date values in TimeByDay.

So I exported the schema for TaskBaseline.json, which has a similar structure to Task.json, renamed the element as below, and saved the result as TaskBaseline.json.

From
<xs:element name='FinishDate' minOccurs='0' nillable='true' type='xs:string'/>

To
<xs:element name='TimeByDay' minOccurs='0' nillable='true' type='xs:string'/>


While trying to 'Import New Resource' using the new TaskBaseline.json file, it throws the error below:

'TaskBaseline.json' UPLOAD FAILED. Unexpected character, location = 0, value = '<'

What could be the reason for this? Please let me know.

Posted: Tue Aug 15, 2017 9:28 am
by eostic
Cut down your 717 MB JSON to one set of properties and import it. You should be fine. The snippet you posted is XML, not JSON, so I can't speculate on what the error might be after those changes.

Ernie

Posted: Tue Aug 15, 2017 9:33 am
by Vrisha
Thanks for your reply, Ernie.

What do you mean by 'cut down the 717 MB file to one set of properties'? Sorry, I didn't understand. Please let me know.

Posted: Tue Aug 15, 2017 9:35 am
by chulett
There's no reason to import the whole dang thing. Cut the size down: create a new file with just a single set of properties, a single example of the data, and then import that.

Posted: Tue Aug 15, 2017 9:39 am
by Vrisha
Thank you, chulett. Got it. I will try and get back with the result.

Posted: Tue Aug 15, 2017 1:42 pm
by Vrisha
The problem is resolved.

As mentioned by chulett and Ernie, I took one record out of the huge JSON file (277,775 records) with a metadata structure like the one below and saved it as TaskBaseline.json.
---------------------------------------------------------------------------------------
[{
"__metadata": {
"id": "https://dartorg.sharepoint.com/sites/pw ... A00\u0027)",
"uri": "https://dartorg.sharepoint.com/sites/pw ... A00\u0027)",
"type": "ReportingData.TaskBaselineTimephasedData"
},
"Project": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... 7)/Project"
}
},
"Task": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... 0027)/Task"
}
},
"TaskBaselines": {
"__deferred": {
"uri": "https://dartorg.sharepoint.com/sites/pw ... kBaselines"
}
},
"ProjectId": "59f88501-bd76-e711-80cb-eba85625f89a",
"TaskId": "ebf88501-bd76-e711-80cb-eba85625f89a",
"TimeByDay": "\/Date(1502928000000)\/",
"BaselineNumber": 0,
"ProjectName": "161 - Proterra Window Decals",
"TaskBaselineBudgetCost": "0.000000",
"TaskBaselineBudgetWork": "0.000000",
"TaskBaselineCost": "0.000000",
"TaskBaselineFixedCost": "0.000000",
"TaskBaselineModifiedDate": "\/Date(1501595954740)\/",
"TaskBaselineWork": "16.000000",
"TaskName": "161 - Proterra Window Decals"
}]

----------------------------------------------------------------------------------

Then I imported the metadata using the Schema Library Manager, pointing it to the new small file (TaskBaseline.json).
Then, in the Edit Assembly of the Hierarchical Data stage, I pointed the JSON Source step to the huge JSON source file via the 'Single file' option and did the mapping.

The job ran fine without any errors.

Thank you for your support, chulett and Ernie.
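[Editor's note: the TimeByDay values in the sample above use Microsoft's escaped "\/Date(milliseconds-since-epoch)\/" format, which is worth knowing when date mappings come out as '0'. A hedged, generic Python sketch of decoding that format outside DataStage; the function name is invented for illustration.]

```python
import re
from datetime import datetime, timezone

def parse_ms_date(value):
    """Convert Microsoft's '/Date(1502928000000)/' JSON date to a UTC datetime."""
    # Match only the Date(...) core so both the raw and backslash-escaped
    # forms ("\/Date(...)\/") are handled.
    m = re.search(r"Date\((-?\d+)\)", value)
    if not m:
        raise ValueError(f"not a Microsoft JSON date: {value!r}")
    millis = int(m.group(1))
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)
```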

Posted: Wed Aug 16, 2017 3:14 pm
by eostic
Congrats! A note for others reading this in the future: be sure, in your cut-down version, to include at least 2 of any repeating subnodes, so that the metadata interpreter knows that those subnodes repeat and should be defined as a list.

Good work!

Ernie
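[Editor's note: a hypothetical fragment illustrating Ernie's last point; the "Assignments" node is invented, not from this thread. Because the nested array carries two entries, a schema importer can infer it is a repeating list rather than a single nested record.]

```json
[{
  "TaskId": "t1",
  "Assignments": [
    { "AssignmentId": "a1" },
    { "AssignmentId": "a2" }
  ]
}]
```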