Parse JSON using a Schema

jneasy
Participant
Posts: 32
Joined: Sun Jan 29, 2012 8:47 pm
Location: Australia

Post by jneasy »

Hi,

I have been attempting to parse large JSON files (200MB+), which are provided on a daily basis, using the schema provided.

I know that in the Hierarchical stage the assembly editor can infer the schema from an imported JSON data file, and I have been able to parse a file with this approach. The problem is that I have been given no guarantee that the 200MB+ file contains every field defined in the schema.

My question is: has anyone been able to import a JSON schema and use that to parse JSON data?

I have even tried using a simple Person example found at https://json-schema.org/learn/miscellan ... mples.html

Using the sample data, the first parser step produces the following in the Downstream Output Test Data. You can see that the firstName, lastName and age items are not being populated with the name and age values. The schema is shown first, followed by the test data output:

Code: Select all

{
  "$id": "https://example.com/person.schema.json",
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {
      "type": "string",
      "description": "The person's first name."
    },
    "lastName": {
      "type": "string",
      "description": "The person's last name."
    },
    "age": {
      "description": "Age in years which must be equal to or greater than zero.",
      "type": "integer",
      "minimum": 0
    }
  }
}

Code: Select all

<?xml version="1.0" encoding="UTF-8"?>
<top>
  <InputLinks/>
  <result>
    <root>
      <__24_id>
        <@originalName>$id</@originalName>
      </__24_id>
      <__24_schema>
        <@originalName>$schema</@originalName>
      </__24_schema>
      <@type>object</@type>
      <properties>
        <firstName>
          <@type>object</@type>
          <@@isPresent>false</@@isPresent>
        </firstName>
        <lastName>
          <@type>object</@type>
          <@@isPresent>false</@@isPresent>
        </lastName>
        <age>
          <@type>object</@type>
          <@@isPresent>false</@@isPresent>
        </age>
        <@type>object</@type>
        <@@isPresent>false</@@isPresent>
      </properties>
    </root>
  </result>
</top>
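
The output above looks as though the schema file itself was read as ordinary JSON data (note `$id`, `$schema` and `properties` appearing as data nodes), rather than being used to describe a data file. Separately from the stage, the concern that a data file may not cover every field in the schema can be checked outside DataStage. The following is a minimal sketch, assuming a schema that uses only plain `properties` and `items` (no `$ref`, `oneOf`, etc.); the function name `missing_properties` and the sample record are made up for illustration.

```python
import json

# The Person schema from the post, trimmed to the parts this sketch reads.
PERSON_SCHEMA = json.loads("""
{
  "title": "Person",
  "type": "object",
  "properties": {
    "firstName": {"type": "string"},
    "lastName": {"type": "string"},
    "age": {"type": "integer", "minimum": 0}
  }
}
""")


def missing_properties(schema, instance, path=""):
    """Return paths of schema-declared properties that the instance lacks."""
    missing = []
    if schema.get("type") == "object" or "properties" in schema:
        for name, sub in schema.get("properties", {}).items():
            child_path = f"{path}.{name}" if path else name
            if not isinstance(instance, dict) or name not in instance:
                missing.append(child_path)
            else:
                missing.extend(missing_properties(sub, instance[name], child_path))
    elif schema.get("type") == "array" and isinstance(schema.get("items"), dict):
        # A property under an array counts as covered if any element carries it.
        items = instance if isinstance(instance, list) else []
        per_item = [
            set(missing_properties(schema["items"], item, path + "[]"))
            for item in items
        ]
        if per_item:
            missing.extend(sorted(set.intersection(*per_item)))
        else:
            missing.extend(missing_properties(schema["items"], None, path + "[]"))
    return missing


# Hypothetical record that leaves out "age".
sample = {"firstName": "John", "lastName": "Doe"}
print(missing_properties(PERSON_SCHEMA, sample))  # ['age']
```

Any path reported here is a field the inferred schema would not pick up from that file.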
Cheers,
jneasy.
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Don't like seeing posts without a single reply so here I am... wondering if you made any progress with this.
-craig

"You can never have too many knives" -- Logan Nine Fingers
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

JSON Schema has seen some success but is not widespread, unlike XML Schema with its formality. The Hierarchical Stage works from an actual JSON document, so the best suggestion is to find a "complete" one that has at least one instance of each element and, preferably, two or more instances in any array node that can carry multiple values. Import that and it will mimic a schema for you, and then you can get "reasonable" validation functionality from this "inferred" schema.

Ernie
Ernie Ostic

blogit!
Open IGC is Here! https://dsrealtime.wordpress.com/2015/0 ... ere/
jneasy
Participant
Posts: 32
Joined: Sun Jan 29, 2012 8:47 pm
Location: Australia

Post by jneasy »

@ chulett : No progress so far. My next thought is to generate some test data based on the JSON schema. This is where I run into my next problem: the schema is full of cascading references.

@ eostic : I thought someone would come back with the suggestion to find a "complete" JSON file. I've been working off this premise so far and it is mostly working, but I think I will need to dummy up a "complete" file to complete all mappings.

Appreciate the help guys!

I'm going to mark this topic as workaround, the workaround being to generate a "complete" JSON file.
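
For the cascading references, a dummy "complete" file can be generated by walking the schema and following each `$ref` as it is met. The following is a minimal sketch, assuming only local references of the form `#/definitions/...` and plain `type`/`properties`/`items` keywords; `oneOf`, `anyOf`, `allOf` and remote references are not handled. The names `resolve_ref`, `sample_from_schema` and `EXAMPLE_SCHEMA` are hypothetical, and the example schema is invented, not the one from the project.

```python
import json

PLACEHOLDERS = {"string": "x", "integer": 0, "number": 0.0, "boolean": True, "null": None}


def resolve_ref(root, ref):
    """Follow a local JSON Pointer reference such as '#/definitions/address'."""
    if not ref.startswith("#"):
        raise ValueError(f"only local references are handled in this sketch: {ref}")
    node = root
    for part in ref.lstrip("#").split("/"):
        if part:
            node = node[part.replace("~1", "/").replace("~0", "~")]
    return node


def sample_from_schema(schema, root=None, depth=0, max_depth=10):
    """Build a dummy document that populates every declared property."""
    root = schema if root is None else root
    if not isinstance(schema, dict) or depth > max_depth:
        return None  # boolean schemas, or a guard against recursive references
    if "$ref" in schema:
        target = resolve_ref(root, schema["$ref"])
        return sample_from_schema(target, root, depth + 1, max_depth)
    kind = schema.get("type")
    if isinstance(kind, list):  # e.g. ["string", "null"]
        kind = next((k for k in kind if k != "null"), "null")
    if kind == "object" or "properties" in schema:
        return {
            name: sample_from_schema(sub, root, depth + 1, max_depth)
            for name, sub in schema.get("properties", {}).items()
        }
    if kind == "array":
        item = schema.get("items", {})
        if isinstance(item, list):  # tuple-style "items": use the first entry
            item = item[0] if item else {}
        # Two elements, so the array is visibly repeating in the sample.
        return [sample_from_schema(item, root, depth + 1, max_depth) for _ in range(2)]
    return PLACEHOLDERS.get(kind)


# Invented schema with one reference nested inside another.
EXAMPLE_SCHEMA = {
    "type": "object",
    "definitions": {
        "address": {"type": "object", "properties": {"city": {"type": "string"}}},
        "person": {
            "type": "object",
            "properties": {
                "firstName": {"type": "string"},
                "age": {"type": "integer"},
                "addresses": {"type": "array", "items": {"$ref": "#/definitions/address"}},
            },
        },
    },
    "properties": {"owner": {"$ref": "#/definitions/person"}},
}

print(json.dumps(sample_from_schema(EXAMPLE_SCHEMA), indent=2))
```

The placeholder values are arbitrary; the point is only that every field and every array shows up in the file that gets imported into the assembly editor.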
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

That's the best approach, since there is no formal standard for JSON schemas.

Note: when completing your document, be careful to fully represent your arrays. If you have a truly repeating subnode, put more than one value in that node array. I haven't carefully checked our JSON to XML schema functionality, but I've done similar things with an open source tool called trang, which is useful for converting "complete" XML documents into XML schema (also needed for this Stage). trang will consider a singly occurring node to be OCCURS=1, which may not be correct for a given situation.
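
One way to check a candidate "complete" document for this before importing it is to list every array that holds fewer than two elements. The following is a minimal sketch with a made-up function name and made-up sample data. It flags each such array individually, because this thread does not confirm whether the importer looks at one occurrence or all of them.

```python
def single_item_arrays(doc, path="$"):
    """Return the paths of all arrays holding fewer than two elements."""
    found = []
    if isinstance(doc, dict):
        for key, value in doc.items():
            found.extend(single_item_arrays(value, f"{path}.{key}"))
    elif isinstance(doc, list):
        if len(doc) < 2:
            found.append(path)
        for index, value in enumerate(doc):
            found.extend(single_item_arrays(value, f"{path}[{index}]"))
    return found


# Hypothetical document: the first order carries only one line item.
candidate = {
    "orders": [
        {"lines": [{"sku": "a"}]},
        {"lines": [{"sku": "b"}, {"sku": "c"}]},
    ]
}
print(single_item_arrays(candidate))  # ['$.orders[0].lines']
```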

Ernie