XML Input - Dataset Output

anil411
Premium Member
Posts: 53
Joined: Thu Aug 11, 2005 8:34 am

Post by anil411 »

We are reading the XML file below. The last column (CondoProjectName) contains a special character.

Code:

<?xml version="1.0" encoding="UTF-8"?>
<typ:UCDPReceiveAppraisalRequest xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:typ="http://receiveappraisal.company.com/schema/types">
  <typ:CurrentFileSequenceNumber>1</typ:CurrentFileSequenceNumber>
  <typ:LastFileSequenceNumber>0</typ:LastFileSequenceNumber>
  <typ:DocumentFileIDRecordCount>1</typ:DocumentFileIDRecordCount>
  <typ:SyntheticTestIndicator>true</typ:SyntheticTestIndicator>
  <typ:RequestSubmitDateTime>2008-09-18T21:49:45</typ:RequestSubmitDateTime>
  <typ:RequestBeginDateTime>2014-09-18T19:18:33</typ:RequestBeginDateTime>
  <typ:RequestEndDateTime>2006-08-19T13:27:14-04:00</typ:RequestEndDateTime>
  <typ:VendorName>XYZ Support</typ:VendorName>
  <typ:DocumentFiles>
    <typ:DocumentFile>
      <typ:DocumentFileID>FILEID-1</typ:DocumentFileID>
      <typ:DocumentFileStatus>Successful</typ:DocumentFileStatus>
      <typ:AppraisalRecordCount>1</typ:AppraisalRecordCount>
      <typ:Appraisals>
        <typ:Appraisal>
          <typ:DocumentID>DOCID-1</typ:DocumentID>
          <typ:DocumentType>2</typ:DocumentType>
          <typ:SubmissionStatus>In Progress</typ:SubmissionStatus>
          <typ:RawPropertyStreetAddress>1234 Any Street  Drive</typ:RawPropertyStreetAddress>
          <typ:RawPropertyUnitNumber>11</typ:RawPropertyUnitNumber>
          <typ:RawPropertyCity>Vienna</typ:RawPropertyCity>
          <typ:RawPropertyState>VA</typ:RawPropertyState>
          <typ:RawPropertyZipCode>22102</typ:RawPropertyZipCode>
          <typ:RawAppraiserLicenseNumber>string</typ:RawAppraiserLicenseNumber>
          <typ:RawAppraiserStateNumber>string</typ:RawAppraiserStateNumber>
          <typ:RawAppraiserCertificationNumber>string</typ:RawAppraiserCertificationNumber>
          <typ:ScrubbedAppraiserLicenseNumber>string</typ:ScrubbedAppraiserLicenseNumber>
          <typ:RawSupervisorAppraiserLicenseNumber>string</typ:RawSupervisorAppraiserLicenseNumber>
          <typ:RawSupervisorCertificationNumber>string</typ:RawSupervisorCertificationNumber>
          <typ:ScrubbedSupervisorAppraiserLicenseNumber>string</typ:ScrubbedSupervisorAppraiserLicenseNumber>
          <typ:AppraisedValueOfSubjectProperty>1000.00000000000</typ:AppraisedValueOfSubjectProperty>
          <typ:EffectiveDateOfAppraisal>2002-11-05-05:00</typ:EffectiveDateOfAppraisal>
          <typ:AppraisalFormNumberType>Small Residential Income Property Appraisal Report</typ:AppraisalFormNumberType>
          <typ:AssignmentType>Purchase</typ:AssignmentType>
          <typ:AssignmentTypeOther>string</typ:AssignmentTypeOther>
          <typ:PropertyRightsAppraised>Fee Simple</typ:PropertyRightsAppraised>
          <typ:PropertyRightsAppraisedOther>string</typ:PropertyRightsAppraisedOther>
          <typ:CondoProjectName>‘Foothills Addition</typ:CondoProjectName>
        </typ:Appraisal>
      </typ:Appraisals>
    </typ:DocumentFile>
  </typ:DocumentFiles>
</typ:UCDPReceiveAppraisalRequest>
After reading the XML input file, we write the data to a Dataset. In the output, the value of CondoProjectName is written as "^ZFoothills Addition".

If I change the encoding from UTF-8 to ISO-8859-1, the data matches, but we receive the file from external agencies.

We want to continue using UTF-8, and the data should match between the XML and the Dataset. Please let me know if anybody has faced a similar issue.

Appreciate your help
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Are you certain that there is no ^Z character in the source data? (Note, too, that this is the DOS end-of-file marker.)
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
anil411
Premium Member
Posts: 53
Joined: Thu Aug 11, 2005 8:34 am

Post by anil411 »

Ray,

We don't have a ^Z character in the source data.

The source data is as below.

<typ:CondoProjectName>‘Foothills Addition</typ:CondoProjectName>

The output in the Dataset is as below.

"^ZFoothills Addition"

Please advise me.
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Can you please advise what the actual values (hex codes) are for the first three source characters after the tag? ^Z is 0x1A (which may help you).

It appears that you are not using a compatible character map between source and target.
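
One way to get those hex codes is a short script. This is only a minimal sketch, assuming the character shown in the post really is U+2018 (LEFT SINGLE QUOTATION MARK); `hex_dump` is a hypothetical helper name, and the bytes in the actual file may differ from what the forum displays.

```python
def hex_dump(text: str, encoding: str) -> str:
    """Return the bytes of `text` in `encoding` as space-separated hex pairs."""
    return " ".join(f"{b:02X}" for b in text.encode(encoding))


# Assumption: the first three characters after <typ:CondoProjectName>
# are U+2018, 'F' and 'o', as displayed in the post.
sample = "\u2018Fo"

utf8_hex = hex_dump(sample, "utf-8")
cp1252_hex = hex_dump(sample, "cp1252")
print(utf8_hex)    # E2 80 98 46 6F
print(cp1252_hex)  # 91 46 6F

# U+2018 has no ISO-8859-1 equivalent, so a strict conversion fails.
try:
    sample.encode("iso-8859-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False
print(latin1_ok)   # False
```

0x1A is also the ASCII SUB (substitute) control character, which some converters emit for a character they cannot map, so a ^Z in the output is consistent with a failed character mapping rather than a ^Z in the source.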
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Re: XML Input - Dataset Output

Post by chulett »

anil411 wrote: If I change the encoding from UTF-8 to ISO-8859-1, the data matches, but we receive the file from external agencies.
I don't understand the "but" here. Are you saying you change the encoding in the job and it works? Or that it works if you change the declaration on the first line of the XML file?
-craig

"You can never have too many knives" -- Logan Nine Fingers
anil411
Premium Member
Posts: 53
Joined: Thu Aug 11, 2005 8:34 am

Post by anil411 »

Chulett,

If encoding="UTF-8" is in the first line of the XML, the data has the issue.

If encoding="ISO-8859-1" is in the first line of the XML, the data in the source and target match.

I can't change the NLS settings, as they are disabled in our project.

Is there any function to resolve this issue?

Thank you,
chulett
Charter Member
Posts: 43085
Joined: Tue Nov 12, 2002 4:34 pm
Location: Denver, CO

Post by chulett »

Then it seems to me you have two options. One is to ask the vendor to make the change. The second is to pre-process the file and change it yourself using something like awk / sed / perl. This could be built in as a Before Job process or accomplished via a Sequence job.
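
The pre-processing step could look roughly like the following. It is a sketch in Python rather than awk / sed / perl, and `relabel_encoding` is a hypothetical helper, not anything DataStage provides. It rewrites only the encoding name in the XML declaration and leaves every other byte untouched, so whether relabeling alone is correct depends on the bytes actually in the file, which this thread has not established.

```python
import re

# Matches the encoding pseudo-attribute inside a leading XML declaration.
_DECL = re.compile(rb'^(<\?xml[^>]*?encoding=)(["\'])[A-Za-z0-9._-]+\2')


def relabel_encoding(data: bytes, new_encoding: str) -> bytes:
    """Rewrite only the encoding name in the declaration; the body is unchanged."""
    return _DECL.sub(
        lambda m: m.group(1) + m.group(2) + new_encoding.encode("ascii") + m.group(2),
        data,
        count=1,
    )


# Hypothetical two-line document standing in for the vendor file.
original = b'<?xml version="1.0" encoding="UTF-8"?>\n<a>text</a>'
relabeled = relabel_encoding(original, "ISO-8859-1")
print(relabeled.splitlines()[0])
```

Working on bytes rather than decoded text avoids having to guess the file's real encoding just to edit the first line.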
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

I am fairly surprised that the data isn't being escaped as a numeric character reference (&#xnn;, where nn is the hex code point of each character in question). That might be an option for you as you consider editing the overall file as Craig suggests.
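
If escaping is done as part of that editing, a minimal Python sketch is below. `escape_non_ascii` is a hypothetical helper name, and the input is assumed to be already correctly decoded text. Python's built-in `xmlcharrefreplace` error handler writes decimal references (&#8216;), which are equivalent to the hex form (&#x2018;) for an XML parser.

```python
def escape_non_ascii(text: str) -> bytes:
    """Encode to ASCII, replacing other characters with decimal references."""
    return text.encode("ascii", "xmlcharrefreplace")


# Assumption: the leading character is U+2018, as displayed in the post.
escaped = escape_non_ascii("\u2018Foothills Addition")
print(escaped)  # b'&#8216;Foothills Addition'
```

Because the result is pure ASCII, it reads the same under either a UTF-8 or an ISO-8859-1 declaration.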

Ernie
Ernie Ostic

ds_developer
Premium Member
Posts: 224
Joined: Tue Sep 24, 2002 7:32 am
Location: Denver, CO USA

Post by ds_developer »

What is the datatype (in DS) you are using for this field? I just did this without changing the encoding="UTF-8" designation by using the NVarChar datatype. No changes to the NLS settings either.