PX XML Input stage dropping records. No warnings or errors.

vlis
Premium Member
Posts: 5
Joined: Mon May 17, 2004 9:52 am

Post by vlis »

We recently completed a DataStage upgrade from 7.5.2 to 9.1.2.

DS 7.5.2
Server: None
Parallel: None


DS 9.1.2 NLS:
Server: UTF8 (Project Default)
Parallel: ASCL_ISO8859-1 (Project Default)

Parallel Job Description:
External source stage provides filename to XML input stage
XML input stage uses filename to parse XML file and writes to data files.

Observations (same data file):
DataStage 7.5.2: XML stage outputs 10,088 records.
DataStage 9.1.2: XML stage outputs 4,798 records. No warnings or errors.

I found that the data file contains the following HTML entity code (an XML numeric character reference): &#x0A;

When decoded, this is a newline (line feed).

When I removed these entities from the file, all 10,088 records were processed.
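
For anyone wanting to repeat that test, the removal step can be sketched as a small pre-processing script. This is only an illustration, not something DataStage provides: the function name `replace_newline_refs` and the choice of a space as the replacement are assumptions, and the pattern also covers the equivalent spellings `&#xA;` and `&#10;`.

```python
import re

# Numeric character references for line feed: &#x0A; &#xA; &#10; (zero padding allowed).
NEWLINE_REF = re.compile(r"&#(?:[xX]0*[aA]|0*10);")


def replace_newline_refs(xml_text, replacement=" "):
    """Return (cleaned_text, number_of_references_replaced).

    Hypothetical helper: swaps each line-feed character reference
    for `replacement` and reports how many were found.
    """
    return NEWLINE_REF.subn(replacement, xml_text)


sample = "<rec><note>line one&#x0A;line two</note></rec>"
cleaned, count = replace_newline_refs(sample)
print(count, cleaned)
```

The returned count is useful on its own: comparing it with the number of missing records may show whether every dropped record contains the reference.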

DataStage job contains the following: External source (sends file name) connected to XML Input connected to transform connected to sequential file.

Questions:

Why does DataStage not write errors or warnings to the log?

Is there a way to tell DataStage 9.1.2 to ignore html entities and treat them as text?

Is this an NLS issue? NLS = None is not an option?

Any suggestions on how to resolve this?
Last edited by vlis on Sun Nov 23, 2014 8:15 pm, edited 1 time in total.
eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

...what does "outputs" mean?

...outputs from the stage (formal Link Counts on that link)...or what you see in the final sequential file?

The answer to that question could be very telling in this example.

Let us know EXACTLY the different row counts for the output link coming from the Stage (without regard to the eventual target).
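
One way to get an expected count that is independent of DataStage is to count the repeating element in the source file directly and compare that with the link count. The sketch below assumes the records are elements with a single known tag; the tag name `row` and the helper `count_elements` are hypothetical, and a namespaced document would need the `{namespace}tag` form.

```python
import io
import xml.etree.ElementTree as ET


def count_elements(source, tag):
    """Count occurrences of `tag` while streaming through the XML.

    `source` may be a file name or a binary file-like object.
    """
    total = 0
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == tag:
            total += 1
            elem.clear()  # drop the finished record's children to keep memory low
    return total


# In-memory stand-in for the real file; pass the file path instead.
sample = io.BytesIO(b"<rows><row><v>1</v></row><row><v>a&#x0A;b</v></row></rows>")
print(count_elements(sample, "row"))
```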

Ernie
Ernie Ostic

vlis
Premium Member
Posts: 5
Joined: Mon May 17, 2004 9:52 am

Updated original post

Post by vlis »

Clarifications:
"Output" means the data written from the XML stage and sent to the transformer (and eventually to a sequential file stage).

DataStage 7.5.2 and 9.1.2 jobs are identical. Job was exported from 7.5.2 and imported into 9.1.2.

The 4,798 records eventually written by the 9.1.2 job match the data from the 7.5.2 job.

When I remove the &#x0A; from the source file and process the file in 7.5.2 and 9.1.2 environments, the final files both contain 10,088 records and the data is identical.
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Hex 0A is the UNIX newline character - I would suspect that something is interpreting it as an EOL.
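
To illustrate the suspicion (this shows standard XML parsing behaviour, not what the DataStage XML stage does internally): a conforming parser turns `&#x0A;` into a real line-feed character inside the value, so anything downstream that treats LF as a record delimiter would see the value as two lines. The element names and the helper `note_text` are made up for the example.

```python
import xml.etree.ElementTree as ET


def note_text(xml_doc):
    """Parse the document and return the text of the first row/note element."""
    return ET.fromstring(xml_doc).find("row/note").text


doc = "<rows><row><note>first&#x0A;second</note></row></rows>"
value = note_text(doc)

print(repr(value))              # the reference is now an actual "\n" in the value
print(len(value.splitlines()))  # a line-oriented reader would see this many lines
```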
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
eph
Premium Member
Posts: 110
Joined: Mon Oct 18, 2010 10:25 am

Post by eph »

Hi,

I know it won't help much, but I faced the same problem on version 8.1 two years ago (except that my job failed instead of processing partial data).
I didn't find any info on this; I should have raised a PMR but didn't have time for it.

Here is my old topic: http://dsxchange.com/viewtopic.php?t=145488.

I don't know why it wasn't solved, since those characters are allowed by the XML standard.

Edit: found this technote on another post of mine :)

Eric