XML An invalid XML character (Unicode: 0xffffffff) was found

eostic
Premium Member
Posts: 3838
Joined: Mon Oct 17, 2005 9:34 am

Post by eostic »

Don't know, Andy. Does it open in IE and Firefox and show that character? It is possible that it might need to be escaped...

Ernie
Ernie Ostic

blogit!
<a href="https://dsrealtime.wordpress.com/2015/0 ... ere/">Open IGC is Here!</a>
asorrell
Posts: 1707
Joined: Fri Apr 04, 2003 2:00 pm
Location: Colleyville, Texas

Post by asorrell »

Just tried to display the unmodified file and both Chrome and IE hated it. Then tried the file with the errant character replaced, and they liked it. No error message displayed from IE, but Chrome considered it an encoding error.

Problem is I can't get the dang thing changed in the source. I suppose I could put in some sort of UNIX tr filter to translate all the messages, but I'm worried about the performance hit.

Still would like to know why it doesn't like that character...
Andy Sorrell
Certified DataStage Consultant
IBM Analytics Champion 2009 - 2020
eostic

Post by eostic »

Probably one of those loose ends in the specification for xml, where it probably says something like "up to the implementer".

These days, the browsers are pretty good barometers of what works and what doesn't for "acceptable" XML and JSON content. If they can't read it... well... don't expect much else to be able to.

Google the various escape codes used for hex possibilities within XML. I don't have it memorized, but you can put in the pure hex of the values (via the escape sequence) by transforming them (your xml strings) beforehand with BASIC Transformer...

Ernie
ray.wurlod
Participant
Posts: 54607
Joined: Wed Oct 23, 2002 10:52 pm
Location: Sydney, Australia

Post by ray.wurlod »

Or something has produced one or more Char fields with 0xff as the pad character. Open with a hex editor to verify.
IBM Software Services Group
Any contribution to this forum is my own opinion and does not necessarily reflect any position that IBM may hold.
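
For anyone without a hex editor on the server, the check suggested above can be roughed out in C. This is only a sketch: the helper names `find_first_high_byte` and `count_byte` are made up for illustration, and it assumes the stream is opened in binary mode.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: returns the zero-based offset of the first byte
   with a value of 0x80 or higher, or -1 if every byte is 7-bit ASCII. */
long find_first_high_byte(FILE *fp)
{
    long offset = 0;
    int c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (c >= 0x80)
            return offset;
        offset++;
    }
    return -1;
}

/* Hypothetical helper: counts how many bytes equal `target`
   (for example 0xFF, the suspected pad character). */
long count_byte(FILE *fp, unsigned char target)
{
    long count = 0;
    int c;

    assert(fp != NULL);
    while ((c = fgetc(fp)) != EOF) {
        if (c == target)
            count++;
    }
    return count;
}
```

Open the message file with `fopen(path, "rb")`, call `find_first_high_byte` to see where the first suspect byte sits, then `rewind` the stream and call `count_byte(fp, 0xFF)` to see whether 0xFF padding is actually present.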
asorrell

Post by asorrell »

Not to hijack my own thread - but the real problem is I don't want to filter ALL the messages. I'm currently reading the files in the Hierarchical Stage using the fileset option and it aborts the job when it sees the "malformed" message.

Is there any way to get it to dump to a reject link of some sort? I'm trying to get it to go out using the reject options, and so far it isn't working.

Andy
asorrell

Post by asorrell »

Making progress....

Per earlier suggestions I encased the character in XML "escape" syntax and it now is parsed as valid XML.

When we replaced a hex E8 with &#xE8; it processed correctly.

So... now I just have to figure out how to make a generic replace happen that cleans up ALL the different "bad" characters.

If anyone has a pre-built sed / tr / grep sequence that can be used as a starting point, please post!
ray.wurlod

Post by ray.wurlod »

Might you be able to use Ereplace() in a Transformer stage immediately following the Oracle Connector?

Or perhaps even a REPLACE function in the extraction SQL?
asorrell

Post by asorrell »

One of the guys I'm working with is also a C programmer. He adapted a code snippet from the web and wrote a very nice C program that scans through a file and "escapes" any byte with a value of 128 (hex 80) or higher. It works very quickly: initial tests in Dev ran at 0.25 GB per minute, which is great for our XML processing. It reads every character and writes it to a new file.

Usage: ascii_filter infilename outfilename

Code:

#include <stdio.h>
#include <stdlib.h>   /* exit, EXIT_FAILURE */

int main ( int argc, char *argv[] )
{
    if ( argc != 3 ) /* argc should be 3 for correct execution */
    {
        /* We print argv[0] assuming it is the program name */
        fprintf( stderr, "usage: %s in_filename out_filename\n", argv[0] );
        return EXIT_FAILURE;
    }
    else
    {
        /* Binary mode so bytes pass through untranslated on any platform */
        FILE *rfile = fopen( argv[1], "rb" );
        FILE *wfile = fopen( argv[2], "wb" );

        /* fopen returns NULL on failure */
        if ( rfile == NULL || wfile == NULL )
        {
            fprintf( stderr, "Could not open file\n" );
            exit( EXIT_FAILURE );
        }
        else
        {
            int x;
            /* read one character at a time from file, stopping at EOF, which
               indicates the end of the file.  Note that the idiom of "assign
               to a variable, check the value" used below works because
               the assignment expression evaluates to the value assigned. */
            while  ( ( x = fgetc( rfile ) ) != EOF )
            {
                if ( x < 128 )
                {
                    /* 7-bit ASCII is copied through unchanged */
                    fputc( x, wfile );
                }
                else
                {
                    /* bytes 128-255 become numeric references, e.g. &#xe8; */
                    fprintf( wfile, "&#x%x;", x );
                }
            }
            fclose( rfile );
            fclose( wfile );
        }
    }
    return 0;
}
Many thanks to Warren K. for the code!
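
One caveat worth spelling out: the filter above turns each byte into a reference to the code point with the same number, which is only right if the input is a single-byte encoding such as ISO-8859-1 (that the files are ISO-8859-1 is an assumption here, not something confirmed in this thread). XML 1.0 also rejects most control characters below hex 20, and they cannot be written as character references either. A rough per-byte sketch that covers both cases might look like this; the function name `escape_latin1_byte` and the choice to swap control characters for a space are illustrative assumptions only.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: writes the XML-safe form of one ISO-8859-1 byte
   into `out` (at least 7 bytes) and returns the number of characters
   written, not counting the terminating NUL. */
int escape_latin1_byte(unsigned char c, char *out, size_t n)
{
    assert(out != NULL && n >= 7);   /* "&#xFF;" plus NUL needs 7 bytes */

    if (c >= 0x80) {
        /* High bytes become numeric character references, e.g. &#xE8; */
        return snprintf(out, n, "&#x%X;", (unsigned)c);
    }
    if (c < 0x20 && c != '\t' && c != '\n' && c != '\r') {
        /* XML 1.0 allows only tab, LF and CR below 0x20.
           Assumption: replace anything else with a space. */
        return snprintf(out, n, " ");
    }
    /* Everything else is copied through unchanged */
    return snprintf(out, n, "%c", c);
}
```

Calling it with `0xE8` fills the buffer with `&#xE8;`, matching the manual replacement described earlier in the thread. If the source data is actually UTF-8, escaping byte by byte would be wrong and the multi-byte sequences would need decoding first.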
wpkalsow
Premium Member
Posts: 11
Joined: Wed Mar 12, 2003 6:13 pm

Post by wpkalsow »

You're welcome! Use it in good health.