XML An invalid XML character (Unicode: 0xffffffff) was found

eostic · Post by **eostic** » Wed Jan 11, 2017 5:57 am

Don't know, Andy. Does it open in IE and Firefox and show that character? It is possible that it might need to be escaped...

Ernie

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Wed Jan 11, 2017 3:02 pm

Just tried to display the unmodified file and both Chrome and IE hated it. Then tried the file with the errant character replaced, and they liked it. No error message displayed from IE, but Chrome considered it an encoding error.

Problem is I can't get the dang thing changed in the source. I suppose I could put in some sort of UNIX tr filter to translate all the messages, but I'm worried about the performance hit.

Still would like to know why it doesn't like that character...

eostic · Post by **eostic** » Wed Jan 11, 2017 4:26 pm

Probably one of those loose ends in the XML specification, where it says something like "up to the implementer".

These days, the browsers are pretty good barometers of what works and what doesn't for "acceptable" XML and JSON content. If they can't read it... well... don't expect much else to be able to.

Google the various escape codes used for hex possibilities within XML. I don't have it memorized, but you can put in the pure hex of the values (via the escape sequence) by transforming them (your xml strings) beforehand with BASIC Transformer...

Ernie

ray.wurlod · Post by **ray.wurlod** » Wed Jan 11, 2017 11:18 pm

Or something has produced one or more Char fields with 0xff as the pad character. Open with a hex editor to verify.
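If a hex editor isn't handy, a few lines of C can do the same check. This is only a rough sketch: `report_non_ascii` is a made-up name, not something from DataStage or from this thread, and it simply assumes that any byte of 0x80 or above is worth looking at.

```c
#include <stdio.h>

/* Hypothetical helper (not from the thread): prints the offset and value of
   every byte >= 0x80 read from `in`, and returns how many were found. */
long report_non_ascii(FILE *in, FILE *out)
{
    long offset = 0;   /* position of the current byte in the stream */
    long found = 0;    /* number of bytes outside the 7-bit ASCII range */
    int c;

    /* fgetc returns each byte as an unsigned char converted to int (0..255) */
    while ((c = fgetc(in)) != EOF) {
        if (c >= 0x80) {
            fprintf(out, "offset %ld: 0x%02x\n", offset, c);
            found++;
        }
        offset++;
    }
    return found;
}
```

Calling `report_non_ascii(stdin, stdout)` from a small `main` would let you pipe a suspect file through it; a long run of `0xff` lines would point to pad characters.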

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Mon Jan 16, 2017 2:54 pm

Not to hijack my own thread - but the real problem is I don't want to filter ALL the messages. I'm currently reading the files in the Hierarchical Stage using the fileset option and it aborts the job when it sees the "malformed" message.

Is there any way to get it to dump to a reject link of some sort? I'm trying to get it to go out using the reject options, and so far it isn't working.

Andy

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Tue Jan 17, 2017 2:34 pm

Making progress....

Per earlier suggestions I encased the character in XML "escape" syntax and it now is parsed as valid XML.

When we replaced a hex E8 with &#xE8; it processed correctly.

So... now I just have to figure out how to make a generic replace happen that cleans up ALL the different "bad" characters.

If anyone has a pre-built sed / tr / grep sequence that can be used as a starting point, please post!

ray.wurlod · Post by **ray.wurlod** » Tue Jan 17, 2017 11:48 pm

Might you be able to use Ereplace() in a Transformer stage immediately following the Oracle Connector?

Or perhaps even a REPLACE function in the extraction SQL?

IBM Analytics Champion 2009 - 2020 · Post by **asorrell** » Thu Jan 19, 2017 9:31 am

One of the guys I'm working with is also a C programmer. He adapted a code snippet from the web and wrote a very nice C program that scans through a file and "escapes" any byte with a value of 128 (0x80) or higher. It works very quickly: initial tests in Dev ran at 0.25 GB per minute, which is great for our XML processing. It reads every character and writes it to a new file.

Usage: ascii_filter infilename outfilename

#include <stdio.h>
#include <stdlib.h>   /* for exit() */

int main ( int argc, char *argv[] )
{
    if ( argc != 3 ) /* argc should be 3 for correct execution */
    {
        /* We print argv[0] assuming it is the program name */
        printf( "usage: %s in_filename out_filename\n", argv[0] );
    }
    else
    {
        FILE *rfile = fopen( argv[1], "r" );
        FILE *wfile = fopen( argv[2], "w" );

        /* fopen returns 0, the NULL pointer, on failure */
        if ( rfile == 0 || wfile == 0 )
        {
            printf( "Could not open file\n" );
            exit(-1);
        }
        else
        {
            int x;
            /* read one character at a time from file, stopping at EOF, which
               indicates the end of the file.  Note that the idiom of "assign
               to a variable, check the value" used below works because
               the assignment statement evaluates to the value assigned. */
            while  ( ( x = fgetc( rfile ) ) != EOF )
            {
                if ( x < 128 )
                    {
                    fprintf(wfile, "%c", x);
                    }
                else
                    {
                    fprintf(wfile, "&#x%x;", x);
                    }
            }
            fclose( rfile );
            fclose( wfile );
        }
    }
}

Many thanks to Warren K. for the code!
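For anyone who would rather run this as a pipe filter than write an intermediate file, the same loop can be pulled into a function that takes two streams. This is a sketch, not a tested replacement: `escape_non_ascii` is a hypothetical name, and like the program above it assumes a single-byte encoding such as ISO-8859-1, where the byte value is also the Unicode code point. UTF-8 input would come out wrong, because each byte of a multi-byte character would be escaped separately.

```c
#include <stdio.h>

/* Hypothetical stream version of the filter above: copies `in` to `out`,
   replacing each byte >= 0x80 with an XML numeric character reference.
   Assumption: single-byte input (e.g. ISO-8859-1), so byte == code point.
   Returns 0 on success, -1 on a read or write error. */
int escape_non_ascii(FILE *in, FILE *out)
{
    int c;

    while ((c = fgetc(in)) != EOF) {
        if (c < 0x80) {
            /* plain 7-bit ASCII passes through unchanged */
            if (fputc(c, out) == EOF)
                return -1;
        } else if (fprintf(out, "&#x%x;", (unsigned)c) < 0) {
            /* e.g. byte 0xE8 is written as the text "&#xe8;" */
            return -1;
        }
    }
    return ferror(in) ? -1 : 0;
}
```

Called as `escape_non_ascii(stdin, stdout)` from `main`, it could sit in front of whatever consumes the XML. It does nothing about control characters below 0x20, which XML 1.0 mostly rejects even when escaped.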

wpkalsow · Post by **wpkalsow** » Fri Jan 27, 2017 3:05 pm

You're welcome! Use it in good health.