Hi,
I am generating a fixed-width flat file and I need it to be in UTF-8 format.
I changed the NLS map to UTF-8 in the Sequential File stage and also in the job properties,
but when I check the file on the Unix box, it shows as us-ascii.
I used the command below to check the file format on Unix:
file -bi <FF1>
Output:
text/plain; charset=us-ascii
Can you let me know how to generate a file in UTF-8 format?
How to generate file in UTF-8 format
Does the file you are checking actually contain any characters that don't map to the single-byte character set? Otherwise you will always get this value from the "file" command.
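To illustrate the point, here is a minimal Python sketch (not DataStage code; the sample records are made up for illustration). It shows that a record containing only 7-bit characters produces exactly the same bytes whether it is written as US-ASCII or as UTF-8, so the "file" command has nothing to tell them apart.

```python
# Hypothetical fixed-width record containing only 7-bit ASCII characters.
record = "0001JOHN SMITH     CHENNAI   \n"

ascii_bytes = record.encode("ascii")
utf8_bytes = record.encode("utf-8")

# For pure-ASCII text the two encodings are byte-for-byte identical.
same_bytes = ascii_bytes == utf8_bytes
print(same_bytes)

# A record with a non-ASCII character is different: "É" takes two bytes in UTF-8.
accented = "0002JOSÉ"
accented_utf8 = accented.encode("utf-8")
print(len(accented), len(accented_utf8))
```

Only a file like the second record, with at least one character outside the 7-bit range, gives a detection tool something to report as UTF-8.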
Correct me if I am wrong, but I thought UTF-8 is "one of several" extended ASCII sets, that is, bytes 0-127 are ASCII and 128-255 are mapped to non-English characters.
If you don't use any characters over 127, I am not sure that any tool can tell the difference between them, assuming we are talking about a pure text file without markup, extensions, or some other way to differentiate.
Again, I could be wrong, so I am half asking here...
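For what it's worth, in UTF-8 bytes 0-127 are identical to ASCII, and every other character is encoded as a sequence of two to four bytes, each with a value of 128 or above. A rough Python sketch of how a tool could guess the charset from the bytes alone follows. The function name `guess_charset` is made up, and this is only in the spirit of what "file -bi" reports, not its actual algorithm.

```python
def guess_charset(data: bytes) -> str:
    """Rough charset guess from raw bytes (illustrative, not what `file` really does)."""
    # No byte above 127: indistinguishable from plain ASCII.
    if all(b < 0x80 for b in data):
        return "us-ascii"
    # High bytes present: see whether they form valid UTF-8 sequences.
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown-8bit"
    return "utf-8"


print(guess_charset(b"PLAIN TEXT"))
print(guess_charset("JOSÉ".encode("utf-8")))
print(guess_charset(b"JOS\xc9"))  # a lone Latin-1 byte, not valid UTF-8
```

With no bytes over 127 the first branch always wins, which matches the us-ascii result reported in the original question.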
Yes, but isn't there some sort of magic (maybe 4-byte) header on UTF-8 files? I recently had an issue where a particular set of files would come in either format. My tool, when set to UTF-8, could read either without issue, but when set to US-ASCII it would barf on a UTF-8 file, adding some "garbage" characters to the first field.
-craig
"You can never have too many knives" -- Logan Nine Fingers
Seems like a UTF-8 file could come with an optional 3-byte BOM.
But that is no guarantee that it is a UTF-8 file.
I think if you're expecting a UTF-8 file, getting a us-ascii file should be no problem.
If you're expecting a us-ascii file, getting a UTF-8 file with a BOM is going to be a problem even if everything after the BOM is ASCII.
Mike
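A small Python sketch of the BOM behaviour described above, under the assumption of a made-up fixed-width header line as payload. It shows the 3-byte BOM, the "garbage" a single-byte reader would see in the first field, and that a UTF-8 reader that tolerates a BOM handles both variants.

```python
import codecs

bom = codecs.BOM_UTF8            # the optional 3-byte UTF-8 BOM: EF BB BF
payload = b"FIELD1    FIELD2\n"  # hypothetical ASCII-only content
with_bom = bom + payload

# A strict US-ASCII reader rejects the file because of the BOM bytes.
try:
    with_bom.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

# A single-byte reader (Latin-1 here) turns the BOM into three junk characters.
as_latin1 = with_bom.decode("latin-1")

# "utf-8-sig" strips a leading BOM if present and is harmless if it is absent.
decoded_bom = with_bom.decode("utf-8-sig")
decoded_plain = payload.decode("utf-8-sig")

print(len(bom), ascii_ok, decoded_bom == decoded_plain)
```

This is why a consumer expecting UTF-8 is the safer choice: it reads plain ASCII, UTF-8 without a BOM, and (with BOM handling) UTF-8 with a BOM.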