Endianness in UTF-16 encoding
Hello,
I have a curious endianness problem. I get different results with the Sun
JVM and gij. Consider this simple program:
=========================================
import java.io.*;

public class Test
{
    public static void main(String[] args) throws java.io.IOException
    {
        // Encode to stdout with the generic "UTF-16" charset (no explicit byte order)
        OutputStreamWriter o = new OutputStreamWriter(System.out, "UTF-16");
        o.write("Hello!");
        o.flush();
    }
}
=========================================
According to Sun's API docs (Charset class), the UTF-16 charset is supposed
to use big-endian byte order when encoding, preceded by a big-endian
byte-order mark. This is also what I get when running with Sun's JVM:
00000000: feff 0048 0065 006c 006c 006f 0021 ...H.e.l.l.o.!
But when I run the same program (still Sun-compiled) with gij, I get
little-endian output:
00000000: fffe 4800 6500 6c00 6c00 6f00 2100 ..H.e.l.l.o.!.
This difference causes the test suite of one of my packages to fail. I'm
running on i386.
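As a possible workaround for the test suite, the byte order can be named
explicitly instead of relying on the "UTF-16" default. The following is only a
minimal sketch under that assumption: the class name ExplicitEndianTest and the
helper encodeUtf16BigEndianWithBom are made up for illustration, and it assumes
the "UTF-16BE" charset is available in both runtimes. Since "UTF-16BE" does not
emit a byte-order mark on its own, the sketch writes U+FEFF by hand.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

// Hypothetical class name, used only for this illustration.
public class ExplicitEndianTest
{
    // Hypothetical helper: encodes text as big-endian UTF-16 with a leading
    // BOM, without depending on what the plain "UTF-16" charset defaults to.
    public static byte[] encodeUtf16BigEndianWithBom(String text) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        // "UTF-16BE" fixes the byte order and writes no BOM by itself.
        OutputStreamWriter o = new OutputStreamWriter(bytes, "UTF-16BE");
        o.write('\uFEFF');   // explicit byte-order mark, encoded as FE FF
        o.write(text);
        o.flush();
        return bytes.toByteArray();
    }

    public static void main(String[] args) throws IOException
    {
        // Write the raw bytes to stdout so they can be inspected with a hex dump.
        System.out.write(encodeUtf16BigEndianWithBom("Hello!"));
        System.out.flush();
    }
}
```

This would only hide the difference between the two runtimes, not explain it.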
What confuses me is that I checked the Classpath source for
OutputStreamWriter, and it does the right thing, i.e., big-endian:
http://cvs.savannah.gnu.org/viewcvs/classpath/java/io/OutputStreamWriter.java?rev=1.11.2.8&root=classpath&view=markup
Is there a bug somewhere? And where should I look for it?
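One way to narrow down where the byte order gets decided might be to encode the
same string through several code paths and compare the first two bytes of each
result. The sketch below is just that, a sketch: the class Utf16Probe and its
helper methods are hypothetical names, and it assumes that all three paths are
expected to agree on a compliant runtime. If only one path comes out as fffe
under gij, that would point at the layer to look into.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not part of any library.
public class Utf16Probe
{
    // Formats one byte as two lowercase hex digits.
    public static String hex(byte b)
    {
        String s = Integer.toHexString(b & 0xff);
        return s.length() == 1 ? "0" + s : s;
    }

    // Returns the first two bytes (where the BOM would be) as four hex digits.
    public static String firstTwoBytes(byte[] b)
    {
        if (b.length < 2) return "(too short)";
        return hex(b[0]) + hex(b[1]);
    }

    // Path 1: OutputStreamWriter, as in the test program above.
    public static byte[] viaWriter(String text) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter o = new OutputStreamWriter(bytes, "UTF-16");
        o.write(text);
        o.flush();
        return bytes.toByteArray();
    }

    // Path 2: String.getBytes with a charset name.
    public static byte[] viaGetBytes(String text) throws IOException
    {
        return text.getBytes("UTF-16");
    }

    // Path 3: the java.nio Charset API directly.
    public static byte[] viaCharset(String text)
    {
        ByteBuffer buf = Charset.forName("UTF-16").encode(text);
        byte[] out = new byte[buf.remaining()];
        buf.get(out);
        return out;
    }

    public static void main(String[] args) throws IOException
    {
        String text = "Hello!";
        System.out.println("OutputStreamWriter: " + firstTwoBytes(viaWriter(text)));
        System.out.println("String.getBytes:    " + firstTwoBytes(viaGetBytes(text)));
        System.out.println("Charset.encode:     " + firstTwoBytes(viaCharset(text)));
    }
}
```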
Marcus