Endianness in UTF-16 encoding



Hello,

I have a curious endianness problem. I get different results with the Sun
JVM and gij. Consider this simple program:

=========================================
import java.io.*;

public class Test
{
    public static void main(String[] args) throws java.io.IOException
    {
        OutputStreamWriter o = new OutputStreamWriter(System.out, "UTF-16");
        o.write("Hello!");
        o.flush();
    }
}
=========================================

According to Sun's API docs (Charset class), the UTF-16 charset, when
encoding, uses big-endian byte order and writes a big-endian byte-order
mark. This is also what I get when running with Sun's JVM:

00000000: feff 0048 0065 006c 006c 006f 0021       ...H.e.l.l.o.!

But when I run the same program (still Sun-compiled) with gij, I get
little-endian output:

00000000: fffe 4800 6500 6c00 6c00 6f00 2100       ..H.e.l.l.o.!.

This difference causes the test suite of one of my packages to fail. I'm
running on i386.
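
To reproduce the comparison without an external hex dump tool, something
like the following minimal sketch should work. It encodes the same string
into a byte array and prints the bytes as hex. The class name Utf16Dump and
the helper hexOf are illustrative names of my own, not part of any library,
and I have only reasoned about the expected result on Sun's JVM.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;

public class Utf16Dump
{
    // Hypothetical helper: encode text with the named charset and
    // return the resulting bytes as a lowercase hex string.
    public static String hexOf(String text, String charsetName) throws IOException
    {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        OutputStreamWriter o = new OutputStreamWriter(bytes, charsetName);
        o.write(text);
        o.flush();

        StringBuilder hex = new StringBuilder();
        for (byte b : bytes.toByteArray())
        {
            // Mask to 0..255 so bytes such as 0xfe are not sign-extended.
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }

    public static void main(String[] args) throws IOException
    {
        // With Sun's JVM this should match the first dump above:
        // feff00480065006c006c006f0021
        System.out.println(hexOf("Hello!", "UTF-16"));
    }
}
```

Passing "UTF-16BE" instead of "UTF-16" names the byte order explicitly; on
Sun's JVM that encoding writes no byte-order mark, so it only helps if the
test suite does not expect one.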

What confuses me is that I checked the Classpath source for
OutputStreamWriter, and it does the right thing, i.e., big-endian:
 
http://cvs.savannah.gnu.org/viewcvs/classpath/java/io/OutputStreamWriter.java?rev=1.11.2.8&root=classpath&view=markup

Is there a bug somewhere? And where should I look for it?

Marcus



