Endianness in UTF-16 encoding

To: debian-java@lists.debian.org
Subject: Endianness in UTF-16 encoding
From: Marcus Better <marcus@better.se>
Date: Thu, 07 Sep 2006 12:07:27 +0200
Message-id: <edor10$bj6$1@sea.gmane.org>

Hello,

I have a curious endianness problem. I get different results with the Sun
JVM and gij. Consider this simple program:

=========================================
import java.io.*;

public class Test
{
    public static void main(String[] args) throws java.io.IOException
    {
        OutputStreamWriter o = new OutputStreamWriter(System.out, "UTF-16");
        o.write("Hello!");
        o.flush();
    }
}
=========================================

According to Sun's API docs (the Charset class), the UTF-16 charset is supposed
to encode in big-endian byte order and write a big-endian byte-order mark. This
is also what I get when running with Sun's JVM:

00000000: feff 0048 0065 006c 006c 006f 0021       ...H.e.l.l.o.!
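A quick way to check the documented behavior without going through System.out is to encode the string directly and print the bytes grouped the same way as the dump above. This is only a sketch: the class name Utf16Dump and the helper hexDump are illustrative, and StandardCharsets needs Java 7 or later (Charset.forName("UTF-16") is the older equivalent).

```java
import java.nio.charset.StandardCharsets;

public class Utf16Dump {
    // Formats bytes as space-separated 16-bit groups, like the xxd dumps in this message.
    static String hexDump(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < bytes.length; i++) {
            if (i > 0 && i % 2 == 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", bytes[i] & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The bare "UTF-16" charset: per the Charset Javadoc, the encoder writes
        // a big-endian byte-order mark followed by big-endian code units.
        byte[] encoded = "Hello!".getBytes(StandardCharsets.UTF_16);
        System.out.println(hexDump(encoded));
    }
}
```

On a JVM that follows the Charset specification this should print the same groups as the Sun dump: feff 0048 0065 006c 006c 006f 0021.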

But when I run the same program (still compiled with Sun's javac) with gij, I
get little-endian output:

00000000: fffe 4800 6500 6c00 6c00 6f00 2100       ..H.e.l.l.o.!.

This difference causes the test suite of one of my packages to fail. I'm
running on i386.
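To compare runtimes without an external hex dump tool, the same OutputStreamWriter setup can be pointed at an in-memory buffer and the leading byte-order mark inspected. A minimal sketch, assuming a BOM is present at the start of the output; the class ByteOrderCheck and the helper detectByteOrder are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.Writer;

public class ByteOrderCheck {
    // Hypothetical helper: reports the byte order announced by a leading UTF-16 BOM.
    static String detectByteOrder(byte[] bytes) {
        if (bytes.length >= 2) {
            int b0 = bytes[0] & 0xff;
            int b1 = bytes[1] & 0xff;
            if (b0 == 0xfe && b1 == 0xff) {
                return "big-endian";
            }
            if (b0 == 0xff && b1 == 0xfe) {
                return "little-endian";
            }
        }
        return "no BOM";
    }

    public static void main(String[] args) throws IOException {
        // Same writer setup as the test program, but captured in memory.
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        Writer w = new OutputStreamWriter(buf, "UTF-16");
        w.write("Hello!");
        w.flush();
        System.out.println(detectByteOrder(buf.toByteArray()));
    }
}
```

Going by the two dumps in this message, the Sun JVM output corresponds to "big-endian" (feff) and the gij output to "little-endian" (fffe).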

What confuses me is that I checked the Classpath source for
OutputStreamWriter, and it does the right thing, i.e., big-endian:
 
http://cvs.savannah.gnu.org/viewcvs/classpath/java/io/OutputStreamWriter.java?rev=1.11.2.8&root=classpath&view=markup

Is there a bug somewhere? And where should I look for it?
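Independent of where the bug is, one possible way to keep a test suite stable (an assumption, not something established in this message) is to avoid relying on how a runtime handles the bare "UTF-16" name: request UTF-16BE explicitly and write the byte-order mark by hand, since UTF-16BE never emits a BOM on its own. This sketch assumes the runtime implements UTF-16BE correctly, which is not verified here; ExplicitBigEndian and encodeBigEndianWithBom are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.nio.charset.StandardCharsets;

public class ExplicitBigEndian {
    // Hypothetical helper: big-endian BOM plus big-endian code units, spelled out
    // explicitly instead of depending on the defaults of the "UTF-16" charset.
    static byte[] encodeBigEndianWithBom(String text) {
        // UTF-16BE does not write a BOM by itself, so prepend U+FEFF manually.
        return ("\uFEFF" + text).getBytes(StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) throws IOException {
        OutputStreamWriter o = new OutputStreamWriter(System.out, StandardCharsets.UTF_16BE);
        o.write('\uFEFF'); // explicit big-endian byte-order mark
        o.write("Hello!");
        o.flush();
    }
}
```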

Marcus


Follow-Ups:
- Re: Endianness in UTF-16 encoding
  - From: Tom Tromey <tromey@redhat.com>
