
Re: wchar_t , when should one use ?



Deepak Kotian wrote:

> When should one use wchar_t and when should one use char? I have heard
> a lot about Unicode. But what is the real need for the wchar_t type?

The first thing to understand is that Unicode can be encoded in a few
different ways (and for the most part, it is easy to transform one
encoding into another by bit shifting).

Fundamentally, Unicode assigns each character a code point in the range
U+0000 to U+10FFFF. That fits in 21 bits and is usually stored in a
32-bit value. But you really don't want to encode all your text files
that way, because it will make them up to four times larger than plain
ASCII. Therefore, there are some clever encodings (UTF-8 and UTF-16)
that allow you to encode the full range of code points in
variable-length sequences of 8-bit or 16-bit values. There is also an
older fixed-width encoding called UCS-2 which simply limits the
character size to 16 bits and cannot express Unicode characters that
require more than that. Java and Windows NT both started out with UCS-2
as their native text encoding; current versions of both treat their
16-bit strings as UTF-16.
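To make the "bit shifting" concrete, here is a minimal sketch of
encoding a single code point as UTF-8. The function name utf8_encode
and its interface are made up for illustration; they are not part of
any standard library.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-8.
 * Writes 1 to 4 bytes into out and returns how many were written.
 * Returns 0 if cp is a surrogate (U+D800..U+DFFF) or above U+10FFFF,
 * since neither can be encoded as UTF-8. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80) {                      /* 7 bits: one byte, same as ASCII */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                     /* 11 bits: two bytes */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {                   /* 16 bits: three bytes */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;
        out[0] = (unsigned char)(0xE0 | (cp >> 12));
        out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char)(0x80 | (cp & 0x3F));
        return 3;
    }
    if (cp <= 0x10FFFF) {                 /* 21 bits: four bytes */
        out[0] = (unsigned char)(0xF0 | (cp >> 18));
        out[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        out[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        out[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return 0;
}
```

For example, the hiragana letter U+3042 comes out as the three bytes
E3 81 82, while anything below U+0080 stays a single byte identical to
its ASCII value.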

So, now to finally answer your question, you use char for character
encodings based on 8-bit values (either singleton, as in traditional
ASCII, or variable-length sequences). Unicode's UTF-8 encoding is one
example of such an encoding. You use wchar_t, on the other hand, for
encodings based on wider units. The C standard leaves the width of
wchar_t up to the implementation: it is 16 bits on Windows, where it
holds UCS-2 or UTF-16, and 32 bits on most Unix-like systems, where
each wchar_t holds a whole code point.
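Where the units are 16 bits wide, a code point above U+FFFF does not
fit in one unit and UTF-16 splits it into a surrogate pair. A rough
sketch of that split follows; utf16_encode is a hypothetical name, and
it uses uint16_t rather than wchar_t so that it behaves the same
whatever the platform's wchar_t width is.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one code point as UTF-16.
 * Writes 1 or 2 units into out and returns how many were written.
 * Returns 0 for surrogate values and for anything above U+10FFFF. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;                         /* reserved for surrogates */
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;            /* fits in one unit (this is UCS-2) */
        return 1;
    }
    if (cp <= 0x10FFFF) {
        uint32_t v = cp - 0x10000;        /* 20 bits left to split */
        out[0] = (uint16_t)(0xD800 | (v >> 10));    /* high surrogate */
        out[1] = (uint16_t)(0xDC00 | (v & 0x3FF));  /* low surrogate */
        return 2;
    }
    return 0;
}
```

The second branch is exactly what UCS-2 cannot do: it has no way to
represent the code points that need two units.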

> Actually, I have to read many files through a C program, and they
> contain Japanese text as well. What is advisable to use? Should it be
> fgetws, or will fgets also do? When should one use fgetws or fgets,
> and why?

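That depends on how the text is encoded. fgets hands you the file's
bytes unchanged, so it works with any encoding based on 8-bit values
(UTF-8, Shift_JIS, EUC-JP) as long as the rest of your code treats the
buffer as that encoding. fgetws decodes the bytes into wchar_t
according to the current locale, so it only gives sensible results if
the locale's encoding matches the file's.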
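As a rough illustration of the difference, the two functions below
read the first line of a file once with fgets and once with fgetws and
report its length. The names first_line_bytes and first_line_chars are
invented for this sketch, and the 1024-unit buffer is an arbitrary
assumption; longer lines are simply cut off.

```c
#include <stdio.h>
#include <string.h>
#include <wchar.h>

/* Length in bytes of the first line of a file, as fgets sees it.
 * Returns -1 if the file cannot be opened or is empty. */
long first_line_bytes(const char *path)
{
    char buf[1024];
    long n = -1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)sizeof buf, fp) != NULL)
        n = (long)strlen(buf);            /* counts bytes, newline included */
    fclose(fp);
    return n;
}

/* Length in wide characters of the first line, as fgetws sees it.
 * The bytes are decoded using the current locale, so the caller is
 * expected to have called setlocale() with a locale whose encoding
 * matches the file. Returns -1 on open failure, empty file, or a
 * decoding error. */
long first_line_chars(const char *path)
{
    wchar_t buf[1024];
    long n = -1;
    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;
    if (fgetws(buf, (int)(sizeof buf / sizeof buf[0]), fp) != NULL)
        n = (long)wcslen(buf);            /* counts wchar_t units */
    fclose(fp);
    return n;
}
```

For a pure ASCII line the two numbers agree. For Japanese text stored
as UTF-8 and read under a UTF-8 locale (after calling
setlocale(LC_ALL, "")), the byte count is larger than the character
count, because each kana or kanji occupies three bytes.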

> Please let me know if someone can explain this in simple terms; any
> document on it would also be helpful.

See http://www.unicode.org. Lots of documentation there.

> Moreover, Windows has wsystem(), wstat(), etc., which Linux does not
> have. Is there any reason for that?

Microsoft extends the C standard library with wide-character equivalents
for most functions that take or return strings (they are actually
spelled _wsystem, _wstat, and so on). This is intended to make it more
convenient to write native Windows NT programs that always use wchar_t
for text.

Linux itself doesn't have any C functions; it's just the kernel. You're
probably thinking of the GNU C library. It provides the wide-character
functions that are part of the C standard, such as fgetws and wprintf,
but not Microsoft's extensions. Those are missing because they are
non-standard, and because the kernel's system calls take file names and
commands as char strings, so there has been no great demand for them
from users of that library.
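If you do need something like the Microsoft wide-character stat on a
POSIX system, one possible approach is to convert the wchar_t path to a
multibyte string and call the ordinary function. This is only a sketch:
stat_wide is a made-up name, the 4096-byte buffer is an assumed limit,
and the conversion uses whatever encoding the current locale specifies.

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <wchar.h>

/* Hypothetical wrapper: stat() for a wide-character path on POSIX.
 * Converts wpath to the current locale's multibyte encoding, then
 * calls stat(). Returns -1 if the path cannot be converted or does
 * not fit in the buffer, otherwise whatever stat() returns. */
int stat_wide(const wchar_t *wpath, struct stat *st)
{
    char path[4096];                      /* assumed upper bound on path length */
    size_t n = wcstombs(path, wpath, sizeof path);

    /* (size_t)-1 means a character had no multibyte form in this
     * locale; n == sizeof path means the result was truncated and is
     * not null-terminated. */
    if (n == (size_t)-1 || n >= sizeof path)
        return -1;
    return stat(path, st);
}
```

A program using this for non-ASCII file names would call
setlocale(LC_ALL, "") first, so that the conversion produces the same
encoding the file names were created with.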

Craig
