On Thu, Jul 08, 2004 at 02:25:08PM -0500, Branden Robinson wrote:
> Thanks! I have reviewed the patch, and while compound text encoding is a
> bit beyond me, I do appreciate the heads-up. :)

Thanks to you and all the DDs :)

>
> I notice you are the author of this fix. Because of problems with
> XFree86's recent change in licensing policy[1], I'd like to be certain I
> know what the provenance of your patch is.
>
> Can you confirm the following statements?

Yes, I can. :)

>
> * I am the author of this patch.

Yes, I'm the sole author of this patch. The bug report at
http://bugs.xfree86.org/show_bug.cgi?id=1362 details how I found and fixed
this bug[2]. My project `mule-gbk', mentioned in the report, may become
available on SourceForge later. A very old version of mule-gbk can be found
at http://lists.debian.org/debian-chinese-big5/2002/04/msg00013.html.

>
> * If any copyright attaches to this patch, I hereby place it under the
> traditional MIT/X11 license[2].

If any copyright attaches to this patch, I hereby place it under the
traditional MIT/X11 license[1].

[1] Here's a copy of the license text:

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM,
DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR
OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE
OR OTHER DEALINGS IN THE SOFTWARE.

[2] Here's a copy of the original bug report:

---------------8<---------------
GBK <-> COMPOUND_TEXT translation in XFree86 is incorrect.

A few years ago, I started a project, `mule-gbk', which aims to enable Chinese
GBK encoding support in Emacs 21.3/Mule (GBK support is important to people in
the People's Republic of China). While enabling X selections between Emacs 21
and other X11 applications, I found this bug.

Ordinary X11 applications use Xlib routines for GBK <-> COMPOUND_TEXT
translation when they exchange X selections with one another (Inter-Client
Communication). Emacs/Mule, however, implements its COMPOUND_TEXT translation
in Emacs Lisp. If both Mule and Xlib encoded GBK into COMPOUND_TEXT correctly,
exchanging X selections between them would cause no difficulty. But
experiments show that Emacs/Mule can't understand the ctext that ordinary X11
applications such as gedit, mozilla and crxvt produce from GBK text. When you
paste GBK text from these applications into Emacs, a broken sequence
containing "...GBK-0..." appears.

Note that my locale is set to zh_CN.GBK with

  export LANG=zh_CN
  export LC_ALL=zh_CN.GBK

and the locale `zh_CN.GBK' has been generated on my Debian GNU/Linux box with

  dpkg-reconfigure locales

My XFree86 version is 4.3.0.

Because this was so annoying, I started to analyze the messages from ordinary
X11 applications by inserting debugging statements into the clipboard program
`xclip'. I found that their ctext contains redundant sequences, which also
make the length counter in the ctext's `extended segments' wrong.

According to the document `Compound Text Encoding':

  http://www.xfree86.org/current/ctext.pdf

,----
| 6. Non-Standard Character Set Encodings
|
| Character set encodings that are not in the list of approved
| standard encodings can be included using ``extended
| segments''. An extended segment begins with one of the
| following sequences:
|
| 01/11 02/05 02/15 03/00 M L variable number of octets per character
| 01/11 02/05 02/15 03/01 M L 1 octet per character
| 01/11 02/05 02/15 03/02 M L 2 octets per character
| 01/11 02/05 02/15 03/03 M L 3 octets per character
| 01/11 02/05 02/15 03/04 M L 4 octets per character
|
| [This uses the ``other coding system'' of ISO 2022, using
| private Final characters.]
|
| The ``M'' and ``L'' octets represent a 14-bit unsigned value
| giving the number of octets that appear in the remainder of
| the segment. The number is computed as ((M - 128) * 128) +
| (L - 128). The most significant bit M and L are always set
| to one. The remainder of the segment consists of two parts,
| the name of the character set encoding and the actual text.
| The name of the encoding comes first and is separated from
| the text by the octet 00/02 (STX, START OF TEXT). Note that
| the length defined by M and L includes the encoding name and
| separator.
`----

So the extended segment for GBK text in ctext is defined as

  01/11 02/05 02/15 03/02 M L

because GBK is a non-standard character set with 2 octets per character.

I found a simple way to solve this problem on my Debian GNU/Linux Sid box by
modifying a line in an XFree86 system file:

  /usr/X11R6/lib/X11/locale/zh_CN.gbk/XLC_LOCALE

The line:

  ct_encoding GBK-0:GLGR:\x1b\x25\x2f\x32\x80\x88\x47\x42\x4b\x2d\x30\x02
                                         ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

(These are 128, 128+8, 'G', 'B', 'K', '-', '0', 2, where "GBK-0" is
presumably the charset name for GBK. It's strange that they appear here at
all; perhaps the author of this file misunderstood `Compound Text Encoding'?)
should be changed to the following (equivalently, remove these 8 octets):

  ct_encoding GBK-0:GLGR:\x1b\x25\x2f\x32
                         ~~~~~~~~~~~~~~~~

(These are exactly the first 4 octets of the extended-segment sequences
defined in `Compound Text Encoding'.)

So far, this method has been used by many mule-gbk users in the P.R.C.
However, I don't know the exact meaning of this line; maybe an Xpert can
figure it out :(

I downloaded xc/nls/XLC_LOCALE/zh_CN.gbk from

  http://cvsweb.xfree86.org/cvsweb/*checkout*/xc/nls/XLC_LOCALE/zh_CN.gbk?rev=HEAD&only_with_tag=xf-4_4_99_4&content-type=text/plain

(this file has been untouched for 3 years) and made a patch for it:

*** zh_CN.gbk.orig 2004-05-06 23:33:06.000000000 +0800
--- zh_CN.gbk 2004-05-06 23:34:31.000000000 +0800
***************
*** 62,68 ****
  byte2 \x40,\x7e;\x80,\xfe
  wc_encoding \x00008000
! ct_encoding GBK-0:GLGR:\x1b\x25\x2f\x32\x80\x88\x47\x42\x4b\x2d\x30\x02
  mb_conversion [\x8140,\xfefe]->\x0140
  ct_conversion [\x0140,\x7efe]->\x8140
--- 62,68 ----
  byte2 \x40,\x7e;\x80,\xfe
  wc_encoding \x00008000
! ct_encoding GBK-0:GLGR:\x1b\x25\x2f\x32
  mb_conversion [\x8140,\xfefe]->\x0140
  ct_conversion [\x0140,\x7efe]->\x8140

SU Yong
---------------8<---------------

-- 
SU Yong <yoyosu@ustc.edu.cn>
Proud Debian/GNU Linux User
PGP-Key-ID: 584F35F3
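The length arithmetic in the quoted `Compound Text Encoding' excerpt can be checked with a short Python (3.8+) sketch. It is illustrative only: encode_length, decode_length, extended_segment and parse_extended_segment are hypothetical helper names, not Xlib or Emacs/Mule functions, and they model a single extended segment rather than a whole ctext stream. Decoding the hard-coded pair 0x80 0x88 gives 8, which covers "GBK-0" (5 octets) plus STX (1 octet) and leaves exactly 2 octets of text, i.e. one GBK character. If Xlib copies that prefix verbatim into every GBK segment (an assumption here, not something the report states), the count can only be right for single-character strings.

```python
# Sketch of extended-segment framing per `Compound Text Encoding', section 6.
# All helper names are hypothetical; they are not Xlib or Mule APIs.
from __future__ import annotations

ESC, PERCENT, SLASH, STX = 0x1B, 0x25, 0x2F, 0x02


def encode_length(n: int) -> bytes:
    """Return the M and L octets for a segment remainder of n octets."""
    if not 0 <= n < 128 * 128:
        raise ValueError("length must fit in 14 bits")
    # High 7 bits go into M, low 7 bits into L; both get the top bit set.
    return bytes([(n >> 7) | 0x80, (n & 0x7F) | 0x80])


def decode_length(m: int, l: int) -> int:
    """Apply the spec's formula ((M - 128) * 128) + (L - 128)."""
    return ((m - 128) * 128) + (l - 128)


def extended_segment(name: str, text: bytes, octets_per_char: int = 2) -> bytes:
    """Build ESC % / F M L <name> STX <text>, with F = '0' + octets_per_char."""
    if not 0 <= octets_per_char <= 4:
        raise ValueError("octets_per_char must be 0..4 (0 means variable)")
    body = name.encode("ascii") + bytes([STX]) + text
    # M L counts the encoding name, the STX separator and the text.
    header = bytes([ESC, PERCENT, SLASH, 0x30 + octets_per_char])
    return header + encode_length(len(body)) + body


def parse_extended_segment(buf: bytes) -> tuple[str, bytes, bytes]:
    """Split the extended segment at the start of buf into (name, text, rest)."""
    if (len(buf) < 6 or buf[:3] != bytes([ESC, PERCENT, SLASH])
            or not 0x30 <= buf[3] <= 0x34):
        raise ValueError("not an extended segment")
    n = decode_length(buf[4], buf[5])
    body, rest = buf[6:6 + n], buf[6 + n:]
    if len(body) < n:
        raise ValueError("segment is shorter than its M L count")
    name, sep, text = body.partition(bytes([STX]))
    if not sep:
        raise ValueError("no STX after the encoding name")
    return name.decode("ascii"), text, rest


if __name__ == "__main__":
    # The M L pair hard-coded in the original XLC_LOCALE line.
    fixed = decode_length(0x80, 0x88)
    print(fixed)                     # 8
    print(fixed - len("GBK-0") - 1)  # 2: room for exactly one GBK character

    # Two GBK characters need a count of 5 + 1 + 4 = 10, i.e. M L = 0x80 0x8a.
    seg = extended_segment("GBK-0", "中文".encode("gbk"))
    print(seg[:6].hex(" "))          # 1b 25 2f 32 80 8a
    print(parse_extended_segment(seg)[0])  # GBK-0
```

Because M and L depend on the length of each segment's text, the sketch computes them per segment rather than storing them as constants.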