
Bug#865713: Please Start UTF-8 debian-policy Text Files with UTF-8 Signature



Package: debian-policy
Version: 4.0.0.2
Severity: minor
Tags: patch
Justification: garbled display (mojibake) in web browsers

Dear debian-policy Maintainers,

There are numerous non-breaking space characters (U+00A0) in:

https://www.debian.org/doc/packaging-manuals/upgrading-checklist.txt

Each one is encoded in UTF-8 as the two bytes 0xC2 0xA0, or octal
\302 \240.  The file does not begin with the UTF-8 signature, though,
so web browsers might not display the UTF-8 characters properly.  This
is the case with the version of Firefox that is in the current Stretch
stable release.
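
For anyone who wants to confirm this on a local copy, here is a rough
sketch that counts the lines containing that two-byte sequence.  The
function name count_nbsp_lines is only an illustration, not something
from the debian-policy sources, and I am assuming a POSIX shell with
grep available:

```shell
# Hypothetical helper: print the number of lines in file "$1" that
# contain a UTF-8 non-breaking space (bytes 0xC2 0xA0).
count_nbsp_lines() {
    # printf turns the octal escapes into the two raw bytes;
    # LC_ALL=C makes grep compare bytes rather than characters.
    LC_ALL=C grep -c "$(printf '\302\240')" "$1"
}
```

Usage would be something like: count_nbsp_lines upgrading-checklist.txt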

Those non-breaking space characters actually occur within a very short
title line, so they could be changed to plain old spaces with no
side-effects.
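
A minimal sketch of that simpler fix follows.  It writes the converted
text to standard output rather than editing in place; the function
name nbsp_to_space is my own placeholder, and I am assuming a POSIX
shell with sed:

```shell
# Hypothetical helper: copy file "$1" to stdout, replacing every UTF-8
# non-breaking space (bytes 0xC2 0xA0) with an ordinary space.
nbsp_to_space() {
    # Build the two raw bytes with printf so no sed-specific escape
    # syntax is needed inside the pattern.
    nbsp=$(printf '\302\240')
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed "s/${nbsp}/ /g" "$1"
}
```

Usage would be something like:
nbsp_to_space upgrading-checklist.txt > ascii-version.txt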

That might not be the only UTF-8 that appears in such files someday
though, so a more general solution would be to start the file with the
UTF-8 signature, aka the Byte Order Mark (BOM).  This is the UTF-8
encoding of U+FEFF, which is 0xEF 0xBB 0xBF or octal \357 \273 \277.
Then a web browser should display UTF-8 characters within the text
file properly.

This sed command will prepend the signature to a file, modifying the
file in place (the \o octal escapes are a GNU sed extension):

sed -i '1s/^/\o357\o273\o277/' upgrading-checklist.txt

Alternatively, this awk script inserts the same three-byte sequence
but writes the result to a new file instead of editing in place:

awk 'BEGIN {printf ("\357\273\277");} {print;}' < original.txt > utf8-version.txt
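
One caveat with both commands is that running them twice would add a
second signature.  Here is a hedged sketch of a variant that checks
the first three bytes before adding anything.  The function name
add_bom is my own invention, and I am assuming a POSIX shell with od,
tr, and cat:

```shell
# Hypothetical helper: copy file "$1" to stdout, prepending the UTF-8
# signature (0xEF 0xBB 0xBF) only if the file does not already start
# with it.
add_bom() {
    # od prints the first three bytes as hex; tr strips the spacing.
    first=$(od -An -tx1 -N3 "$1" | tr -d ' \n')
    if [ "$first" != "efbbbf" ]; then
        printf '\357\273\277'
    fi
    cat "$1"
}
```

Usage would be something like: add_bom original.txt > utf8-version.txt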

Thanks,


Paul Hardy

