
Bug#865713: Declaring a charset of UTF-8 for policy files



On Sat, 2017-06-24 at 20:07 -0700, Paul Hardy wrote:

> Three possibilities seem to exist, and I am fine with any one being chosen:
> 
> 1) Use the UTF-8 signature in UTF-8 text files

If this triggers browsers to use the right encoding, it seems
reasonable to add it, since the files could be served by any web
server on the Internet. Right now all the mirrors of www.debian.org
are on Debian-controlled servers, but many of the text files there are
not UTF-8, so using the UTF-8 signature still seems the better option.
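
For illustration, here is a minimal sketch of what adding the signature
would involve. The UTF-8 signature is the three bytes EF BB BF at the
start of the file. The function names are hypothetical and nothing here
is an existing Debian tool; the sketch refuses to tag data that is not
valid UTF-8 in the first place.

```python
import codecs


def has_utf8_signature(data: bytes) -> bool:
    """Return True if the bytes start with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def add_utf8_signature(data: bytes) -> bytes:
    """Prepend the signature to valid UTF-8 data that does not have it yet."""
    # Hypothetical helper: raises UnicodeDecodeError for non-UTF-8 input,
    # so legacy-encoded files are never mislabelled.
    data.decode("utf-8")
    if has_utf8_signature(data):
        return data
    return codecs.BOM_UTF8 + data
```

Whether browsers actually honour the signature for text/plain is the
open question above; the sketch only covers the file side.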

> 2) Set the HTTP headers for charset="UTF-8"

FYI, 1018 of the 2605 *.txt files on the Debian website are not UTF-8,
and neither are 9 of the 1102 *.txt files in the Debian archive
mirrors. Converting the files in the Debian archive to UTF-8 seems
feasible, but doing the same for www.debian.org does not.
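
With that many non-UTF-8 files, a blanket charset header would mislabel
a large share of them. As a sketch only, assuming the server could
inspect each file, the header could be chosen per file. The function
name is hypothetical and this is not how the Debian web servers are
configured.

```python
def content_type_for(data: bytes) -> str:
    """Pick a Content-Type value for a text file based on its bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        # Unknown legacy encoding: do not claim a charset we cannot verify.
        return "text/plain"
    # Pure ASCII also lands here, since ASCII is a subset of UTF-8.
    return "text/plain; charset=UTF-8"
```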

pabs@mirror-anu:/srv/static.debian.org/mirrors/www.debian.org/cur$ find -iname '*.txt' | wc -l
2605
pabs@mirror-anu:/srv/static.debian.org/mirrors/www.debian.org/cur$ find -iname '*.txt' -print0 | xargs -0 isutf8  | wc -l
1018
pabs@mirror-anu:/srv/mirrors/debian$ find -iname '*.txt' | wc -l
1102
pabs@mirror-anu:/srv/mirrors/debian$ find -iname '*.txt' -print0 | xargs -0 isutf8  | wc -l
9
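
For anyone without isutf8 installed, the counts above can be roughly
reproduced with a short script. This is a sketch under assumptions: the
function names are made up, only regular files are considered, and a
file counts as non-UTF-8 when its whole content fails to decode.

```python
import os


def is_utf8(path):
    """Return True if the whole file decodes as UTF-8."""
    with open(path, "rb") as handle:
        try:
            handle.read().decode("utf-8")
        except UnicodeDecodeError:
            return False
    return True


def count_txt_files(root):
    """Return (total, non_utf8) for *.txt files under root, case-insensitively."""
    total = 0
    non_utf8 = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.lower().endswith(".txt"):
                total += 1
                if not is_utf8(os.path.join(dirpath, name)):
                    non_utf8 += 1
    return total, non_utf8
```

Calling count_txt_files(".") from one of the mirror directories would
give both numbers in one pass.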

> 3) Convert UTF-8 text files to HTML documents for web display

Sounds like this is already done.

-- 
bye,
pabs

https://wiki.debian.org/PaulWise
