
Re: HTML mail + PDF attachments (with șurubelniță)



On Fri 27 Mar 2020 at 12:42:17 (-0400), Greg Wooledge wrote:
> On Fri, Mar 27, 2020 at 11:12:49AM -0500, David Wright wrote:
> > On Fri 27 Mar 2020 at 17:35:19 (+0300), Reco wrote:
> > > I'm not that familiar with the languages to qualify Romanian as a Latin
> > > or a non-Latin language,
> > 
> > I think we can agree that the Romans spoke Latin!
> 
> Err... Romanian is not the language that was spoken in Rome.

I don't think anyone has said that it was. The Romans (in the sense
that I think we're also agreed on, i.e. those living in Rome a couple of
millennia ago) spoke Latin. As deloptes has pointed out, they didn't
speak the literary language, just as most English people don't speak
literary English. They spoke Vulgar Latin, and that outlived the
Empire, evolving and dividing into the Romance languages.

But it's strange how an aside can kill the actual discussion of
email headers.

> It is, however, considered a "Romance language", meaning it has Latin
> as one of its primary roots.
> 
> That said, I think the actual question was which character set it
> requires.

The Romanian language only came up because Andrei used it to write an
example of a filename. AIUI what matters in deciding whether to
encode it is the type of Latin, i.e. Latin-1 vs anything else. That
it's been encoded here in Unicode is obvious (it's prefixed by utf-8),
but AFAICT it could have been encoded in Latin-10 just as
well. (Andrei would have to adjudicate on Latin-10's suitability for
the 1st and 10th letters, which have comma below. I think Latin-2
has the cedilla form, and Andrei's utf-8 version avoided that.)
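
For anyone who wants to check the charset question without a code
chart, here is a minimal sketch in Python. It assumes Python's bundled
iso8859_* codecs match the published tables, and the two spellings of
the word are my own reconstruction from code points, not copied from
Andrei's mail. The helper name encodable is made up for illustration.

```python
# The word from the Subject line, spelled with the comma-below letters
# (U+0219 and U+021B) plus a-breve (U+0103).
comma_below = "\u0219urubelni\u021b\u0103"
# The same word spelled with the older cedilla letters (U+015F and U+0163).
cedilla = "\u015furubelni\u0163\u0103"


def encodable(text, codec):
    """Return True if every character of text can be encoded in codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# Print, per charset, whether each spelling fits in it.
for codec in ("iso8859_1", "iso8859_2", "iso8859_16", "utf-8"):
    print(codec, encodable(comma_below, codec), encodable(cedilla, codec))
```

If the codecs are faithful, the comma-below spelling should fit in
Latin-10 and UTF-8 but not in Latin-1 or Latin-2, while the cedilla
spelling fits in Latin-2.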

(BTW I'm not sure about Reco's use of \uc899. Does \u mean that
c899 is in utf-8, or should it be followed by a Unicode codepoint,
as in U+c899? If the latter, then \uc899 is way off my charts.)
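
The two readings can be compared directly. This is only a sketch in
Python, where \u in a string literal is followed by a code point;
whether the notation in Reco's post follows the same convention is an
assumption on my part.

```python
import unicodedata

# Reading c899 as two UTF-8 bytes, C8 99, and decoding them.
as_utf8_bytes = bytes.fromhex("c899").decode("utf-8")

# Reading c899 as a code point, which is what Python's \u escape means.
as_codepoint = "\uc899"

# Show the code point and the Unicode character name for each reading.
for ch in (as_utf8_bytes, as_codepoint):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
```

The first reading gives U+0219, the s with comma below; the second
gives U+C899, which sits in the Hangul syllables block.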

However, the actual problem that Russell introduced was how a
character set (any character set) should be encoded in the email header
parameter's value. And the RFC answer is "not in Base64": that
encoded-word form (RFC 2047) is for unstructured fields, as illustrated
by the header of my previous post, whereas parameter values have their
own syntax (RFC 2231).
Mutt, as expected, writes conformant values but can be instructed to
decode particular non-conformant ones.
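
To make the difference between the two forms concrete, here is a
sketch using Python's standard email package. The filename is a
made-up example, and this says nothing about how Mutt or Russell's
mailer build their headers; it only shows what each encoding looks
like.

```python
from email.header import Header
from email.utils import encode_rfc2231

# Hypothetical attachment name, written with explicit code points.
filename = "\u0219urubelni\u021b\u0103.pdf"

# RFC 2231 form, meant for parameter values such as filename*=...
# The result is charset'language'percent-encoded-bytes.
param_value = encode_rfc2231(filename, charset="utf-8")
print("filename*=" + param_value)

# RFC 2047 encoded-word, meant for unstructured fields such as Subject.
encoded_word = Header(filename, charset="utf-8").encode()
print("Subject: " + encoded_word)
```

The first line printed is the conformant shape for a parameter; the
second is the =?charset?...?= shape that belongs in fields like
Subject and that some mailers wrongly put into filename="...".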

> According to
> <https://en.wikipedia.org/wiki/Romanian_alphabet#Digital_typography>,
> it originally used ISO 8859-2.

… aka Latin-2. Not being Romanian, I can't comment on the relative
popularity of that and Latin-10 (ISO 8859-16) or whether the latter
was stillborn. And…

> Of course, it would probably use UTF-8
> on most modern systems.

Yes, that also seems more up-to-date and expressive, with support for
distinguishing obscure (to me) variants like cedilla vs comma below.

Cheers,
David.

