Re: Allegedly "open source" fonts and the DFSG

On Sat, Aug 10, 2024 at 5:39 PM pipcet@protonmail.com wrote:

(Quick caveat: I'm going to use the term OpenType, exclusively, when referring to the font format / specification, because TrueType and OpenType are the same format and we need to get past that misunderstanding. Others out there may disagree, but my mind is made up on the matter.)

> The Debian project contains TrueType fonts without sources. The "source"
> packages provide .ttf files which contain binary information compiled
> from secret sources.
>
> This is not about the license the .ttf files are made available under;
> it's about whether these distributions include source code, as
> required by the DFSG and the OSI Open Source definition.

I'm not confident that I understand what point you're making with these font examples. But in case it helps from the standpoint of font formats, I would push back on some of the particulars in this first part of the email, since I don't think it frames the contents and editability of OpenType fonts entirely accurately. The format itself is not human-readable, but in my opinion it's far more akin to a media format than to an executable, and that distinction tends to get lost quickly in a very broad "where's the source" discussion.

Whether or not an OpenType file constitutes 'code' is going to depend on assumptions people bring with them. Is "a list of coordinate points" code? Or is that data? Is it code if it's labeled as points that constitute a path to be drawn? That's what we're talking about with glyphs in OpenType.

The rubber-meets-the-road reality of the OpenType format is that the font files themselves are, in essence, source code, if they're code at all rather than data. That is, they are instructions that must be run through an interpreter; they do not execute themselves, they don't get control of the processor, they don't access the system or any of that stuff.

If we're measuring by weight, the bulk of them are path instructions: curve segments to be drawn, advances to move the pen by, etc. For that reason, it's typically not much of a loss of information if the path data originally lived in some other format, because the OpenType representation is editable as-is.
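To make the "list of coordinate points" idea concrete, here is a minimal sketch of turning one contour of a TrueType-flavored (quadratic) outline into an SVG path string. The function name and the (x, y, on_curve) tuple layout are my own for illustration, and it assumes the contour starts with an on-curve point; CFF-flavored fonts use cubic curves and would need different handling.

```python
def contour_to_svg(points):
    """Convert one closed quadratic contour to an SVG path string.

    points: list of (x, y, on_curve) tuples. Hypothetical layout, and this
    sketch assumes points[0] is an on-curve point.
    """
    x0, y0, _ = points[0]
    cmds = [f"M {x0} {y0}"]
    pending = None  # off-curve control point still waiting for its end point
    # Walk the remaining points, then return to the start to close the contour.
    for x, y, on_curve in points[1:] + [points[0]]:
        if on_curve:
            if pending is None:
                cmds.append(f"L {x} {y}")  # straight segment
            else:
                cmds.append(f"Q {pending[0]} {pending[1]} {x} {y}")
                pending = None
        else:
            if pending is not None:
                # Two consecutive off-curve points imply an on-curve
                # point at their midpoint.
                mx, my = (pending[0] + x) / 2, (pending[1] + y) / 2
                cmds.append(f"Q {pending[0]} {pending[1]} {mx} {my}")
            pending = (x, y)
    cmds.append("Z")
    return " ".join(cmds)
```

Nothing in there is executed by the font; the points are just data that a renderer (or an editor) reads and draws.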

I'd fully agree that it'd be fantastic (and life would be easier) if many of the other tables and strings inside an OpenType font were in an easier-to-process format (like the metadata in modern multimedia files, for example), but the format exists the way it does mainly because it was optimized for size back in the bad old days of limited printer memory and so on. Still, OpenType files are editable and examinable (and diffable and roundtrippable, as Felipe noted...) with free software, despite not being human-readable on disk.
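As a rough illustration of how examinable the container is, the sketch below lists the tables in a single-font file using nothing but the Python standard library. The function name is made up for this email, and it deliberately ignores font collections (.ttc) and does no checksum validation.

```python
import struct

def list_tables(data: bytes):
    """Return (tag, offset, length) for each table in a single-font file."""
    # Header: 4-byte sfntVersion, then uint16 numTables, searchRange,
    # entrySelector and rangeShift (12 bytes total, big-endian).
    _sfnt_version, num_tables = struct.unpack(">4sH", data[:6])
    tables = []
    for i in range(num_tables):
        # Each table record is 16 bytes: tag, checksum, offset, length.
        start = 12 + 16 * i
        tag, _checksum, offset, length = struct.unpack(
            ">4sIII", data[start:start + 16]
        )
        tables.append((tag.decode("latin-1"), offset, length))
    return tables
```

You would call it as list_tables(open("SomeFont.ttf", "rb").read()), where the filename is just a placeholder. Every table it reports (glyf, fpgm, fvar, and so on) has a documented layout that can be read the same way.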

Because of that, standard practice has for a long time been to regard the Beziers and the compact representations of all the other junk in an OpenType font file as, so to speak, usable enough. Not _fun to use_ necessarily, but then again what is?

> Fonts are, of course, nontrivial computer programs. In addition
> (usually, see below) to the glyph outlines, which can be retrieved
> from the TTF files, they contain a large amount of code and additional
> data. For example, the "fpgm" table may specify a sub-program for
> which source code is not available.

This is slightly different, enough to comment on IMHO.

The fpgm table contains TrueType Instructions (more commonly referred to as "hints", but again the choice of terminology kinda gets on my nerves), which are more or less an assembly language. It has a very limited instruction set, and all it affects is the path data that the rasterizer later interprets. So it's entirely possible that there was never any other source that was compiled into the TrueType Instructions as delivered.

There clearly have been tools that output TrueType Instructions, as a way of making families / libraries of vendor fonts more consistent and easier to maintain (Microsoft's Visual TrueType is one of the few still around), but as shocking as it seems, people also wrote and edited the instructions by hand. In those cases, there may be no other source.

The format for the assembly language is documented, of course: https://developer.apple.com/fonts/TrueType-Reference-Manual/RM03/Chap3.html
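To give a feel for how small the instruction stream is, here is a hedged sketch of a disassembler that knows only a handful of opcodes from that reference. The opcode table is intentionally partial and the function name is mine; anything it doesn't recognize is printed as a raw hex byte rather than guessed at.

```python
# Partial opcode map: only a few of the documented instructions are listed.
OPCODES = {
    0x00: "SVTCA[0]", 0x01: "SVTCA[1]",
    0x1B: "ELSE", 0x20: "DUP", 0x21: "POP",
    0x2B: "CALL", 0x2C: "FDEF", 0x2D: "ENDF",
    0x4B: "MPPEM", 0x58: "IF", 0x59: "EIF",
}

def _words(code, i, n):
    """Read n big-endian signed 16-bit words starting at index i."""
    return [int.from_bytes(code[i + 2 * k:i + 2 * k + 2], "big", signed=True)
            for k in range(n)]

def disassemble(code: bytes):
    """Return a list of mnemonic strings for a TrueType instruction stream."""
    out, i = [], 0
    while i < len(code):
        op = code[i]
        i += 1
        if op == 0x40:                      # NPUSHB: count byte, then n bytes
            n = code[i]
            args, i = list(code[i + 1:i + 1 + n]), i + 1 + n
            out.append("NPUSHB " + " ".join(map(str, args)))
        elif op == 0x41:                    # NPUSHW: count byte, then n words
            n = code[i]
            args, i = _words(code, i + 1, n), i + 1 + 2 * n
            out.append("NPUSHW " + " ".join(map(str, args)))
        elif 0xB0 <= op <= 0xB7:            # PUSHB[n]: pushes n+1 bytes
            n = op - 0xB0 + 1
            args, i = list(code[i:i + n]), i + n
            out.append("PUSHB " + " ".join(map(str, args)))
        elif 0xB8 <= op <= 0xBF:            # PUSHW[n]: pushes n+1 words
            n = op - 0xB8 + 1
            args, i = _words(code, i, n), i + 2 * n
            out.append("PUSHW " + " ".join(map(str, args)))
        else:
            # Known mnemonic, or a raw hex placeholder for unlisted opcodes.
            out.append(OPCODES.get(op, f"<0x{op:02X}>"))
    return out
```

The only instructions with inline operands are the push instructions, which is why they are the only special cases; everything else is a single byte that works on the interpreter's stack.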

Here, too, because the instructions it contains are only ever run on the interpreter in the font stack (e.g., FreeType), it's a bit different than the case of a binary blob that can do something in its own process. I guess I don't know how it compares to the case of binary-blob firmware; I suppose others know that better than I.

But there are some free-software tools that can decompile the format, and I'm not entirely sure that doing so would get you something meaningfully unlike the hypothetical "original source" used by the foundry. There might be pre-processors and things like macros that can't be recovered that way, but I don't know if it's been studied. And you can't rule out that it might just have been written by hand.

> Some fonts even specify entire font families as "variable font"
> programs which take various parameters, so the outlines are very
> little help at all since they vary depending on the parameters, in
> non-trivial, non-modifiable ways hidden in the binary files.

I don't think I can call this statement accurate. OpenType font variations are well-defined; the axes of each variable font are defined in the `fvar` table, and the alterations made to each glyph are (X,Y) deltas on the points in the contours, nothing more (*).

[* technically, other things are also subject to deltas, such as the metrics, and it's possible for a variable font to declare that at some axis position, a specific glyph from one master is replaced by a specific glyph from another master — the canonical example being a dollar sign with bars that go through the middle being swapped out for one where the bars are only shown above & below, in a super-heavy weight where retaining the bars in the middle would fill it in entirely.]

It's most certainly not arbitrary code, its meaning is not opaque, and it's not _any_ less modifiable than the non-variable path data in a non-variable OpenType font. The outlines are a great deal more than "very little help"; they're in fact the entire game.
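For the curious, the arithmetic behind "deltas on the points" can be sketched in a few lines. This is a simplified model, not a parser: the function names and the data layout (a region as a dict of axis tag to (start, peak, end) in normalized coordinates, plus one (dx, dy) per point) are assumptions made for illustration, and it skips the spec's validity checks, rounding, and inferred deltas for untouched points.

```python
def axis_scalar(coord, start, peak, end):
    """Tent-shaped weight of a region along one normalized axis (-1 to 1)."""
    if peak == 0:
        return 1.0   # the region does not depend on this axis
    if coord == peak:
        return 1.0
    if coord <= start or coord >= end:
        return 0.0
    if coord < peak:
        return (coord - start) / (peak - start)
    return (end - coord) / (end - peak)

def instantiate(points, variations, location):
    """Apply scaled deltas to default outline points.

    points:     list of (x, y) from the default master
    variations: list of (region, deltas) pairs (hypothetical layout)
    location:   dict of axis tag -> normalized coordinate
    """
    result = [list(p) for p in points]
    for region, deltas in variations:
        # The weight of a region is the product of its per-axis weights.
        scalar = 1.0
        for axis, (start, peak, end) in region.items():
            scalar *= axis_scalar(location.get(axis, 0.0), start, peak, end)
        for pt, (dx, dy) in zip(result, deltas):
            pt[0] += scalar * dx
            pt[1] += scalar * dy
    return [tuple(p) for p in result]
```

So a stem edge at x=100 with a delta of +40 at the heavy end of a weight axis lands at x=120 halfway along that axis. It's linear interpolation over data, and you can edit the deltas exactly as you would edit the points.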

> In particular, this affects at least the following fonts:
>
> Noto CJK: in this case, something *closer* to the source is available
> from Adobe's GitHub pages
> (https://github.com/adobe-fonts/source-han-sans) but even that font
> (a Type 1 PostScript font packed in a CID) was produced by a
> "proprietary application"
> (https://blog.typekit.com/2014/08/14/interview-with-ryoko-nishizuka/) I
> believe it's highly likely this proprietary tool consumed additional
> source data which is not available.

If the upstream project is using a proprietary tool, then my take on the situation would be that it becomes problematic iff that tool produces output that the free-software tools cannot generate equivalent output for. I'm not clear if that's the case here, since I only have a bare-bones understanding of Hangul and basically zero for the other writing systems used in CJK fonts.

> Noto Emoji Color: the GitHub repository indicates
> (https://github.com/adobe-fonts/noto-emoji-svg?tab=readme-ov-file#generating-png-and-svg-files)
> that there are Adobe Illustrator files which constitute "the original
> Ai artwork". These files, which may include valuable information, are
> not included, only SVGs and PNGs generated from them.

I would agree that the best-case scenario would be to have the Illustrator files (if those were what was originally used), but SVG itself is a source format and I would call it the preferred format for editing; it can not only be edited by GUI tools like Inkscape, but also generated programmatically. I'm skeptical that the Illustrator files for something made to be exported to SVG would hold much valuable data beyond the SVG itself. Meaning, things that do *not* end up in the exported SVG file but were valuable for exporting/generating it. Possibly metadata, but I'm not aware of anything that Illustrator would export to SVG that couldn't be generated otherwise.

> Droid Sans Mono: the font includes a non-trivial "fpgm" table which
> changes the appearance of some glyphs. No source code or instruction
> for rebuilding this table has been made available.

I think you would need to investigate the TrueType Instructions actually in the table in order to determine how easily it could be modified to do what you want. Like I was saying above, as an instruction set it's pretty small, so you could do that. I don't know that decompiling it into some other format would leave you with something different (much less, something less complete) than whatever might have been used originally, but it isn't so opaque that you would be left wondering what it's doing; you could recreate it or modify it with instructions of your own.

> 1. What is the source code for fonts? Is there some argument that .ttf
> files, by some process, become the source code even when they're
> generated from other sources?

I suppose the above might sort of provide my personal perspective on this question. Fonts are half-data, half-interpreter-code; the interpretable stuff does not have a lot of complexity to it (in programming-language terms), so it's easy enough to verify what code-ish parts do, when that needs doing, and those parts are easily edited.

I don't have any opinion on questions 2 or 3.

Anyway, I hope that's at least useful to think about.

Nate

nathan.p.willis
nwillis@glyphography.com