
Re: live-manual update



On Sun, 2020-04-05 at 21:33 +0200, Roland Clobus wrote:
> Thank you for your efforts to make the live-manual better.
> I'm adding the mailing list to the list of recipients, because the
> work
> is done by the live team (of which I'm not a member, just a person
> with
> interest in the project) and the team can decide on the future
> direction
> of the project.

Yes, I'm aware. I was just curious about the nature of the update to
live-manual you were tackling.

I was not ready to initiate a proper discussion/proposal of format
changes, which would have occurred openly later if I were still
interested then in actually carrying it out. At this time I merely
wanted to gain an understanding of whether this was something you were
already doing, and if not, some idea of when your work might be
available to build such a change upon.

> > I believe you were the person who spoke of being in the middle of a
> > big update to live-manual?
> Well big... I am working (with the little time that I have) on
> updating
> the live-manual, primarily such that all references to Alioth get
> removed. While doing so, I got to know the live building process and
> updated parts of the manual that got out-of-date over time.

Yes, I understood that your work essentially was focussed on bringing
it up to date and my impression was that this was more than just a few
minor tweaks, especially after noticing previously a question being
asked about live-wrapper which the manual does not currently cover at
all, and since if it was a very minor change you'd have probably
submitted it by now.

(Though of course the response indicated that live-wrapper is
effectively dead so no longer needs covering anyway...)

Alioth references - Aside from translation files, which I've ignored,
all I can find when searching for "alioth" is three old links. One I
missed in my previous "live-systems" update, and I've just submitted an
MR for that; the other two, which I'd previously spotted but not known
what to do with, only require minor corrections. So I don't
understand what it is you're actually doing with respect to "removing
alioth references" if it's not just that... (not that I need to know)
(and not that I've really looked at the manual in quite some time).
Perhaps you could publish a preview (WIP) copy of your changes so far,
even if it's currently an incomplete mess? (I looked at your salsa repo
on Saturday but saw no sign of such work, so you must have it locally
only I suppose).

btw, I've noticed that the manual is currently missing discussion of
injecting environment variables via the user config. If fixing that
fits within your scope of improvements and you want to tackle it, it
would be good to have that covered. I can provide some details if
needed.

> > Does your work involve at all:

> >  b. any change to the markup language and respective "build" tool
> > used
> > for "building" the documentation (generation of HTML pages and
> > PDFs).
> 
> No, the markup language SiSU is not a common markup language, but for
> me
> it suits its purpose.

ok.

> > I ask because I made a relatively small change the other day to the
> > coding style section (submitted an MR which is pending review);
> 
> I assume you are referring to
> https://salsa.debian.org/live-team/live-manual/-/merge_requests/22

yes.

> > The build tool SiSU was my biggest gripe. I was not impressed with
> > the
> > hundreds of megabytes of dozens of packages required for sisu-
> > complete
> > installation, that it seemed to be generating a postgresql database
> > on
> > installation, that it pulls in ruby and such, and the manual having
> > to
> > give advice on speeding up a supposedly very slow build process (I
> > followed the tips to limit the scope of building, which was
> > reasonably
> > quick, I did not explore how slow the worst case supposedly is).
> 
> As far as I can see, SiSU was a nice tool at the time the live-manual
> was started. It apparently didn't catch on, as nowadays there are
> hardly
> any reverse-(build)dependencies on SiSU.

Yes, I believe that live-manual is the only package build depending on
sisu-complete, and unless I've made a mistake in how I've searched the
archive, it is the only package in the entire archive actually using
SiSU.

A good few dozen on the other hand depend upon markdown or pandoc.

Hell, I just looked at the SiSU website and found all of their
source code links to be dead (everything at http://git.sisudoc.org:
http://git.sisudoc.org/software/sisu; 
http://git.sisudoc.org/git/code/sisu.git; 
http://git.sisudoc.org/gitweb/?p=code/sisu.git;a=summary) which is not
at all a good sign, not a state expected of a healthy project.

> However, SiSU allows the author to focus on the content without being
> bothered by layout decisions and markup.

I don't exactly agree, and I do not think that the separation is a
significant concern here.

From what little of SiSU I know (primarily from noting the difference
in ssi vs. ssm files), yes, SiSU separates content from layout, but it
certainly does not stop authors being bothered by markup. Surely you
did not mean to suggest that? The SSI content files are riddled with
SiSU markup artefacts, as to be expected in anything that is not just
plaintext. You cannot escape from having to deal with markup of one
form or another unless writing only plaintext.

Furthermore the uncommon nature of SiSU markup is itself a hindrance to
authoring changes. I only just about got by without having to find SiSU
documentation by copying what I was seeing elsewhere in the files.
Markdown/commonmark and HTML on the other hand are widely understood.

The layout components of live-manual are _very_ minimal. I would not
expect anyone to reasonably consider them to be getting in the way of
authoring changes if we were to move to one of the two proposed
alternate formats.

If we consider markdown as an alternate to SiSU, then essentially all
of the ssi files will translate to equivalent markdown/commonmark
markup. They could perhaps work stand alone without any template, with
the all-in-one version being created from a simple bit of script mostly
just appending each page into a single file.
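
To illustrate how little "script" I have in mind, here is a minimal
sketch in Python. The page file names in the usage note are
hypothetical (simply the current ssi names with a .md extension), and
nothing like this exists in the repo today.

```python
from pathlib import Path


def build_single_page(pages, output):
    """Append each page, in the order given, into one all-in-one file."""
    parts = [Path(p).read_text(encoding="utf-8").rstrip("\n") for p in pages]
    # A blank line between pages stops the last paragraph of one page
    # from running into the first heading of the next.
    Path(output).write_text("\n\n".join(parts) + "\n", encoding="utf-8")
```

It would be called with the pages in reading order, for example
build_single_page(["about_manual.md", "user_basics.md"], "live-manual.md").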

For HTML as an alternative, you'd have a relatively small block of 
HTML before and after the chunk that is the content. The HTML within
the content would largely just be equivalent markup, with the major
difference just being <p></p> tags surrounding paragraphs and similar
for headings. Or, if really wanting to separate out the starting and
ending HTML from the chunk that is the content, you could have that in
a "template" HTML file that has a placeholder for the content, and then
have the build script inject the file content of pages using the `sed`
tool or similar. You thus split out this small amount of non-content
HTML, at the expense of having to "build" pages to "test"/review
changes in their non-source form. Leaving it in each file though I
really don't consider worthy of being said to be a bother to authors.
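
A rough sketch of the template idea, written in Python rather than
`sed` purely for illustration. The placeholder marker is an assumption
of mine, not an existing convention anywhere in live-manual.

```python
# Hypothetical marker that the template file would carry where the page
# content belongs.
PLACEHOLDER = "<!-- CONTENT -->"


def render_page(template: str, content: str) -> str:
    """Return the template with the page content injected at the marker."""
    if template.count(PLACEHOLDER) != 1:
        raise ValueError("template must contain the placeholder exactly once")
    # str.replace substitutes literally, so characters such as '&' or
    # backslashes in the content need no escaping (unlike with sed).
    return template.replace(PLACEHOLDER, content)
```

The literal substitution is the one advantage over a naive `sed`
expression, where the replacement text would need escaping first.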

If ever wanting to modify this starting/ending portion of HTML though,
having a template has benefits over having to change multiple files
across multiple translations. But how often are such changes ever going
to be made for this to become a concern? Is it really enough of a
concern to make templating worthwhile vs. the trade-off of needing to
"build" pages?

Generating the all-in-one copy of the documentation can, in theory,
quite trivially be achieved from both markdown and HTML forms: you're
essentially just copying all of the individual pages into one file.
If using HTML and no template, then it's still a trivial
task to extract the content portion to do this, so that's no bother.
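
For the no-template case, a sketch of that extraction in Python. It
assumes each page wraps its content in a pair of marker comments; the
marker names are made up for the example.

```python
# Hypothetical markers that each stand-alone HTML page would carry
# around its content.
START = "<!-- content:start -->"
END = "<!-- content:end -->"


def extract_content(page_html: str) -> str:
    """Return only the part of a page that sits between the two markers."""
    start = page_html.index(START) + len(START)
    end = page_html.index(END, start)
    return page_html[start:end].strip("\n")


def combine_pages(pages_html, head: str, tail: str) -> str:
    """Build the all-in-one page: one head, every page's content, one tail."""
    body = "\n".join(extract_content(page) for page in pages_html)
    return head + body + "\n" + tail
```

str.index raises ValueError if a marker is missing, which is the
behaviour I'd want from a build script anyway.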

Another factor is that my editor of choice - gedit - has syntax
highlighting for many formats, but this does not include SiSU.

> For the translators, the translation framework is also present, using
> the well-known po files.

I would argue that for documentation files like the manual, having a
translation framework is over the top and inappropriate.

The po translation system is well suited to code files because it
separates the text strings from the mass of code they are sparsely
located within, allowing the translator to focus on the data they need
- the strings. It also makes sense considering how the translations
are used: they do not go into translation-specific binaries; a single
common compiled binary dynamically loads the right set of language
strings at runtime (except when just using the embedded English, of
course, with the gettext system). Compare this to the manual, where a
different copy is made for each language, and the content of relevance
to translators forms almost the entirety of the source for generating
"built" artifacts. This brings into question the model of trying to
split the content out in the same way, when it might be much simpler to
just generate the output directly from files containing the translated
content, with only a tiny portion of layout permitted if necessary.

Of course PO is not perfect. You may be interested by Project Fluent,
if you've not come across it before: https://projectfluent.org/

With regard to the manual, breaking up the content of the pages in
terms of individual headings, bullet points, paragraphs, etc, in POT/PO
files, really does not add much over just having a copy of the original
files, very little of which is not pure content.

Compare about_manual.ssi and about_manual.ssi.pot. There is no benefit
to PO here in terms of focussing on the job at hand; all of the small
bits of markup within sentences/paragraphs are carried across, even
things like "code{" get grabbed as possibly translatable strings.

Compare user_basics.ssi and user_basics.ssi.pot. The first thing in
there is "code{", which, since the PO file is trying to avoid duplicate
copies of, means that it appears nowhere else in the PO file, meaning
that you cannot just go through the PO file, translating, understanding
context as perfectly as with the original. PO thus can potentially even
be a hindrance to translation of these files.

For instance, take this string from the POT: " # apt-get install
xorriso\n". It is not necessarily immediately obvious to a translator
that this is a shell command and that none of the words it contains,
"install" included, should be translated. Removing the surrounding code
block markers potentially damages understanding of context.

Of course I'm forgetting thus far that po files have each original
string alongside each translation string as an "ID", which may be very
helpful for comparison. An alternate that might work perfectly well for
these documentation files might be for translators to view the original
and translation side by side, or in a split 2-file view in an editor.

Of course perhaps po4a actually works with HTML and markdown files, I'm
not sure. I just did a brief google search and did not get a clear
answer. If it does, and it is considered worth keeping for the
translation workflow over the side-by-side / split-dual-view
comparison based work just described, then the issue of moving away
from po4a is rather redundant, is it not? Still, as I repeat, these
files are almost entirely pure content, and po in some cases damages
the view of that content and thus the understanding of important
context.

The other issue of course is translators identifying portions of text
that have changed where updating translation is necessary. If keeping
po then this aspect is irrelevant. If moving away from it, then
translators would either rely upon side-by-side / split-dual-view
comparison, going through line by line; review the commits to the repo
and update following the changes made in them; or get a diff of the
English version from when the translation was last updated up to
present, and work from that.
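
For the last option, git can of course already produce such a diff
between two commits. Purely to show the idea on two copies of a page's
text, here is a small sketch using Python's standard difflib; the
function name and labels are mine.

```python
import difflib


def english_changes(old_text: str, new_text: str, name: str = "page") -> str:
    """Return a unified diff of the English text since the last translation."""
    diff = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"{name} (at last translation)",
        tofile=f"{name} (current)",
    )
    # An empty string means the translation is already up to date.
    return "".join(diff)
```

The translator would then only need to revisit the paragraphs that
show up as added or removed lines.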

> > Just at the minute I'm taking a brief pause in my other work to
> > consider the possibility of whether there is a better, more modern,
> > lightweight, etc tool that could replace use of SiSU. Something
> > markdown/commonmark based perhaps. Or maybe if we just used plain
> > HTML
> > directly, with a conversion tool to produce PDFs (if we want to be
> > able
> > to make them).
> > 
> > I don't think we need care about the database backed "search"
> > mechanism
> > SiSU provides.
> 
> Back when Alioth was still running, the database backend functioned
> as a
> kind of search engine.

I'm not sure that I follow.

I presumed that the SiSU search component was a feature of generating a
search facility for searching within the content of a SiSU formatted
set of documents. So the database generates and stores a pre-compiled
set of data with which to respond to searches with. I am familiar with
a similar thing being a part of documentation generated for Rust
projects via `cargo doc`, only I believe it just compiles information
into javascript, rather than use a DBMS. I do not consider such a
search facility to be of value to the live-build manual. It does not
consist of many pages and you can easily (a) use the in-page search
facility of your web browser either in multi-page or single-page view
(b) use the search facility in your PDF viewer if using the PDF (c) use
a search engine to find the right page for multi page view.

> > And I expect we don't need the same translation stuff we
> > use for code; translations can just be copies of the english files,
> > translated...
> 
> I disagree here. Having a standard translation framework in place is
> important to me. For me, translated files should take over the
> structure
> and as much of the markup from the original file as possible, and
> preferably only replace the English content.

Again, as above, _very_ little of the content of the files, now and
under the alternate formats, is not content or markup that naturally
and already finds its way into the translation files. I do not expect
any notable difference under markdown/HTML if continuing to use po.

> > Anyway, I just wanted to know whether you were already working on
> > anything in this area or just rewriting the text. Also I was
> > wondering
> > how near completion you might be, as I'd not want to start any
> > effort
> > (if I was to do it myself) to convert to a different markup and
> > such
> > until your substantial rewrite was available to do it on, otherwise
> > it
> > would just create a lot of extra work for one of us to rework
> > things on
> > top of the other's changes obviously.
> 
> I would rather ask whether any of the team members would be willing
> to
> do a review if all files will be touched.

As in thoroughly compare old and new side by side word for word?

Personally I would not be overly concerned about review, provided a
discussion and agreement takes place on the direction of the work
beforehand regarding selection of format, and use of templating and
translation. Considering all the benefits, not least moving away from
relying upon an essentially obsolete build tool, I think there's a
good chance they'd agree.

In terms of what the reviewer might do in their review of such a
change, I would not necessarily expect them to bother to look all that
closely at such a change, at least if it were me submitting it rather
than someone new here. I have to some degree built up a working
relationship with Raphaël and Luca. I do not know whether that
relationship will ever extend to team membership, but I've been working
with them over the past few weeks getting a substantial amount of work
merged into live-build. Furthermore (1) we're just talking about chunks
of documentation here, not code where bugs including security
vulnerabilities can be easily introduced by the smallest flaw
(unintentional or deliberate), (2) it's not like they'd expect it
likely that I, having established a good working relationship with
them, would suddenly jeopardise that by playing some prank submitting a
vandalised work, or that I'd screw things up royally performing such a
format conversion, such as losing a paragraph.

I would thus not expect them to be spending time doing a word by word
comparison. I'd expect little more than a cursory look at how I'd
implemented the new format, and a quick check that the
build/translation functionality works as before; trusting me that the
content is unchanged.

Of course I do appreciate that they already have their work cut out
reviewing all of my live-build contribution work as it is. But then
this would not necessarily need to get reviewed and merged immediately,
it could sit in salsa pending merge for a little while until their time
frees up enough (or they give me access and freedom to do it myself,
assuming they agree with the nature of the change in principle).

> My proposal:
> * First gather consensus from the team whether a change is needed
> * When so, decide on a solution that matches the requirements

Of course. As I said above, I was just seeking out some info on the
nature of your work at this stage, prior to making any possible
enquiry/exploration of format change at a later date.

> Let me try to summarise the current state:
> 
> As-is state:
> * The documentation is written in SiSU
> * The output is available in PDF (A4 and letter, each both in
> portrait
> and landscape), HTML, epub, odf and plain text

`markdown` can convert from markdown to HTML. I'm not sure if it can do
more.

`pandoc` can convert from markdown to HTML, PDF (using LaTeX), ODT,
docx, RTF, plaintext, EPUB, and others.

I don't have any experience of using these, they're just a couple of
tools I came across doing a brief bit of research yesterday.

So thus I do not know about portrait vs. landscape and A4 vs. letter,
though I don't see why we'd care about anything other than portrait
A4...
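
Untested on my side, but going by pandoc's manual the conversions would
be driven roughly as below. The wrapper functions and the file names in
the usage note are hypothetical; papersize and geometry are variables
pandoc passes to its LaTeX template, so they should only matter for
PDF output.

```python
import subprocess


def pandoc_command(source, output, papersize=None, landscape=False):
    """Build a pandoc command line; the output format follows the extension."""
    cmd = ["pandoc", "--standalone", source, "-o", output]
    if papersize:
        # Template variable used for LaTeX/PDF output, e.g. "a4" or "letter".
        cmd += ["-V", f"papersize={papersize}"]
    if landscape:
        # Passed through to the LaTeX geometry package.
        cmd += ["-V", "geometry=landscape"]
    return cmd


def convert(source, output, **options):
    """Run pandoc, raising CalledProcessError if the conversion fails."""
    subprocess.run(pandoc_command(source, output, **options), check=True)
```

So convert("manual.md", "manual.html") would give the web copy, and
convert("manual.md", "manual.pdf", papersize="a4") a portrait A4 PDF,
assuming pandoc and a LaTeX engine are installed.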

> * Translations in all document formats are generated for 9 languages
> (ca, de, es, fr, it, ja, pl, pt_BR, ro)

po4a may well work just fine for markdown/HTML, so no difference if so;
it perhaps just requires a bit of effort to migrate existing
translations. (If we care to; perhaps they're so out of date by now
that the existing copies should just be ditched?)

> * Dead links can be found using linkchecker on the html output

We're getting HTML output in any case; Whether we go with markdown or
actual HTML, we will always want HTML output for the web hosted copy.

> * sisu-complete pulls in many packages
> * The latest build from the git branch master is available on
> https://live-team.pages.debian.net/live-manual/
> * live-manual is packaged in Debian as live-manual, live-manual-epub,
> live-manual-html, live-manual-odf and live-manual-pdf

Interesting, I actually had not noticed the existence of all of those
actual packages. I'm not quite sure whether I'd agree that this is the
ideal means of distributing the manual. I mean in terms of the PDF
form, live-manual-pdf dumps 40 files into /usr/share/doc/live-
manual/pdf/, with different orientations, and different languages, and
compressed into archives. Is that really a suitable/desirable/useful
way to distribute the manual?

> Positive notes:
> *  Sisu is well-supported with syntax highlighting

As mentioned above, no highlighting in my editor - gedit.

Being little used is not going to encourage package maintenance.

> * Proof-reading the English text is done by 'make PROOF=1', which
> takes
> only about 8 seconds on my computer.
> 
> Negative notes:
> * Sisu has a few reverse build dependencies
> * sisu-complete brings a large list of dependencies, but can be
> replaced
> by a smaller list
> 
> Personally, I see no immediate need to have this large
> transition/rewrite right now.
> A task within the live team, that I see will be more pressing, will
> be
> the generation of the standard live images (about which I shortly
> wrote
> on 2020-03-21T17:27). Current Debian Stable images are built with
> live-wrapper, which uses vmdebootstrap under the hood. vmdebootstrap
> depends on Python 2, and will not be present in the next version of
> Debian. I am not aware of something that provides a 1-to-1
> replacement
> that will work on Debian Testing (and therefore the next release of
> Debian).

If you mean bullseye, I expect that it won't be until some time next
year that that gets released, so I don't see a pressing concern for
focussing on a move away from live-wrapper there.

Also, the live images were previously produced with live-build until
live-wrapper came into existence a few years back and they moved to
that, as far as I understand it. With live-wrapper and vmdebootstrap
being abandoned, I would expect that it should not be a big deal for
them to just move back to live-build...

Furthermore, live-manual is currently live-build focussed only, and
although live-wrapper and live-build both have the maintainer listed as
the Debian live team, effectively they have different groups of people
working on them.

Since the original author of live-build left, live-build has been in
low maintenance mode, with Raphaël Hertzog holding ownership, and
sharing upload responsibility with Luca Boccassi, as I understand it.
It is these two who have been reviewing and merging my contributions
these past few weeks. Effectively I personally have been working full
time just now on live-build improving many deficiencies in the codebase
and fixing various bugs. Though I am not on the team myself.

With live-wrapper, if you check out debian/control and review the commit
history, it is Iain Learmonth and Steve McIntyre who are the two main
developers, and the two of them along with Jonathan Carter who hold
upload responsibility.

So although all of them are grouped under the banner of the Debian live
team, and are members of owner or maintainer status in its salsa
project, it's two different sets of people developing and maintaining
the two different tools.

Furthermore the original author of live-build was just the developer of
that tool, I do not believe he was ever a member of the Debian team
responsible for releasing official images. I've lost touch with who the
team members are, if I ever knew. Perhaps those involved in live-
wrapper belong to it? Perhaps not. I recall Steve being a notable
member of the project from past contribution work I did for Debian, but
I don't know if releasing images is one of his areas of responsibility.

So, live-manual is currently live-build oriented only. If a discussion
of expanding it to cover live-wrapper ever took place, I'm not aware of
it, and in any case it would be pointless now that live-wrapper is
looking dead. With live-manual being live-build only, and with the
live-build, live-wrapper, and debian-image-team (whatever we should
call them) being separate groups of people (or at least the live-build
people being separate), I don't expect that the effort of which you
speak regarding migrating away from live-wrapper for official images is
of any concern to the workload of those working on live-build and its
documentation in live-manual.

