Memory savings
Executive summary: I've shaved somewhere around 20MB off d-i's memory
use in a netboot test. Share and enjoy.
Firstly, I fixed a number of reference-counting bugs and other memory
leaks in cdebconf. Most notably, the process of loading a templates file
didn't properly free each parsed RFC822 stanza, and so leaked memory roughly
equivalent to the size of the templates database at boot time (about
2.5MB in the netboot case). I've tested this quite extensively, but
there's always the possibility that in correcting memory leaks I went
too far and freed memory that was still in use, or introduced some other
similar bug. Please let me know if you see any weirdness in cdebconf
that looks like corrupted memory.
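The shape of the templates-loading leak, and its fix, can be sketched in C. Everything here is hypothetical: struct stanza, stanza_parse, stanza_free and load_templates are made-up stand-ins, not cdebconf's real rfc822 API. The live_stanzas counter exists only so that it is easy to check that every parsed stanza is released once its contents have been copied into the database.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parsed RFC822 stanza: a linked list of "Key: value" lines. */
struct header {
    char *key, *value;
    struct header *next;
};
struct stanza {
    struct header *headers;
};

static int live_stanzas; /* stanzas allocated but not yet freed */

/* Parse the next blank-line-separated stanza at *pos and advance *pos.
 * Returns NULL at end of input; otherwise the caller owns the result. */
static struct stanza *stanza_parse(const char **pos)
{
    const char *p = *pos;
    while (*p == '\n')
        p++;
    if (*p == '\0')
        return NULL;

    struct stanza *s = calloc(1, sizeof *s);
    if (s == NULL)
        abort();
    struct header **tail = &s->headers;
    while (*p != '\0' && *p != '\n') {
        const char *eol = strchr(p, '\n');
        if (eol == NULL)
            eol = p + strlen(p);
        const char *colon = memchr(p, ':', (size_t)(eol - p));
        if (colon != NULL) {
            const char *v = colon + 1;
            while (v < eol && *v == ' ')
                v++;
            struct header *h = malloc(sizeof *h);
            if (h == NULL)
                abort();
            h->key = strndup(p, (size_t)(colon - p));
            h->value = strndup(v, (size_t)(eol - v));
            h->next = NULL;
            *tail = h;
            tail = &h->next;
        }
        p = (*eol == '\n') ? eol + 1 : eol;
    }
    *pos = p;
    live_stanzas++;
    return s;
}

static void stanza_free(struct stanza *s)
{
    struct header *h = s->headers;
    while (h != NULL) {
        struct header *next = h->next;
        free(h->key);
        free(h->value);
        free(h);
        h = next;
    }
    free(s);
    live_stanzas--;
}

/* Toy "templates database" that only records template names. */
struct db {
    char *names[64];
    int n;
};

static void load_templates(struct db *db, const char *text)
{
    struct stanza *s;
    while ((s = stanza_parse(&text)) != NULL) {
        for (struct header *h = s->headers; h != NULL; h = h->next)
            if (strcmp(h->key, "Template") == 0 && db->n < 64)
                db->names[db->n++] = strdup(h->value);
        /* The fix: free the parser's copy once its data has been stored.
         * Without this, every stanza leaks, i.e. roughly the whole file. */
        stanza_free(s);
    }
    assert(live_stanzas == 0); /* every parsed stanza was released */
}
```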
Secondly, I made a change to the way translations are handled. The core
observation is that cdebconf doesn't really need to store all the
translations for inactive languages in memory: all it needs is English
(well, C in the general case, but it's a lot simpler to read both C and
English throughout) and the currently-selected language. The reason that
it hasn't skipped these up to now is that, in order to save the
templates database without losing data, it needs to have read everything
into memory.
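The rule above (keep untranslated, C and English text, plus the current language) boils down to a predicate applied to each localised field as it is read. This is a sketch under assumptions: it supposes translated fields are named like Description-de.UTF-8 and that the current language is a bare name such as de or pt_BR, and field_language, tag_is and keep_field are hypothetical helpers, not cdebconf functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Language tag of a field name: "de.UTF-8" for "Description-de.UTF-8",
 * "C" for "Choices-C", or "" for an untranslated field like "Description". */
static const char *field_language(const char *field_name)
{
    const char *dash = strchr(field_name, '-');
    return dash != NULL ? dash + 1 : "";
}

/* Does `tag` (ignoring any ".encoding" suffix) equal `want`? */
static bool tag_is(const char *tag, const char *want)
{
    size_t n = strcspn(tag, ".");
    return n == strlen(want) && strncmp(tag, want, n) == 0;
}

/* Keep untranslated, C, English, and currently selected language fields. */
static bool keep_field(const char *field_name, const char *current_lang)
{
    const char *tag = field_language(field_name);

    assert(current_lang != NULL);
    return tag_is(tag, "") || tag_is(tag, "C") || tag_is(tag, "en") ||
           tag_is(tag, current_lang);
}
```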
Various people noted that it would be OK if we didn't support changing
the language after anna has run; that's far enough through the installer
that it's an edge case. anna also happens to be the first point after
startup at which the templates database is saved (at least while dirty,
so that it actually gets written out). This suggests a somewhat cheesy hack,
which I've implemented: we add a reload method to the templates database
implementation to allow it to reload the database and replace localised
strings in memory with those from the filesystem, and call that method
each time the language is changed. It's not especially pretty, but it
does work. If you change the language before anna runs, you'll still get
correct translations thenceforth; once anna runs, the translations you
aren't using will be irreversibly forgotten.
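The reload hack can be sketched as a toy model; this is not cdebconf's code. The on-disk templates are represented by an array of disk_record entries, each in-memory template keeps its English description plus at most one translation, and templates_reload and template_description are invented names. Reloading with a new language frees the old translation and picks up the new one; once the file on disk only holds the current language (as after anna's save), reloading to another language falls back to English, which is the irreversible forgetting described above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* One localised string as stored on disk (hypothetical model). */
struct disk_record {
    const char *template; /* template name */
    const char *lang;     /* "" for the untranslated/English text */
    const char *description;
};

/* In-memory template: English plus at most one other language. */
struct template {
    const char *name;
    char *description_en;
    char *description_loc; /* NULL if there is no translation loaded */
};

/* Replace localised strings with those for `lang`, re-read from "disk". */
static void templates_reload(struct template *t, size_t n_templates,
                             const struct disk_record *disk, size_t n_disk,
                             const char *lang)
{
    for (size_t i = 0; i < n_templates; i++) {
        free(t[i].description_loc);
        t[i].description_loc = NULL;
        for (size_t j = 0; j < n_disk; j++) {
            if (strcmp(disk[j].template, t[i].name) != 0)
                continue;
            if (disk[j].lang[0] == '\0' && t[i].description_en == NULL)
                t[i].description_en = strdup(disk[j].description);
            else if (strcmp(disk[j].lang, lang) == 0 &&
                     t[i].description_loc == NULL)
                t[i].description_loc = strdup(disk[j].description);
        }
    }
}

/* What the user sees: the translation if loaded, otherwise English. */
static const char *template_description(const struct template *t)
{
    assert(t->description_en != NULL);
    return t->description_loc != NULL ? t->description_loc
                                      : t->description_en;
}
```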
The result is a memory saving of roughly twice the size of the dropped
translations, which made up a good part of the templates file's previous
final size: once for the copies no longer held in memory, and once for
the reduction in the final size of /var/lib/cdebconf/templates.dat, since
that's on a tmpfs. This comes to around 18MB in my tests.
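As a back-of-the-envelope check on that figure (a reading of the numbers above, not a separate measurement): if D is the amount of inactive-language text that is no longer kept, it is saved once in cdebconf's heap and once on the tmpfs, so

```latex
\Delta M \approx 2D, \qquad D \approx \frac{18\ \mathrm{MB}}{2} = 9\ \mathrm{MB}
```

In other words, roughly 9MB of translations for languages nobody was using had been held twice over.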
Again, I've tested this as best I can, but there may be corner cases in
terms of changing the language or whatever that I missed. Please let me
know if translations inexplicably go missing.
Somebody (architecture maintainers?) should update the lowmem memory
thresholds to account for all of this.
Could we do better than this? Yes, we could. The idea of having cdebconf
mmap its templates database has been around for a while, and I discussed
this a year or two back in #329743. However, on reflection I think it's
going to be hard to do in rfc822db; the assumption of null-terminated
strings is just too deeply embedded and it's probably harmful to code
maintainability to try to extract it.
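To illustrate why rfc822db and mmap don't mix: in an RFC822-style file a value ends at a newline, not a NUL, so a pointer into a mapping of the file can't be handed to code that expects a C string; each value has to be copied out first. In this sketch an ordinary buffer stands in for the mapped file, and value_in_place and value_copy are hypothetical helpers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Find `key` in an RFC822-style buffer (standing in for an mmapped file).
 * Returns a pointer into the buffer and sets *value_len; the value ends
 * at '\n', so it is NOT a valid C string in place. */
static const char *value_in_place(const char *buf, size_t len,
                                  const char *key, size_t *value_len)
{
    size_t klen = strlen(key);
    const char *p = buf, *end = buf + len;

    assert(buf != NULL && value_len != NULL);
    while (p < end) {
        const char *eol = memchr(p, '\n', (size_t)(end - p));
        if (eol == NULL)
            eol = end;
        if ((size_t)(eol - p) > klen && memcmp(p, key, klen) == 0 &&
            p[klen] == ':') {
            const char *v = p + klen + 1;
            while (v < eol && *v == ' ')
                v++;
            *value_len = (size_t)(eol - v);
            return v;
        }
        p = (eol < end) ? eol + 1 : end;
    }
    return NULL;
}

/* Code that expects NUL-terminated strings needs a heap copy of each value,
 * which is the per-string allocation that mmapping was meant to avoid. */
static char *value_copy(const char *buf, size_t len, const char *key)
{
    size_t vlen;
    const char *v = value_in_place(buf, len, key, &vlen);
    return v != NULL ? strndup(v, vlen) : NULL;
}
```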
However, it might be possible to design a new binary database format
(let's call it mmapdb) that had its strings null-terminated right there
in the file format for ease of mmapping. If properly designed, such a
format could be smaller and quicker to load and save as well, which is
becoming a concern for the templates database (it can easily take
upwards of a second to save, and we already have several measures in
place to avoid unnecessary saves). Here's a strawman pseudocode proposal
for the format:
  enum field_id {
    name = 1,
    other = 2,
    /* question fields */
    value = 10,
    flags = 11,
    owners = 12,
    variables = 13,
    template = 14,
    /* template fields */
    type = 20,
    default = 21,
    choices = 22,
    indices = 23,
    description = 24,
    extended_description = 25,
  };
  struct field {
    enum field_id id;
    unsigned int language; /* reference into database.languages */
    char value[]; /* null-terminated */
  };
  struct item {
    unsigned int n_fields;
    struct field fields[]; /* n_fields variable-length fields, packed */
  };
  struct database {
    unsigned int n_languages;
    char languages[][]; /* null-terminated, packed sequentially */
    /* "" indicates the null language, e.g. Description: */
    unsigned int n_items;
    struct item items[]; /* n_items variable-length items, packed */
  };
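One way to make the strawman concrete is sketched below: a writer that lays records out as the structs above describe, and a reader that walks the buffer in place and returns pointers straight into it, so no string is ever copied. Assumptions not in the proposal: counts are host-order uint32_t with no alignment padding (hence the memcpy reads), the enum constants get a FIELD_ prefix because default is a C keyword, only a few field ids are shown, and every function name here is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Ids from the proposal (subset); prefixed since "default" is a C keyword. */
enum field_id { FIELD_NAME = 1, FIELD_DEFAULT = 21, FIELD_DESCRIPTION = 24 };

/* Writer: appends records to a caller-supplied buffer. */
struct writer {
    unsigned char *buf;
    size_t cap, len;
    bool overflow; /* set if the buffer was too small */
};

static void put_bytes(struct writer *w, const void *src, size_t n)
{
    if (w->overflow || w->cap - w->len < n) {
        w->overflow = true;
        return;
    }
    memcpy(w->buf + w->len, src, n);
    w->len += n;
}

/* Counts are host-order uint32_t; strings keep their terminating NUL. */
static void put_u32(struct writer *w, uint32_t v) { put_bytes(w, &v, sizeof v); }
static void put_str(struct writer *w, const char *s) { put_bytes(w, s, strlen(s) + 1); }
static void put_field(struct writer *w, uint32_t id, uint32_t lang, const char *s)
{
    put_u32(w, id);
    put_u32(w, lang);
    put_str(w, s);
}

/* Reader: walks the buffer in place; strings are never copied. */
struct reader {
    const unsigned char *p, *end;
};

static bool get_u32(struct reader *r, uint32_t *out)
{
    if ((size_t)(r->end - r->p) < sizeof *out)
        return false;
    memcpy(out, r->p, sizeof *out); /* fields need not be aligned */
    r->p += sizeof *out;
    return true;
}

static const char *get_str(struct reader *r)
{
    const unsigned char *nul = memchr(r->p, '\0', (size_t)(r->end - r->p));
    const char *s = (const char *)r->p;
    if (nul == NULL)
        return NULL;
    r->p = nul + 1;
    return s;
}

/* Return field `id` in language `lang` of the item called `name`, as a
 * pointer into db; NULL if absent or if the data is truncated. */
static const char *mmapdb_lookup(const unsigned char *db, size_t len,
                                 const char *name, uint32_t id, const char *lang)
{
    struct reader r = {db, db + len};
    uint32_t n_languages, n_items, want = UINT32_MAX;
    assert(db != NULL && name != NULL && lang != NULL);
    if (!get_u32(&r, &n_languages))
        return NULL;
    for (uint32_t i = 0; i < n_languages; i++) {
        const char *l = get_str(&r);
        if (l == NULL)
            return NULL;
        if (strcmp(l, lang) == 0)
            want = i;
    }
    if (want == UINT32_MAX || !get_u32(&r, &n_items))
        return NULL;
    for (uint32_t i = 0; i < n_items; i++) {
        uint32_t n_fields, fid, flang;
        const char *value, *found = NULL;
        bool match = false;
        if (!get_u32(&r, &n_fields))
            return NULL;
        for (uint32_t f = 0; f < n_fields; f++) {
            if (!get_u32(&r, &fid) || !get_u32(&r, &flang) ||
                (value = get_str(&r)) == NULL)
                return NULL;
            if (fid == FIELD_NAME && strcmp(value, name) == 0)
                match = true;
            if (fid == id && flang == want)
                found = value;
        }
        if (match)
            return found;
    }
    return NULL;
}
```

In real use the buffer would come from mmap(2) on the database file, so a lookup costs no allocation at all. Host byte order and unaligned counts are tolerable for a file that never leaves the machine that wrote it, as on the installer's tmpfs; a portable format would want a fixed byte order.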
The memory cost here would be one pointer per language, field, and item,
which is around 256KB for a current typical templates database after
anna runs. We could decrease that to more like 10KB using the same trick
of forgetting the pointers to fields in unused languages, but I'm not
sure it's worth the bother.
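The pointer cost estimated above, and the trick for shrinking it, might look like this. The sketch assumes the mapped file has already been walked into a list of raw_field entries (the decoding itself is omitted), and the names raw_field, field_ref, build_index and index_cost are made up for illustration. Skipping fields in languages that aren't kept means their pointers are never stored, while the strings themselves stay untouched in the mapping.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* A field as found while walking the mapped file (hypothetical):
 * `value` points into the mapping, so no string data is copied. */
struct raw_field {
    uint32_t id;
    uint32_t language; /* index into the languages table */
    const char *value;
};

/* The in-memory index holds one pointer per retained field. */
struct field_ref {
    uint32_t id;
    const char *value;
};

/* Store references only for fields whose language is kept; return the
 * count. With every language kept this is the full index. */
static size_t build_index(const struct raw_field *raw, size_t n_raw,
                          const bool *keep_language, size_t n_languages,
                          struct field_ref *out)
{
    size_t n = 0;
    for (size_t i = 0; i < n_raw; i++) {
        assert(raw[i].language < n_languages);
        if (keep_language[raw[i].language]) {
            out[n].id = raw[i].id;
            out[n].value = raw[i].value;
            n++;
        }
    }
    return n;
}

/* Rough index cost: one pointer per language, item, and field. */
static size_t index_cost(size_t n_languages, size_t n_items, size_t n_fields)
{
    return (n_languages + n_items + n_fields) * sizeof(void *);
}
```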
(This proposal should be sufficient for the questions database as well
as the templates database, but there's no reason why the same database
format needs to be used for both, and I'd be inclined to suggest
sticking with rfc822db for the questions database since it's easier to
read.)
One concern with doing this is that the templates database would no
longer be readable by 'debconf-get-selections --installer' after
installation. To avoid this problem, I would suggest using
debconf-copydb to copy the templates database to /target so that it can
be converted to the rfc822db format at the same time.
Whether any of this is worth the effort is debatable. At this point, the
templates database after anna runs is about 300KB, and cdebconf is only
going to be using about that much memory for it. It would only be
worthwhile if we wanted to save memory on the installed system too,
where we can't make the same assumptions about dropping translations, or
if the inability to change languages after anna becomes a problem. As
such, I'm going to close bug #329743 with this change.
Thoughts?
Cheers,
-- 
Colin Watson                                       [cjwatson@debian.org]