Re: "Booting Debian in 14 seconds"
Bastian Blank wrote:
> On Thu, Nov 13, 2008 at 09:01:07PM +0000, Phil Endecott wrote:
>> I'm writing to let you know about an article that I've written for
>> Debian Administration about improving boot time:
> Please quote the complete text here if you want to discuss it.
Booting Debian in 14 seconds
Posted by endecotp on Mon 10 Nov 2008 at 22:42
Many readers will have heard about Arjan van de Ven and Auke Kok's work
to boot an ASUS Eee 901 in 5 seconds. Inspired by this work, and
because I have the same laptop, I decided to try to reproduce their
results. So far I have not come very close to their 5 seconds, but I
have made some significant improvements compared to the default boot
time for Debian on that machine; this article describes what I've done.
Although some of what follows is specific to the Eee 901, most of it
isn't and could be applied to other laptops and PCs in general.
This article assumes that you're already familiar with things like
building kernels, applying patches and so on. The target audience is
the "advanced end user", and also the Debian developers responsible for
the packages concerned who I hope will be motivated to incorporate some
of this work.
It's worth noting that many of the things that are described here are
already making their way into the upstream sources, so the lazy reader
might decide simply to wait for all this fast-booting goodness to
arrive in its own good time.
Instrumenting the boot process
Your first step should be to measure how the time is currently being
spent while your machine boots. Then optimise the slow bits, and don't
worry about the bits that are already fast.
A couple of tools are available for measuring the time taken during
boot and visualising the results. I suggest that you install these
tools first and save their results somewhere safe: I did not do so,
and so I can no longer show you how slowly my machine booted before I
started fixing it, which is a shame. The total time was, IIRC, 33
seconds from the end of Grub to the xdm login dialog being visible;
I've knocked 19 seconds off that.
bootchart
bootchart is available as a Debian package. Install it and boot with
"init=/sbin/bootchartd" added to the kernel command line. (In Grub,
select the kernel using the cursor keys, press e, select the line with
the kernel command line, press e, edit, press return, and then press
b.) Then run the bootchart utility which reads the log written during
boot and creates an SVG graph. You can view the resulting file using
most web browsers, or you can try "see" which will probably launch inkscape.
The bootchart will show you which processes took the most time, and you
can also see how much time was spent waiting for I/O and how much time
was CPU-limited. If the results don't seem to make much sense, try
running bootchart with its -n option; this disables the pruning of the
process tree that it otherwise does to simplify the chart.
bootgraph
This similarly-named utility plots a graph showing how the kernel spent
its time during initialisation, i.e. the blank period at the beginning
of the bootchart. The script is included in the scripts/ directory of
the kernel source, but I believe it is only in Linus' tree since
2.6.28-rc1. If you have an earlier kernel you can probably download the
script alone; there is one kernel patch (to init/main.c) but I don't
think it's vital unless you're also using asynchronous init calls, as
described below.
To use bootgraph, boot with "initcall_debug" added to the kernel
command line and then run "dmesg|perl scripts/bootgraph.pl > bootgraph.svg".
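If you only want a quick list of the slowest initcalls rather than a
graph, the same dmesg output can be summarised with a few lines of
script. The following is a minimal sketch, not part of bootgraph.pl. It
assumes that the kernel prints lines of the form "initcall
NAME+0x0/0x10 returned 0 after N usecs" (older kernels may report msecs
instead); the function name slowest_initcalls is made up for this
illustration.

```python
import re

# Matches e.g. "[    1.23] initcall i8042_init+0x0/0x2a8 returned 0 after 1234 usecs"
# (assumed format; a "[module]" tag may follow the symbol for modular drivers).
INITCALL_RE = re.compile(
    r"initcall\s+(?P<name>[\w.]+)\+\S+\s+(?:\[\S+\]\s+)?"
    r"returned\s+-?\d+\s+after\s+(?P<time>\d+)\s+(?P<unit>usecs|msecs)"
)


def slowest_initcalls(dmesg_text, top=5):
    """Return up to `top` (name, microseconds) pairs, slowest first."""
    timings = []
    for line in dmesg_text.splitlines():
        match = INITCALL_RE.search(line)
        if match is None:
            continue  # not an initcall completion line
        usecs = int(match.group("time"))
        if match.group("unit") == "msecs":
            usecs *= 1000  # normalise older kernels' milliseconds
        timings.append((match.group("name"), usecs))
    timings.sort(key=lambda item: item[1], reverse=True)
    return timings[:top]
```

Feed it the text that dmesg prints after booting with initcall_debug;
the entries at the top of the list are the ones worth looking for in
the bootgraph.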
Fix the really obvious things
Before spending time on the hard stuff, fix these easy and obvious things:
* Minimise the time that Grub waits before booting its default
kernel by adjusting the timeout parameter in /boot/grub/menu.lst. I
believe that the Debian default is 5 seconds.
* Remove anything that takes time at boot that you're not using.
(Personally I find it's easier to not install such things in the first place...)
* If you're using a cpufreq governor, make sure that boot runs at
full speed. (I load the powersave governor mainly because it makes it
unlikely that the fan will ever come on - I don't like fans. However,
when booting from cold it's unlikely that the fan will be needed even
at full speed. So I load the cpufreq governor at S99.)
Now on to the more complex stuff.
Building a fast-booting kernel
There are a number of things that you can do to the kernel to make it
boot faster:
* You can eliminate the initrd or initramfs. These features make it
possible for Debian to ship a kernel that will boot on a lot of
different hardware without the bloat of building-in drivers for
everyone's root disks. But it results in slower boot. If you build in
the essential drivers for your root filesystem an initrd is not needed.
* You can build in drivers for all of your hardware, rather than
having udev load modules for them afterwards. Again this conflicts with
a distribution's desire to provide a kernel package that works with all
hardware, but by avoiding all the work that udev does loading modules
this can make boot faster.
* There are a few patches that reduce unnecessary delays during
boot, described below.
Configuring a kernel with built-in drivers
I have been thinking about how a distribution like Debian could make it
easier for users to create custom kernels that build in all of the
drivers needed for their hardware. What I've come up with is the following:
* The user boots a conventional Debian all-modular kernel, checking
first that they don't have any extraneous USB devices or similar
hardware attached.
* The conventional udev startup will load all of the modules needed
to drive their hardware.
* lsmod will report which modules were loaded. By some means we map
from the module names to the kernel config settings that enable them,
and change them from "m" to "y" so that they will be built in.
* They then build and install a kernel with this new config.
The hard bit is the third step above. Luckily I found a script by
Steven Rostedt that did almost what was needed - it did the hard part
of mapping from module names to config settings - and I adapted it to
buildin_used_mods.pl (local copy). Run this at the root of your kernel
tree; it will write the new .config to stdout.
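To make the mapping step more concrete, here is a minimal sketch of the
idea. It is not Steven Rostedt's script and not buildin_used_mods.pl;
the function names are invented for this illustration. It assumes that
the kernel Makefiles contain lines of the form
"obj-$(CONFIG_FOO) += foo.o", and it ignores line continuations,
comments and dependencies between options.

```python
import re

# Matches kbuild lines such as "obj-$(CONFIG_INPUT_PCSPKR) += pcspkr.o"
OBJ_RE = re.compile(r"obj-\$\((CONFIG_\w+)\)\s*[+:]?=\s*(.+)")


def module_to_config(makefile_text):
    """Map module names (as lsmod prints them) to CONFIG_ symbols."""
    mapping = {}
    for line in makefile_text.splitlines():
        match = OBJ_RE.search(line)
        if match is None:
            continue
        symbol, objects = match.groups()
        for obj in objects.split():
            if obj.endswith(".o"):
                # lsmod shows underscores where file names may use hyphens
                mapping[obj[:-2].replace("-", "_")] = symbol
    return mapping


def loaded_modules(lsmod_text):
    """Module names from lsmod output, skipping the header line."""
    return [line.split()[0] for line in lsmod_text.splitlines()[1:] if line.strip()]


def build_in(config_text, symbols):
    """Rewrite CONFIG_X=m to CONFIG_X=y for the given symbols only."""
    wanted = set(symbols)
    out = []
    for line in config_text.splitlines():
        name, sep, value = line.partition("=")
        if sep and value == "m" and name in wanted:
            line = name + "=y"
        out.append(line)
    return "\n".join(out) + "\n"
```

A sketch like this changes only the options it is told about, so a
parent option that also has to become "y" still needs to be set by
hand.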
This script seems to do a good job, but it's not perfect. The
particular problem that I found was that although it determines the
correct config setting for the IDE hardware and sets it to "y", it
doesn't know that it must also set the higher-level setting CONFIG_IDE
to "y". Furthermore, when you "make menuconfig" it will detect this
inconsistency and fix it in the wrong way by changing the IDE driver
back to a module. The solution to this is to "make menuconfig" before
running the script and to change CONFIG_IDE to "y". There may be other
such problems; is there a way to automatically resolve them correctly?
A further useful but non-essential step, since it makes the kernel
build more quickly, would be to disable all of those modules that are
for internal hardware that we don't have, so that we only build modular
drivers for things like USB devices.
So, could we have a Debian kernel package that did all of that automagically?
Kernel patches for faster booting
I have applied the following patches to improve boot time:
* This patch, which I believe is in 2.6.28-rc1, eliminates some
unnecessary locking in the driver-to-device matching code. Believe it
or not, without this patch the pc speaker driver will wait until the
mouse has been initialised (which may take several seconds) in order to
check whether it is actually a speaker. Now, it still does the check
but it doesn't take the lock before doing so. Of course it's not only
that particular pair of devices but rather every pair of devices on
every bus; it just happened to be that pair that wasted the most time
in my bootgraph.
* The Eee 901 uses PCI Express hotplug (pciehp) to toggle the Wifi
power. This driver had a number of 1-second pauses which slow boot and
also suspend/resume; all of them have now been eliminated for this
hardware thanks to a couple of patches, this one which has made it into
Linus' tree and I believe 2.6.28-rc2, and this one which hasn't.
* One of Arjan's main innovations to achieve his fast boot time was
to introduce more concurrency during the kernel startup: specifically,
some drivers that are not on the critical path to getting the root
filesystem mounted are initialised on an asynchronous thread. In
particular, USB seems to take a while to initialise, as does the Eee's
ACPI battery monitor. This work can be found in its own git tree. I'm
not sure when we can expect to see this merged; for example, someone
will have to decide which drivers should be on the async thread and
which not, and the answer might be "it depends" in a lot of cases.
Anyway, Arjan's choices are good for the Eee 901 and I have saved a bit
of time by using it.
Eliminating coldplugging
In most modern Linux systems, whether or not they have modular kernels,
soon after the kernel has booted the udev daemon performs
"coldplugging". This enumerates all of the devices present at boot time
and loads kernel modules, creates /dev entries, and does anything else
necessary to get the device working. It's called coldplugging because
these are the same operations that are done for hotplugged devices,
except that they're not in response to hotplugging events.
Looking at bootcharts it's clear that this takes quite some time.
Building all of the drivers in to the kernel, rather than having
modules, makes some difference but that is not where all the time goes:
even when the drivers are built in, the udev daemon will still run
modprobe which wastes some time before realising that it's a no-op.
It may be possible to speed this up by making the udev system smarter
in some way. But I've followed Arjan's approach and used a
pre-populated /dev. For this to work, you need to be sure that:
* The only action that udev would do for the devices is to create
/dev entries. Often udev would load modules, but we don't have to worry
about that as everything is built in. In principle, udev rules can
carry out arbitrary actions though this is rare.
* The device major/minor numbers aren't going to change from one
boot to the next. I'm unclear about this and would welcome advice! For
example, if the order in which disks appear is non-deterministic (as it
is with USB devices) then this is broken.
I've also been told that HAL relies on udev and that X version 1.5
relies on HAL; since I use neither of these I don't know the whole
story and it may be that the touchpad is the only affected device. Can
anyone shed any light on this?
It's important to note that pre-populating /dev and not doing
coldplugging does not mean that you have to give up hotplugging. The
approach that I describe here still starts the udev daemon to handle
hotplugged devices, and also removable devices that are attached at boot.
It is relatively simple to use a fixed /dev on a "locked down" system,
but it's more of a challenge to do it on a system like Debian which can
run on different hardware. I have therefore used the following method:
* Initially the system is booted with an unmodified udev system
which does conventional coldplugging to populate /dev.
* As soon as coldplugging has finished, tar is used to record
the contents of /dev.
* On subsequent boots, the tar file is detected and coldplugging is
not done but instead the tar file is extracted to create the contents
of /dev. udevd is still used to handle hotplugging and coldplugging of
removable devices.
* If at any time it's necessary to update the contents of /dev,
perhaps because new hardware has been added to a desktop machine or if
a new kernel has been installed, the tar file can be removed and the
process is repeated.
I've implemented this by modifying the standard Debian /etc/init.d/udev
script; my modified version can be downloaded here (local copy). As
you'll see if you diff that against your regular script, my changes are
quite limited in scope and more than a bit hacky. No doubt the
implementation could be improved, but first we need to decide whether
this is the right strategy.
Disk read-ahead
Bootchart shows that the system spends quite a lot of its time at well
below 100% CPU utilisation, waiting for the disk. A technique that
Arjan and Auke used to alleviate this is read-ahead, i.e. to prefetch
from the disk those files (or parts of files?) that it's known will be
needed later in the boot. Debian already packages another readahead
program, but Arjan and Auke have invented Super ReadAhead. I'm not
aware of how it differs and it seems to lack documentation; however, I
was able to get it to work by following the instructions posted on the
download page by John Lamb.
The improvement resulting from read-ahead is worth having, but is not
spectacular. It's a technique that's worth applying as well as
everything else described here, but by itself I think you're unlikely
to notice the improvement unless you use a stopwatch.
Setting the clock
Setting the clock, i.e. reading the hardware battery-backed clock into
the kernel, seemed to be taking an inordinate amount of time. There
turned out to be three factors involved in this:
* Debian sets the clock twice, via the hwclock.sh and
hwclockfirst.sh init scripts. I'm still unsure why this is; see Debian
bug 327584. I've removed one of the scripts and nothing seems to have broken.
* On some systems, including the Eee until a recent kernel fix,
hwclock's --directisa option was used. This option causes hwclock to
use more CPU, so you should not enable it unless you believe that your
combination of hardware and kernel needs it.
* Most seriously, hwclock waits until the seconds in the hardware
clock tick over; this will take on average half a second, except that
in the case where hwclock is run twice (see above) the second
invocation will take nearer a second. Fix this and the other problems
don't matter any more.
The underlying issue with the last point is that the hardware doesn't
tell us fractional seconds. So if we want our clock to be accurate we
need to wait for the hardware to tick over. But do we actually need our
clock to be that accurate? (And if we later run ntp, the inaccuracy
will only be temporary.) If you're happy with your clock being wrong by
up to plus or minus half a second, this patch that I knocked together
adds a --notickwait option to hwclock. This makes hwclock almost instantaneous.
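The half-second and nearly-one-second figures follow from simple
arithmetic, which this small sketch works through. It models only the
wait for the next whole second; it is not hwclock's code, and the
run_time value (how long the first invocation takes after the tick) is
an assumed figure.

```python
import math


def tick_wait(start):
    """Seconds until the next whole-second tick for a call made at time `start`."""
    return math.ceil(start) - start


def mean_wait(samples=1000):
    """Average wait over start times spread evenly across one second."""
    return sum(tick_wait((i + 0.5) / samples) for i in range(samples)) / samples


def back_to_back_waits(first_start, run_time=0.01):
    """Waits for two consecutive invocations; the second starts just after a tick."""
    first = tick_wait(first_start)
    second_start = first_start + first + run_time  # assumed overhead after the tick
    return first, tick_wait(second_start)
```

A call at a random moment waits half a second on average, but a second
call made straight after the first always arrives just after a tick and
so waits for almost the whole of the next second.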
An alternative might be to run hwclock in parallel with other
initialisation. The problem with this is that it can't start until
/dev/rtc has been created and it needs to be done by the time fsck
runs, and this is a fairly small window.
NFS
If you don't run NFS you can ignore this section - though you might
like to double-check that you don't have any unused NFS packages
installed that are slowing down your boot.
In my case, I use NFS with autofs on my Eee to access filesystems on
other local machines. But this is something that I use only rarely, and
certainly only when I'm at home. It turns out that there's a
significant boot delay that can be avoided unless NFS was in use when
the machine was last shut down.
The process to look out for is sm-notify, and it took up a big chunk of
my bootchart with a very large associated peak in disk activity. It
seems that the purpose of sm-notify is to send a message to those NFS
servers that the machine was using before shutdown to tell them that it
is now back up. But before starting to send these messages, it does
something which has the side-effect of invoking sync() and causing all
pending writes to be flushed out to disk. That takes ages.
This is especially wasteful in the case where you didn't use NFS at all
during the last session, so there are no servers to notify. For me this
is the common case. So I have written this patch against nfs-utils
version 1.1.3 which detects the case when there are no servers to
communicate with and terminates early, before the sync(). This patch
has now been applied upstream and is included in nfs-utils 1.1.4 -
however, there is some doubt about whether it is really safe in all
cases. You might want to review this thread to see if this has been resolved.
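The shape of the early exit is roughly as follows. This is a sketch of
the idea only, not the patch itself (sm-notify is written in C); it
assumes that the hosts to notify are recorded as one entry each in a
state directory, and the function and parameter names are invented for
the illustration.

```python
import os


def notify_peers(state_dir, flush, send):
    """Notify each recorded NFS peer; skip the flush when there are none.

    state_dir is assumed to hold one entry per monitored host; flush and
    send are callables supplied by the caller. Returns the number of
    peers notified.
    """
    hosts = sorted(os.listdir(state_dir))
    if not hosts:
        return 0  # nothing to notify, so the expensive flush never happens
    flush()
    for host in hosts:
        send(host)
    return len(hosts)
```

The point is simply that the check for "no servers" comes before the
step that triggers the sync(), rather than after it.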
Starting X sooner
X takes a long time to start. At some point there should be a
significant improvement to this when "kernel modesetting" is introduced
- perhaps in 2.6.29. If you're keen you could try to use this now -
you'll need kernel patches and a new X server - but I'm going to wait.
Some of the X startup time can be hidden by running it in parallel with
other activity. At present, Debian starts xdm as the very last thing
(at S99). gdm starts earlier at S30, but that's still quite late in the
boot process. I now start X at S04.
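For anyone unfamiliar with the numbering: within one rc directory the
start links run in the sorted order of their names, so the two digits
after the "S" decide the sequence. The sketch below just illustrates
that ordering; the link names other than xdm's are made up for the
example.

```python
def start_order(links):
    """Return the S links of one rc directory in the order they would run."""
    return sorted(name for name in links if name.startswith("S"))


# Hypothetical directory contents, with xdm moved to S04.
example = ["S99rc.local", "K01xdm", "S20nfs-common", "S04xdm"]
print(start_order(example))
```

Renaming the link from S99xdm to S04xdm is therefore enough to move X
ahead of everything with a higher number in the same directory.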
Quite how early you're prepared to start it depends on what other
services X depends on. In particular, does X need the network to be
up? In some cases it makes sense to wait; an example would be when home
directories are on NFS. However even in that case it would still be
possible to start xdm and let the user type their username and
password; if necessary it could wait for the network at that point. On
a laptop, however, it's very unlikely that X (or anything much) will
depend on the network being up. Perhaps something in the X packages
could automatically detect or ask the user about these dependencies and
start X at the earliest safe opportunity.
Note that if you start X early you may not want to shut it down late.
Typically, startup and shutdown scripts are symmetrical but you might
want to make an exception in this case. The example that was pointed
out to me was taking away networked filesystems before the programs
that are using them have terminated. I've left xdm at K01.
Starting networking later
As noted above, on a laptop in particular it's unlikely that very much
depends on networking being up during boot. And starting networking can
be slow, especially if DHCP is involved. So I postpone starting the
network until late in the boot where it will run in parallel with X
starting up.
There are a couple of subtleties:
* The driver for the Eee 901's wifi is an out-of-tree module that
can't be built in to the kernel.
* Network devices are a case where udevd does do more than just
load modules and create /dev nodes.
I have therefore adopted the following scheme:
* During initial coldplugging I skip network devices. When I'm
using the pre-populated /dev I skip them anyway because I only coldplug
USB devices, but when I'm not using the pre-populated /dev for some
reason I still skip network devices. I have to match the wifi device by
its PCI id since at that point the kernel hasn't recognised that it is
a network device. This is in my modified udev script linked above.
* I have a coldplug_networking script that runs at S09, i.e. after
xdm. This coldplugs the wifi device and the other network devices.
Conclusions and future work
Using the methods described above, the boot time for my Eee 901 from
the end of Grub to the xdm login dialog being visible has been reduced
from about 33 seconds to about 14 seconds. Here are the bootgraph and
bootchart for the system as it is now. Perhaps also of interest to Eee
901 users is my kernel config (local copies).
The "sore thumb" that still stands out in those 14 seconds is the
startup time for X. (However, it doesn't stand out in the bootchart as
that stops when the rc scripts have finished, which is several seconds
before the login dialog appears.) But there is hope there, and I'm
happy to wait for a few months and see how the kernel modesetting stuff
pans out.
In addition to those 14 seconds, there's also the time taken by the
BIOS before Grub runs; that seems to vary a bit, maybe 4 seconds when
rebooting up to 10 seconds when powering on. It would be great to
reduce that; maybe Intel are secretly working on this, or if not
perhaps we could use coreboot (AKA LinuxBIOS). I note that coreboot has
recently announced support for some of the chips in the '901. This
isn't something I'm planning to work on myself, but if someone would
like to post a recipe for how to put coreboot on an Eee without
bricking it, I'd love to see it!
I hope that this article inspires some other users to see what can be
done on their own machines. Also, I hope that the Debian developers
responsible for some of the affected packages can think about what they
can do. So, over to you...