
Re: "Booting Debian in 14 seconds"



Bastian Blank wrote:
On Thu, Nov 13, 2008 at 09:01:07PM +0000, Phil Endecott wrote:
I'm writing to let you know about an article that I've written for Debian Administration about improving boot time:

Please quote the complete text here if you want to discuss it.

Booting Debian in 14 seconds

Posted by endecotp on Mon 10 Nov 2008 at 22:42

Many readers will have heard about Arjan van de Ven and Auke Kok's work to boot an ASUS Eee 901 in 5 seconds. Inspired by this work, and because I have the same laptop, I decided to try to reproduce their results. So far I have not come very close to their 5 seconds, but I have made some significant improvements compared to the default boot time for Debian on that machine; this article describes what I've done.

Although some of what follows is specific to the Eee 901, most of it isn't and could be applied to other laptops and PCs in general.

This article assumes that you're already familiar with things like building kernels, applying patches and so on. The target audience is the "advanced end user", and also the Debian developers responsible for the packages concerned who I hope will be motivated to incorporate some of this work.

It's worth noting that many of the things that are described here are already making their way into the upstream sources, so the lazy reader might decide simply to wait for all this fast-booting goodness to arrive in its own good time.

Instrumenting the boot process

Your first step should be to measure how the time is currently being spent while your machine boots. Then optimise the slow bits, and don't worry about the bits that are already fast.

A couple of tools are available for measuring the time taken during boot and visualising the results. I suggest that you install these tools first and save their results somewhere safe: I have not done so, and so I can no longer show you how slowly my machine booted before I started fixing it, which is a shame. The total time was, IIRC, 33 seconds from the end of Grub to the xdm login dialog being visible; I've knocked 19 seconds off that.

bootchart

bootchart is available as a Debian package. Install it and boot with "init=/sbin/bootchartd" added to the kernel command line. (In Grub, select the kernel using the cursor keys, press e, select the line with the kernel command line, press e, edit, press return, and then press b.) Then run the bootchart utility which reads the log written during boot and creates an SVG graph. You can view the resulting file using most web browsers, or you can try "see" which will probably launch inkscape.

The bootchart will show you which processes took the most time, and you can also see how much time was spent waiting for I/O and how much time was CPU-limited. If the results don't seem to make much sense, try running bootchart with its -n option; this makes the results more verbose.

bootgraph

This similarly-named utility plots a graph showing how the kernel spent its time during initialisation, i.e. the blank period at the beginning of the bootchart. The script is included in the scripts/ directory of the kernel source, but I believe it is only in Linus' tree since 2.6.28-rc1. If you have an earlier kernel you can probably download the script alone; there is one kernel patch (to init/main.c) but I don't think it's vital unless you're also using asynchronous init calls, as described below.

To use bootgraph, boot with "initcall_debug" added to the kernel command line and then run "dmesg|perl scripts/bootgraph.pl > bootgraph.svg".

Fix the really obvious things

Before spending time on the hard stuff, fix these easy and obvious things:

* Minimise the time that Grub waits before booting its default kernel by adjusting the timeout parameter in /boot/grub/menu.lst. I believe that the Debian default is 5 seconds.

* Remove anything that takes time at boot that you're not using. (Personally I find it's easier not to install such things in the first place...)

* If you're using a cpufreq governor, make sure that boot runs at full speed. (I load the powersave governor mainly because it makes it unlikely that the fan will ever come on - I don't like fans. However, when booting from cold it's unlikely that the fan will be needed even at full speed. So I load the cpufreq governor at S99.)
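For reference, here's what the relevant fragment of a grub-legacy menu.lst might look like after the first change (the surrounding lines are illustrative, not taken from the article):

```
# /boot/grub/menu.lst (grub-legacy) - illustrative fragment
timeout 1          # was 5: wait only one second before booting the default
default 0          # boot the first menu entry
```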

Now on to the more complex stuff.

Building a fast-booting kernel

There are a number of things that you can do to the kernel to make it boot faster:

* You can eliminate the initrd or initramfs. These features make it possible for Debian to ship a kernel that will boot on a lot of different hardware without the bloat of building in drivers for everyone's root disks, but they result in slower boot. If you build in the essential drivers for your root filesystem, an initrd is not needed.

* You can build in drivers for all of your hardware, rather than having udev load modules for them afterwards. Again this conflicts with a distribution's desire to provide a kernel package that works with all hardware, but by avoiding the work that udev does loading modules it can make boot faster.

* There are a few patches that reduce unnecessary delays during boot, described below.
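As a concrete illustration, a .config for an initrd-free boot on a SATA/ext3 machine might contain lines like these (the specific options are my assumptions for example hardware, not the author's config - yours depend entirely on your root disk and filesystem):

```
# Illustrative .config excerpt - adjust for your own hardware
CONFIG_ATA=y             # SATA layer built in, not modular
CONFIG_ATA_PIIX=y        # example controller driver - use yours
CONFIG_BLK_DEV_SD=y      # SCSI/SATA disk support
CONFIG_EXT3_FS=y         # root filesystem driver
# CONFIG_BLK_DEV_INITRD is not set
```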

Configuring a kernel with built-in drivers

I have been thinking about how a distribution like Debian could make it easier for users to create custom kernels that build in all of the drivers needed for their hardware. What I've come up with is the following:

* The user boots a conventional Debian all-modular kernel, checking first that they don't have any extraneous USB devices or similar hardware attached.

* The conventional udev startup will load all of the modules needed to drive their hardware.

* lsmod will report which modules were loaded. By some means we map from the module names to the kernel config settings that enable them, and change them from "m" to "y" so that they will be built in.

* They then build and install a kernel with this new config.

The hard bit is the third step above. Luckily I found a script by Steven Rostedt that did almost what was needed - it did the hard part of mapping from module names to config settings - and I adapted it to buildin_used_mods.pl (local copy). Run this at the root of your kernel tree; it will write the new .config to stdout.
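The end result of that mapping - flipping config options from "m" to "y" - can be illustrated with a toy example (the sample file and the sed call below are mine, not part of buildin_used_mods.pl; the script's real work is deriving the option names from the lsmod output):

```shell
#!/bin/sh
# Toy illustration: flip one module's option from "m" to "y" in a .config.
CONFIG=/tmp/config.sample

# A two-line stand-in for a real kernel .config:
printf 'CONFIG_ATA=m\nCONFIG_EXT3_FS=y\n' > "$CONFIG"

# The "m" -> "y" rewrite, for an option that lsmod showed was in use:
sed -i 's/^CONFIG_ATA=m$/CONFIG_ATA=y/' "$CONFIG"

grep '^CONFIG_ATA=' "$CONFIG"    # now reports CONFIG_ATA=y
```

After rewriting the whole .config this way you would still run "make oldconfig" so that Kconfig can resolve dependencies - which is exactly where the inconsistency discussed next comes from.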

This script seems to do a good job, but it's not perfect. The particular problem that I found was that although it determines the correct config setting for the IDE hardware and sets it to "y", it doesn't know that it must also set the higher-level setting CONFIG_IDE to "y". Furthermore, when you "make menuconfig" it will detect this inconsistency and fix it in the wrong way by changing the IDE driver back to a module. The solution to this is to "make menuconfig" before running the script and to change CONFIG_IDE to "y". There may be other such problems; is there a way to automatically resolve them correctly?

A further useful but non-essential step, since it makes the kernel build more quickly, would be to disable all of those modules that are for internal hardware that we don't have, so that we only build modular drivers for things like USB devices.

So, could we have a Debian kernel package that did all of that automagically?

Kernel patches for faster booting

I have applied the following patches to improve boot time:

* This patch, which I believe is in 2.6.28-rc1, eliminates some unnecessary locking in the driver-to-device matching code. Believe it or not, without this patch the PC speaker driver will wait until the mouse has been initialised (which may take several seconds) in order to check whether it is actually a speaker. Now it still does the check, but it doesn't take the lock before doing so. Of course it's not only that particular pair of devices but every pair of devices on every bus; it just happened to be that pair that wasted the most time in my bootgraph.

* The Eee 901 uses PCI Express hotplug (pciehp) to toggle the wifi power. This driver had a number of 1-second pauses which slowed boot and also suspend/resume; all of them have now been eliminated for this hardware thanks to a couple of patches: this one, which has made it into Linus' tree and, I believe, 2.6.28-rc2, and this one, which hasn't.

* One of Arjan's main innovations to achieve his fast boot time was to introduce more concurrency during kernel startup: specifically, some drivers that are not on the critical path to getting the root filesystem mounted are initialised on an asynchronous thread. In particular, USB seems to take a while to initialise, as does the Eee's ACPI battery monitor. This work can be found in its own git tree. I'm not sure when we can expect to see it merged; for example, someone will have to decide which drivers should be on the async thread and which not, and the answer might be "it depends" in a lot of cases. Anyway, Arjan's choices are good for the Eee 901 and I have saved a bit of time by using them.

Eliminating coldplugging

In most modern Linux systems, whether or not they have modular kernels, soon after the kernel has booted the udev daemon performs "coldplugging". This enumerates all of the devices present at boot time and loads kernel modules, creates /dev entries, and does anything else necessary to get the device working. It's called coldplugging because these are the same operations that are done for hotplugged devices, except that they're not in response to hotplugging events.

Looking at bootcharts it's clear that this takes quite some time. Building all of the drivers in to the kernel, rather than having modules, makes some difference but that is not where all the time goes: even when the drivers are built in, the udev daemon will still run modprobe which wastes some time before realising that it's a no-op.

It may be possible to speed this up by making the udev system smarter in some way. But I've followed Arjan's approach and used a pre-populated /dev. For this to work, you need to be sure that:

* The only action that udev would do for the devices is to create /dev entries. Often udev would load modules, but we don't have to worry about that as everything is built in. In principle, udev rules can carry out arbitrary actions though this is rare.

* The device major/minor numbers aren't going to change from one boot to the next. I'm unclear about this and would welcome advice! For example, if the order in which disks appear is non-deterministic (as it is with USB devices) then this is broken.

I've also been told that HAL relies on udev and that X version 1.5 relies on HAL; since I use neither of these I don't know the whole story and it may be that the touchpad is the only affected device. Can anyone shed any light on this?

It's important to note that pre-populating /dev and not doing coldplugging does not mean that you have to give up hotplugging. The approach that I describe here still starts the udev daemon to handle hotplugged devices, and also removable devices that are attached at boot.

It is relatively simple to use a fixed /dev on a "locked down" system, but it's more of a challenge to do it on a system like Debian which can run on different hardware. I have therefore used the following method:

* Initially the system is booted with an unmodified udev system which does conventional coldplugging to populate /dev.

* Immediately after coldplugging has finished, tar is used to record the contents of /dev.

* On subsequent boots, the tar file is detected and coldplugging is not done; instead the tar file is extracted to create the contents of /dev. udevd is still used to handle hotplugging and coldplugging of removable devices.

* If at any time it's necessary to update the contents of /dev, perhaps because new hardware has been added to a desktop machine or a new kernel has been installed, the tar file can be removed and the process repeated.
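The snapshot-and-restore logic behind those steps can be sketched like this (this is not the article's actual script: the paths are illustrative, and a scratch directory stands in for /dev so the sketch can be tried safely; on a real system DEVDIR would be /dev and the tar file would live somewhere persistent):

```shell
#!/bin/sh
# Sketch of the /dev snapshot idea - illustrative, not the real udev script.
DEVDIR=${DEVDIR:-/tmp/fakedev}                 # stands in for /dev
SNAPSHOT=${SNAPSHOT:-/tmp/dev-snapshot.tar}    # persistent snapshot file
mkdir -p "$DEVDIR"

if [ -f "$SNAPSHOT" ]; then
    # Later boots: skip coldplugging and just unpack the recorded /dev.
    # -p preserves permissions; device nodes keep their major/minor numbers.
    tar -xpf "$SNAPSHOT" -C "$DEVDIR"
else
    # First boot: the conventional udev coldplug would run here to populate
    # $DEVDIR; then record the result for next time.
    tar -cpf "$SNAPSHOT" -C "$DEVDIR" .
fi
```

Note that extracting real device nodes with their ownership and modes intact requires root, which is no problem in an init script.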

I've implemented this by modifying the standard Debian /etc/init.d/udev script; my modified version can be downloaded here (local copy). As you'll see if you diff it against your regular script, my changes are quite limited in scope and more than a bit hacky. No doubt the implementation could be improved, but first we need to decide whether this is the right strategy.

Disk read-ahead

Bootchart shows that the system spends quite a lot of its time at well below 100% CPU utilisation, waiting for the disk. A technique that Arjan and Auke used to alleviate this is read-ahead, i.e. to prefetch from the disk those files (or parts of files?) that it's known will be needed later in the boot. Debian already packages another readahead program, but Arjan and Auke have invented Super ReadAhead. I'm not aware of how it differs and it seems to lack documentation; however, I was able to get it to work by following the instructions posted on the download page by John Lamb.

The improvement resulting from read-ahead is worth having, but is not spectacular. It's a technique that's worth applying as well as everything else described here, but by itself I think you're unlikely to notice the improvement unless you use a stopwatch.

Setting the clock

Setting the clock, i.e. reading the hardware battery-backed clock into the kernel, seemed to be taking an inordinate amount of time. There turned out to be three factors involved:

* Debian sets the clock twice, via the hwclock.sh and hwclockfirst.sh init scripts. I'm still unsure why this is; see Debian bug 327584. I've removed one of the scripts and nothing seems to have broken.

* On some systems, including the Eee until a recent kernel fix, hwclock's --directisa option was used. This option causes hwclock to use more CPU, so you should not enable it unless you believe that your combination of hardware and kernel needs it.

* Most seriously, hwclock waits until the seconds value in the hardware clock ticks over; this takes on average half a second, except that where hwclock is run twice (see above) the second invocation takes nearer a full second. Fix this and the other problems don't matter any more.

The underlying issue with the last point is that the hardware doesn't tell us fractional seconds. So if we want our clock to be accurate we need to wait for the hardware to tick over. But do we actually need our clock to be that accurate? (And if we later run ntp, the inaccuracy will only be temporary.) If you're happy with your clock being wrong by up to plus or minus half a second, this patch that I knocked together adds a --notickwait option to hwclock. This makes hwclock almost instantaneous.

An alternative might be to run hwclock in parallel with other initialisation. The problem with this is that it can't start until /dev/rtc has been created and it needs to be done by the time fsck runs, and this is a fairly small window.

NFS

If you don't run NFS you can ignore this section - though you might like to double-check that you don't have any unused NFS packages installed that are slowing down your boot.

In my case, I use NFS with autofs on my Eee to access filesystems on other local machines. But this is something that I use only rarely, and certainly only when I'm at home. It turns out that there's a significant boot delay that can be avoided unless NFS was in use when the machine was last shut down.

The process to look out for is sm-notify, and it took up a big chunk of my bootchart with a very large associated peak in disk activity. It seems that the purpose of sm-notify is to send a message to those NFS servers that the machine was using before shutdown to tell them that it is now back up. But before starting to send these messages, it does something which has the side-effect of invoking sync() and causing all pending writes to be flushed out to disk. That takes ages.

This is especially wasteful in the case where you didn't use NFS at all during the last session, so there are no servers to notify. For me this is the common case. So I have written this patch against nfs-utils version 1.1.3 which detects the case where there are no servers to communicate with and terminates early, before the sync(). This patch has now been applied upstream and is included in nfs-utils 1.1.4 - however, there is some doubt about whether it is really safe in all cases. You might want to review this thread to see if this has been resolved.

Starting X sooner

X takes a long time to start. At some point there should be a significant improvement when "kernel modesetting" is introduced - perhaps in 2.6.29. If you're keen you could try to use this now - you'll need kernel patches and a new X server - but I'm going to wait.

Some of the X startup time can be hidden by running it in parallel with other activity. At present, Debian starts xdm as the very last thing (at S99). gdm starts earlier at S30, but that's still quite late in the boot process. I now start X at S04.

Quite how early you're prepared to start it depends on what other services X depends on. In particular, does X need the network to be up? In some cases it makes sense to wait; an example would be when home directories are on NFS. However, even in that case it would still be possible to start xdm and let the user type their username and password; if necessary it could wait for the network at that point. On a laptop, however, it's very unlikely that X (or anything much) will depend on the network being up. Perhaps something in the X packages could automatically detect or ask the user about these dependencies and start X at the earliest safe opportunity.

Note that if you start X early you may not want to shut it down late. Typically, startup and shutdown scripts are symmetrical, but you might want to make an exception in this case. The example that was pointed out to me was taking away networked filesystems before the programs that are using them have terminated. I've left xdm at K99.
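One minimal way to make that change by hand is to rename the rc symlink; the sketch below operates on a scratch directory so it can be tried safely (on a real system the directory is /etc/rc2.d and its siblings, and update-rc.d is the cleaner tool for the job):

```shell
#!/bin/sh
# Sketch: start xdm at S04 but keep stopping it at K99.
RCDIR=${RCDIR:-/tmp/rc2.d}        # stands in for /etc/rc2.d
mkdir -p "$RCDIR"
[ -L "$RCDIR/S99xdm" ] || ln -sf ../init.d/xdm "$RCDIR/S99xdm"

# Move the start link much earlier; deliberately leave any K99xdm links
# (in rc0.d/rc6.d) alone, breaking the usual start/stop symmetry.
mv "$RCDIR/S99xdm" "$RCDIR/S04xdm"
```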

Starting networking later

As noted above, on a laptop in particular it's unlikely that very much depends on networking being up during boot. And starting networking can be slow, especially if DHCP is involved. So I postpone starting the network until late in the boot, where it will run in parallel with X starting up.

There are a couple of subtleties:

* The driver for the Eee 901's wifi is an out-of-tree module that can't be built in to the kernel.

* Network devices are a case where udevd does do more than just load modules and create /dev nodes.

I have therefore adopted the following scheme:

* During initial coldplugging I skip network devices. When I'm using the pre-populated /dev I skip them anyway, because I only coldplug USB devices; but when I'm not using the pre-populated /dev for some reason, I still skip network devices. I have to match the wifi device by its PCI ID, since at that point the kernel hasn't recognised that it is a network device. This is in my modified udev script, linked above.

* I have a coldplug_networking script that runs at S09, i.e. after xdm. This coldplugs the wifi device and the other network devices.
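A coldplug_networking script along these lines could replay the skipped events (this is my sketch, not the article's script; the udevadm invocation is an assumption about the udev version in use - older udev shipped separate udevtrigger/udevsettle binaries - and the whole thing is a no-op if udevadm is absent):

```shell
#!/bin/sh
# Sketch of an S09 coldplug_networking script, run after xdm has started.
if command -v udevadm >/dev/null 2>&1; then
    # Replay "add" events for network interfaces only, so their modules are
    # loaded and the usual udev network setup runs now rather than at S0x.
    udevadm trigger --subsystem-match=net || true
    # Block until udev has finished processing the queued events.
    udevadm settle --timeout=30 || true
fi
```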

Conclusions and future work

Using the methods described above, the boot time for my Eee 901 from the end of Grub to the xdm login dialog being visible has been reduced from about 33 seconds to about 14 seconds. Here are the bootgraph and bootchart for the system as it is now. Perhaps also of interest to Eee 901 users is my kernel config (local copies).

The "sore thumb" that still stands out in those 14 seconds is the startup time for X. (However, it doesn't stand out in the bootchart as that stops when the rc scripts have finished, which is several seconds before the login dialog appears.) But there is hope there, and I'm happy to wait for a few months and see how the kernel modesetting stuff pans out.

In addition to those 14 seconds, there's also the time taken by the BIOS before Grub runs; that seems to vary a bit, from maybe 4 seconds when rebooting up to 10 seconds when powering on. It would be great to reduce that; maybe Intel are secretly working on this, or if not perhaps we could use Coreboot (AKA LinuxBIOS). I note that Coreboot has recently announced support for some of the chips in the '901. This isn't something I'm planning to work on myself, but if someone would like to post a recipe for how to put Coreboot on an Eee without bricking it, I'd love to see it!

I hope that this article inspires some other users to see what can be done on their own machines. Also, I hope that the Debian developers responsible for some of the affected packages can think about what they can do. So, over to you...




