
Re: Desktop upgrade strategy (was: What should be upgraded first: kernel or userland?)



On Thu, Mar 22, 2007 at 12:33:14PM +0100, Javier Fernández-Sanguino Peña wrote:
> On Wed, Mar 21, 2007 at 04:22:31PM -0700, Steve Langasek wrote:

> > > This is not essential as long as you don't try to reboot before a new 
> > > kernel has been installed.

> > My concern here is: what happens if an upgrade is interrupted in the middle,
> > due to such things as a power outage, hardware failure, two admins not
> > communicating with one another, or admin confusion about the state of the
> > upgrade?  Is the system recoverable at every point during the upgrade
> > process, without extraordinary measures?

> No; see the mail I sent previously.  Many things can go wrong after an upgrade
> and new kernel install:  LILO, udev and device reordering might make a system
> unbootable before (and even after) the kernel upgrade.

Right.  There are definitely going to be cases here where we can't ensure
the system will be usable after a reboot; it was my hope to limit these to
cases where the device config wasn't right after upgrade.

Should we list upgrading lilo explicitly as a step prior to the
dist-upgrade?  That would minimize the window for this particular problem,
provided it doesn't cause other package removals.

The other cases where we still have problems are:

- once udev is installed, hotplug is removed; reboot to a previous 2.4
  kernel may fail to bring up network devices correctly as a result.
- once udev is installed, reboot to a sarge-era 2.6 kernel will /likely/
  fail to bring up network devices correctly as a result, and may also fail
  to bring up filesystems other than the root filesystem.
- after installation of the new kernel, the boot could obviously still fail
  for any number of miscellaneous, system-specific reasons that we don't
  know about.

So that's not /too/ bad, I guess.  The cases we have addressed successfully
are:

- installation of the etch kernel forced removal of the sarge kernel,
  leaving no fallback in the case of boot problems with the new kernel
- installation of the etch udev failed when running a sarge 2.6 kernel
- installation of the etch kernel forced removal of the entire desktop task
  :)
- rebooting to the etch kernel caused network devices to be brought up under
  different names -- AFAIK this shouldn't be an issue now because on install
  or upgrade, udev tries to write out rules for persistent net device names.
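
To illustrate the persistent net device names mentioned in the last item, here
is a minimal sketch that reads such rules back into an address-to-name map.
The rule syntax in the usage note is an assumption modelled on typical udev
rules, and parse_persistent_net_rules is a hypothetical helper, not part of
udev.

```python
import re

# Assumption: each rule carries the MAC address in an ATTR{address} or
# ATTRS{address} match and assigns the interface name with NAME="...".
RULE_ADDRESS = re.compile(r'ATTRS?\{address\}=="([^"]+)"')
RULE_NAME = re.compile(r'NAME="([^"]+)"')


def parse_persistent_net_rules(text):
    """Return a dict mapping lower-cased MAC address to interface name."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comments.
        if not line or line.startswith("#"):
            continue
        address = RULE_ADDRESS.search(line)
        name = RULE_NAME.search(line)
        if address and name:
            mapping[address.group(1).lower()] = name.group(1)
    return mapping
```

Feeding it a line such as
SUBSYSTEM=="net", DRIVERS=="?*", ATTRS{address}=="00:16:3e:00:00:01", NAME="eth0"
would map that address to eth0, which can then be compared against the names
used in /etc/network/interfaces before rebooting.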

Given the alternatives, yeah, I can live with what we've got at this point.
Do you think we should document each of the possible problem cases, or is it
sufficient to provide recommendations on how to prepare for the need to
recover a system?
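
As one way of documenting the remaining problem cases listed above, they can
be written down as a small pre-reboot check.  This is only a sketch: the
function names are hypothetical, and treating every 2.6 kernel older than
2.6.18 as "sarge-era" is an assumption made for illustration.

```python
# Assumption: 2.6 kernels older than this count as "sarge-era".
ETCH_KERNEL = (2, 6, 18)


def parse_kernel_version(release):
    """Turn a release string such as '2.6.8-3-686' into a tuple (2, 6, 8)."""
    numeric = release.split("-", 1)[0]
    return tuple(int(part) for part in numeric.split(".")[:3])


def reboot_risks(kernel_release, etch_udev_installed):
    """List the known problems of rebooting into the given kernel."""
    risks = []
    if not etch_udev_installed:
        # The cases below only arise once the etch udev has replaced hotplug.
        return risks
    version = parse_kernel_version(kernel_release)
    if version[:2] == (2, 4):
        risks.append("network devices may fail to come up")
    elif version[:2] == (2, 6) and version < ETCH_KERNEL:
        risks.append("network devices will likely fail to come up")
        risks.append("filesystems other than root may fail to come up")
    return risks
```

Calling reboot_risks("2.6.8-3-686", True) returns both warnings for the old
2.6 kernel, while an etch kernel returns an empty list.  The system-specific
failures from the third item cannot be captured this way.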

> > > And even then my tests have so far shown that the system will probably
> > > still boot (though X may not start).
> > 
> > And will the networking necessarily start?  That could be a problem for a
> > number of users if it doesn't.

> As far as I've seen, networking issues in upgrades have been related to
> device reordering and to #403706 (but this seems to affect new installs, not
> upgrades)

403706 is not specific to new installs; the root problem -- that devices are
brought up asynchronously under udev, and there's no guarantee that the main
network device is available before /etc/rcS.d/S39ifupdown runs, causing
other network services to be started before the interface has been fully
configured -- affects all systems which use udev for module loading.  But
for systems with PCI cards, the chances of this happening seem to be pretty
small.
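
The race described above can be narrowed by waiting for the device node to
appear before starting anything that depends on it.  The following is a
minimal sketch, not what ifupdown does: wait_for_interface is a hypothetical
helper, and it assumes the kernel exposes interfaces under /sys/class/net.

```python
import os
import time


def wait_for_interface(name, timeout=10.0, poll_interval=0.2,
                       sysfs_root="/sys/class/net"):
    """Return True once the interface shows up, False if the timeout expires."""
    path = os.path.join(sysfs_root, name)
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        # The device has not been created yet; check again shortly.
        time.sleep(poll_interval)
```

Note that this only covers the device becoming available; the interface still
has to be configured before dependent network services are started.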

> > I'm glad that interactive aptitude will provide a way around this (it almost
> > always will), but I don't think that alone is an adequate recommendation for
> > the release notes.

> As I've said, I think it would be a good "second" option ("expert" mode?)

Well -- do we want users using the "expert" mode if they aren't already
familiar with it?  And if not, is there any reason to document it in the
release notes?

> > And apt-get has different bugs (#410695), doesn't honor recommends, and
> > hasn't been what we've been recommending users use for upgrade testing for
> > the past months...

> Maybe not honoring recommends is precisely why it is working better than
> aptitude (see #411280 and #401317). In #401317 Osamu makes some tests
> with/without recommends that I would like to reproduce in a more recent
> "etch" (as the tests were done in December)

Hmm, I hadn't seen 411280 before.  I wonder if the current method discussed
in http://wiki.debian.org/Sarge2EtchUpgradeBlackboard works better here?

> > We can't flip-flop the recommended upgrade procedure every time we find a
> > bug affecting one tool but not another.  What do you think about the fam NMU
> > we discussed?

> If the only issue is the libfam dependency hell, which then removes the
> desktop, is there any way to fix that, maybe by removing libfam0c102 prior
> to the upgrade?

Not without removing large chunks of the desktop task.

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
vorlon@debian.org                                   http://www.debian.org/


