
Re: Aptitude/Grub Problem -- Is this a bug?



On Friday 24 February 2006 13:24, Justin Guerin wrote:
> On Thursday 23 February 2006 18:23, Hal Vaughan wrote:
> > I posted earlier this week about some problems I had after doing:
> >
> > aptitude update && aptitude upgrade
> >
> > on a Sarge system.  It required rebooting and was immediately
> > unbootable -- ON SARGE!!!  This is the very stuff I am using stable
> > to avoid!
> >
> > I lost a day tracking it down and finally found that when a kernel
> > image is updated, update-grub is run.  Normally when apt/dpkg or
> > whatever part of apt actually upgrades a program and needs to
> > update a config file, it gives you a choice of updating or sticking
> > with the old file, or, at the very least, gives you a prompt and
> > warns you of the change.  However, when a kernel image is updated,
> > it does not do ANY of these things.  It doesn't warn you to back up
> > the /boot/grub/menu.lst file, it doesn't back it up itself, and it
> > does not, in any way, let you know it is doing this.
>
> You aren't given a choice of keeping your old grub config file,
> because without an update, you can't boot the new kernel.  Well, OK,
> you can, if you manually create the entry at the grub prompt, but you
> know what I mean.

In this case, it's the same kernel image (again, I'm only upgrading 
Sarge for security and bug fixes), so menu.lst did not need to be 
changed to load a patched version of the same kernel version.

> You aren't warned about update-grub removing an entry for a kernel,
> because this is only done when you remove a kernel.  If you've
> removed a kernel, but don't remove its grub entry, then you've got an
> entry that you can't use to boot.  You don't want that.

It didn't just remove an entry.  Update-grub completely overwrites the 
file so any entries for kernels on other partitions are gone.

Picture this: you have 5 partitions, each with a different OS or a 
different Linux distro and kernel version on it.  One 
partition is your production partition, the one that HAS to always 
work, so you use Sarge for it because upgrades/updates in Sarge are not 
supposed to mess anything up.  Do an "aptitude update && aptitude 
upgrade" on your Sarge partition and, at least on my recent one, 
aptitude finds an updated version of the kernel image you're using, so 
it downloads and installs it.  Since it's Sarge, the upgrade adds 
nothing new; it only replaces the same kernel image.  That means the 
same entry in menu.lst will work for the replacement kernel (the same 
is true if only modules are upgraded).

Menu.lst is replaced anyway, which wipes out the entries for kernels and 
OSes on the other 4 partitions and any custom options for that 
particular kernel as well as custom options for any other kernels on 
that partition.

Since this happened, I found that it is possible, in menu.lst, to 
specify the default kernel options that are used and a few other 
features so update-grub will use the config options I need when it 
updates menu.lst, so (I think) I am protected on that for now.
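
For anyone wanting to check which of these defaults are set, here is a 
minimal sketch.  It assumes the defaults are the single-"#" 
"name=value" comment lines that Debian's update-grub reads from 
menu.lst, and the option names in the pattern (kopt, groot, 
defoptions, altoptions, howmany) are from memory, so verify them 
against the update-grub man page on your release.  The function name 
show_grub_defaults is made up for illustration.

```shell
#!/bin/sh
# Print the "# name=value" default-option lines from a menu.lst file.
# ASSUMPTION: update-grub reads its defaults from single-"#" comment
# lines like "# kopt=root=/dev/hda1 ro"; the option names below are
# illustrative and may differ between releases.
show_grub_defaults() {
    # $1: path to menu.lst (falls back to the usual Debian location)
    grep -E '^# (kopt|groot|defoptions|altoptions|howmany)=' \
        "${1:-/boot/grub/menu.lst}"
}
```

Running "show_grub_defaults" with no argument reads 
/boot/grub/menu.lst; pass a path to inspect a copy instead.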

The issue is that one has to FIND the additional options to fix the 
situation and prevent a change that keeps your system from booting.  
There is nothing, anywhere, to alert a sys admin that this will happen 
and must be taken into account.

> > I know some users know every detail of their systems, but I can't
> > do that.  I have a business to run and I started using Debian
> > Stable because it is supposed to not mess with things when it
> > upgrades.  I could not find anything warning me of this.  It turns
> > out there is documentation in update-grub's man page that I have
> > since used to make sure the options I've put in the list of boot
> > kernels are kept, but through testing, I've seen update-grub will
> > wipe out all entries for other kernels not on the current root
> > partition (and this happens whenever apt upgrades the kernel
> > image).
>
> I'm not sure of your exact situation, but my experience with
> update-grub is that it only creates or keeps entries for kernels it
> thinks are installed. 

That's what I've found -- and only kernels on the current partition.  It 
has little intelligence or ability to find kernels on other partitions 
or to even scan the current list of entries and copy them (even copy 
them commented out) into the new version.

> I don't know whether or not update-grub depends 
> on apt's database, or if it just searches for kernels, but the source
> would surely tell you.

It seems to just search for kernels on the current boot or system 
partition.

> > Considering that the intent of stable is to make it so reliable one
> > can upgrade and count on the system continuing to work well, I
> > cannot see this lack of warning (and not making a backup) as
> > anything other than a serious bug.  It could be easily fixed by
> > prompting the user with a warning menu.lst is about to be
> > overwritten, so there's time to back it up.  Even better the
> > standard prompt for whether or not to overwrite a config file would
> > be nice, since it would let the user decide to update menu.lst or
> > not (or maybe back it up).
> >
> > Is this not a bug?  Was I just supposed to somehow know that out of
> > all the packages out there, this was a specific behavior in
> > upgrading the kernel?  It makes me wonder how many other exceptions
> > are out there that I don't know about that could crash my system
> > next time I upgrade.
> >
> > Do others feel a prompt would be appropriate in this case?  I'd
> > like to hear feedback before I submit it as a bug, since there may
> > be some good reasons for doing this.  However, I cannot imagine a
> > single good reason for overwriting a file this important without at
> > least telling the user/admin that it is happening.
> >
> > Hal
>
> What kernel package updated?  If your kernel is installed because of
> a package like linux-image-2.6-686, then I might understand what
> happened here.  

kernel-image-2.6.8-2-686, which includes the full version number and 
so is, I *think*, not the same as 2.6.

> That is a dependency package.  When you install that 
> package with aptitude, it pulls in the relevant kernel as a
> dependency, and marks it as being automatically installed to satisfy
> dependencies.  When that package updates, and points to a new kernel
> package, then aptitude removes the old kernel, since it was only
> installed to satisfy a dependency, and installs the new package.  In
> this case, your working kernel will be removed (along with its grub
> entry), and the new kernel will be put in its place.  If something
> fails in this operation, you would get an unbootable system (if that
> was your only kernel).

I see that, but in this case, the specific version of the 2.6 kernel is 
specified.  From what I read, that is an actual package.

> The solution is to mark the kernel you're using as manually installed,
> so that it is not removed when it is no longer needed by any other
> package.

Actually, at this point, I've fixed menu.lst so the options I need will 
be automatically included.  But I still cannot see how overwriting such 
an important file without backing it up or prompting is not a bug.
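
Until something like that exists, the backup has to be made by hand 
before upgrading.  A minimal sketch, assuming menu.lst lives at 
/boot/grub/menu.lst and the commands are run as root; backup_file is 
a made-up helper name, not part of apt or grub.

```shell
#!/bin/sh
# Copy a file to a timestamped sibling and print the backup's path.
# Hypothetical helper for illustration; not part of apt or grub.
backup_file() {
    src=$1
    dest="$src.$(date +%Y%m%d-%H%M%S).bak"
    # -p preserves mode and timestamps; print the path only on success
    cp -p "$src" "$dest" && printf '%s\n' "$dest"
}

# Usage (as root), before upgrading:
#   backup_file /boot/grub/menu.lst && aptitude update && aptitude upgrade
```

Chaining with && means the upgrade only starts if the copy succeeded.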

...
> So what kernel were you using, via what package, and what kernel did
> you upgrade to, via what package, and did aptitude warn you it was
> removing the older kernel?  You don't mention this, but I'd be
> surprised if it did and you missed it.

It wasn't a kernel upgrade.  I'm not sure why aptitude actually calls it 
an upgrade.  It was "aptitude update && aptitude upgrade", which pulls 
down the latest packages, which, on Sarge, means only bug and 
security fixes.  So, while it is called an upgrade, it should not have 
actually upgraded to a new kernel version.  No kernel was removed.

Even if I were upgrading to a new kernel, it seems to me if I am not 
removing an earlier version, it should still keep that entry intact.

Thanks for the comments and for the ideas.  I still think this is a bug, 
under the category of oversight.  I can see the need to re-write the 
file, but it is a problem when the rewrite silently drops 
configuration settings for kernels on other partitions or special 
settings for other kernels.

Hal


