
Re: Aptitude/Grub Problem -- Is this a bug?



On Friday 24 February 2006 16:56, Justin Guerin wrote:
> On Friday 24 February 2006 12:07, Hal Vaughan wrote:
> > On Friday 24 February 2006 13:24, Justin Guerin wrote:
> > > On Thursday 23 February 2006 18:23, Hal Vaughan wrote:
>
> [snip]
>
> > > You aren't given a choice of keeping your old grub config file,
> > > because without an update, you can't boot the new kernel.  Well,
> > > OK, you can, if you manually create the entry at the grub prompt,
> > > but you know what I mean.
> >
> > In this case, it's the same kernel image (again, I'm only upgrading
> > Sarge for security and bug fixes), so menu.lst did not need to be
> > changed to load a patched version of the same kernel version.
>
> You're right.  I wonder if that call is in there because LILO does
> need to be run in such a situation?

I just filed a bug report on it (#354243).  The guy seemed pretty 
anxious to close it and just dismissed it as a non-aptitude thing, 
saying I had set a kernel config file to do it, which I had not.  I 
think he was more interested in closing the bug than in dealing with 
the issue.  If anyone wants to include more on the bug report, please 
feel free.

> > > You aren't warned about update-grub removing an entry for a
> > > kernel, because this is only done when you remove a kernel.  If
> > > you've removed a kernel, but don't remove its grub entry, then
> > > you've got an entry that you can't use to boot.  You don't want
> > > that.
> >
> > It didn't just remove an entry.  Update-grub completely overwrites
> > the file so any entries for kernels on other partitions are gone.
> >
> > Picture this: you have 5 partitions, each with a different OS or
> > different Linux distros and different kernel versions on them.  One
> > partition is your production partition, the one that HAS to always
> > work, so you use Sarge for it because upgrades/updates in Sarge are
> > not supposed to mess anything up.  Do an "aptitude update &&
> > aptitude upgrade" on your Sarge partition and, at least on my
> > recent one, aptitude finds an updated version of the kernel image
> > you're using, so it downloads and installs it.  Now, since it's
> > Sarge, you're not adding anything in an upgrade; it is only
> > replacing the same kernel image.  That means the same entry in
> > menu.lst will work for the replacement kernel (same is true if only
> > modules are upgraded).
> >
> > Menu.lst is replaced anyway, which wipes out the entries for
> > kernels and OSes on the other 4 partitions and any custom options
> > for that particular kernel as well as custom options for any other
> > kernels on that partition.
>
> Now I understand your problem.  If those entries were outside of the
> ### BEGIN AUTOMAGIC KERNELS LIST
> ### END DEBIAN AUTOMAGIC KERNELS LIST
> then update-grub shouldn't have touched them, and you should file a
> bug.

Okay.  That's easy and makes sense.  But there is still a problem, 
though it may lie more with grub.  I went through the man pages of grub 
and did a lot of research to figure out how to make the changes I 
needed.  Not once did I see this documented.  So there is basically a 
default behavior to overwrite the file, and the documentation on how to 
keep your changes from being overwritten is obscure.  This one incident 
has really led me to question the overall stability of Stable and 
wonder when another muck-up like this will happen, since the 
documentation warning about such default behavior is so hard to find.
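
For readers unfamiliar with the marker mechanism Justin describes, here is a minimal Python sketch of the idea: regenerate only the lines between `### BEGIN AUTOMAGIC KERNELS LIST` and `### END DEBIAN AUTOMAGIC KERNELS LIST`, and copy everything else through verbatim. This is not Debian's update-grub (which is a shell script). The name `regenerate_menu` is hypothetical, the sketch is simplified (it replaces the whole section, whereas the real script is described as keeping its default-option lines), and refusing to write when a marker is missing is this sketch's own choice, not documented update-grub behavior.

```python
BEGIN = "### BEGIN AUTOMAGIC KERNELS LIST"
END = "### END DEBIAN AUTOMAGIC KERNELS LIST"


def _find(lines, marker, start=0):
    """Return the index of the first line equal to `marker`, ignoring surrounding whitespace."""
    for i in range(start, len(lines)):
        if lines[i].strip() == marker:
            return i
    raise ValueError(f"marker not found: {marker!r}")


def regenerate_menu(text, new_entries):
    """Hypothetical helper: rewrite only the automagic section of a menu.lst.

    Lines before BEGIN and after END (e.g. hand-written entries for kernels
    or OSes on other partitions) are kept verbatim; whatever sat between the
    markers is replaced by `new_entries`, a list of lines.
    """
    lines = text.splitlines()
    start = _find(lines, BEGIN)          # raises if missing: refuse to guess
    end = _find(lines, END, start + 1)
    kept_head = lines[: start + 1]       # everything up to and including BEGIN
    kept_tail = lines[end:]              # END and everything after it
    return "\n".join(kept_head + list(new_entries) + kept_tail) + "\n"
```

With this design, entries placed outside the markers survive any number of regenerations, which is the behavior Justin expects from update-grub.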

> However, if the entries for the other OSes and kernels on other
> partitions weren't outside of those, then update-grub assumes it's
> supposed to manage them.  Still, I can see how you have to know
> something in order to avoid that mistake, and I agree with you: if
> you have to know something about how the program operates, then there
> should be some sort of warning.  At the very least, it should tell
> you what it plans to do and give you an opportunity to back out.

Bingo!  Thanks for saying what I was trying to say.  It's like putting a 
button on your VCR to rewind the tape, and not including in the 
directions, "Oh, yeah, by the way, this fast auto-rewind will blank the 
tape as it rewinds," and making the button look okay on the VCR, with 
no warnings, and somehow expecting everyone to have read that one 
paragraph in the docs or to expect unreasonable and unlikely behavior 
from that button.

> > Since this happened, I found that it is possible, in menu.lst, to
> > specify the default kernel options that are used and a few other
> > features so update-grub will use the config options I need when it
> > updates menu.lst, so (I think) I am protected on that for now.
> >
> > The issue is that one has to FIND the additional options to fix the
> > situation and prevent a change that keeps your system from booting.
> > There is nothing, anywhere, to alert a sys admin that this will
> > happen and must be taken into account.
>
> Yes, I agree.  Do you think a dialog box is best?  Or is a comment
> within the menu.lst file sufficient?  Whatever you think is the right
> solution should be put in the bug report.

I included that information, suggesting that if nothing else, or if it 
took time to fix, at least a prompt warning that this could happen and 
that it was time to back up that file would be appropriate.  
Personally, I think it should have the standard "About to overwrite 
{configfile}.  Should I?  y/N/q/???" prompt (I don't remember the exact 
wording or the 4-5 allowed choices).
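
Hal's proposed safeguard can be sketched in a few lines of Python: ask before replacing the file, default to "no", and keep a copy of the old contents when the answer is yes. The function name `confirm_overwrite`, the `.bak` suffix, and the `[y/N]` wording are all hypothetical; this illustrates the idea and is not code from aptitude, dpkg, or update-grub.

```python
import shutil
from pathlib import Path


def confirm_overwrite(path, new_text, ask=input):
    """Hypothetical helper: replace `path` with `new_text` only after confirmation.

    The default answer is No. On Yes, the old file (if any) is first copied
    to `<path>.bak` so the previous configuration can be restored.
    """
    path = Path(path)
    reply = ask(f"About to overwrite {path}. Continue? [y/N] ").strip().lower()
    if reply not in ("y", "yes"):
        return False                      # leave the existing file untouched
    if path.exists():
        backup = path.with_name(path.name + ".bak")
        shutil.copy2(path, backup)        # copy contents and metadata
    path.write_text(new_text)
    return True
```

Passing a different `ask` callable (for example one that always returns an empty string) makes the default-No behavior easy to check without a terminal.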

> [snip]
>
> > > I'm not sure of your exact situation, but my experience with
> > > update-grub is that it only creates or keeps entries for kernels
> > > it thinks are installed.
> >
> > That's what I've found -- and only kernels on the current
> > partition.  It has little intelligence or ability to find kernels
> > on other partitions or to even scan the current list of entries and
> > copy them (even copy them commented out) into the new version.
>
> I believe update-grub should copy verbatim the config of kernels
> outside the automagic area.  If it's not, that's a bug.

That's also a "now that I know this, and how the heck would I have known 
it before it hosed my system" thing.  I checked and yes, it is in the 
comments about kernels in the automagic list being modified, but that 
still leaves the point of how was I supposed to know update-grub would 
be run by aptitude in the first place.
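
The default options Hal found earlier live inside that same automagic section as specially formatted comments. As I understand the Debian menu.lst template, a line with a single `#` followed by `name=value` (for example `# kopt=root=/dev/hda1 ro` or `# groot=(hd0,0)`) is a setting update-grub reads, while a line starting with `##` is plain commentary. A minimal Python sketch of reading those settings follows; `read_default_options` is a hypothetical name, the device names are illustrative, and the parsing rule is a simplification, not a copy of update-grub's shell code.

```python
import re

BEGIN = "### BEGIN AUTOMAGIC KERNELS LIST"
END = "### END DEBIAN AUTOMAGIC KERNELS LIST"

# Assumed format of a setting: "# name=value" (exactly one '#', then a space).
# Lines starting with "##" are ordinary comments and do not match.
SETTING = re.compile(r"^# (\w+)=(.*)$")


def read_default_options(text):
    """Hypothetical helper: collect '# name=value' settings inside the automagic section."""
    options = {}
    inside = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == BEGIN:
            inside = True
            continue
        if stripped == END:
            break                         # settings after END are ignored
        if inside:
            match = SETTING.match(stripped)
            if match:
                options[match.group(1)] = match.group(2).strip()
    return options
```

Keeping settings in comments lets them coexist with the regenerated entries: the template's own comment says the script modifies lines between the markers except for these default options.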

> I know grub doesn't scan other partitions for kernels.  I think that
> would be a wishlist item.  It's worth filing, though the developers
> of grub might think of that as being outside grub's scope.

And I know that is an issue for grub, not for Debian.  I don't think 
it would be that hard to do, at least for the standard FLOSS kernels 
grub already deals with easily.

> [snip]
>
> > > What kernel package updated?  If your kernel is installed because
> > > of a package like linux-image-2.6-686, then I might understand
> > > what happened here.
> >
> > kernel-image-2.6.8-2-686, which includes the full version number,
> > which is, I *think*, not the same as 2.6.
>
> Yes, it's a specific, full version number, not a metapackage.
>
> > ...
> >
> > > So what kernel were you using, via what package, and what kernel
> > > did you upgrade to, via what package, and did aptitude warn you
> > > it was removing the older kernel?  You don't mention this, but
> > > I'd be surprised if it did and you missed it.
> >
> > It wasn't a kernel upgrade.  I'm not sure why aptitude actually
> > calls it an upgrade.  It was "aptitude update && aptitude upgrade",
> > which pulls down the latest packages, which, on Sarge, means only
> > bug and security fixes.  So, while it is called an
> > upgrade, it should not have actually upgraded to a new kernel
> > version.  No kernel was removed.
>
> Part of the version string of the kernel was updated, therefore it
> was considered an upgrade.  Even though most of the files were
> upgraded in place, because the kernel modules' version numbers have
> to match exactly with the kernel's version numbers, every file has to
> be replaced, and it's considered an upgrade.  Technically, a kernel
> was removed.

Okay, I see that.  But (not aimed at you, but a general statement) when 
the "You have to reboot" message, which is 2 large paragraphs, comes 
up, they could at least say, "And since your kernel is being upgraded, 
then the following config files could also be affected, so you may want 
to back one or more up before they are changed."

Literally, how was I supposed to know that aptitude upgrade would do 
that?  I'm sure some smart-ass will say, "You could have read the 
docs," but my response is, "I have a life.  I use the computer as a 
tool.  It is MY tool, I am not its tool.  I do not have time to read 
upgrade docs on every package that gets upgraded and one of the 
positive points about apt is that messes like this aren't supposed to 
happen.  That's the kind of crap I expected to face back when I used an 
RPM-based system."

Thanks, Justin, for the feedback.  I am likely going to bring this up on 
the appropriate list (isn't there a bugs list for Debian?) and make it 
clear I feel this is an important issue.  After all is said and done, I 
lost 2 days of productivity in my business because of this and the very 
reason I use Debian Stable is to avoid crap like that.  It's like 
Debian saying, "Apt is automatic and doesn't mess with your system," 
then the guy who answered my bug report saying, "Well, sorry, but it 
was your fault, so I'm not going to deal with it.  I don't care if you 
never touched that file.  You did and it's your fault and go away."

Hal


