
Re: installation boot fails with standard bootdisk on 486SX/33



On Aug 11,  7:25pm, Bruce Perens wrote:
> Subject: installation boot fails with standard bootdisk on 486SX/33
: From: "Christopher R. Hertel" <crh@nts.umn.edu>
: > On my system (brand-new AMD-486DX4-120), the error that I get tells
: > me that the failure is occurring as the kernel is being
: > decompressed.
:
: OK - if this is true that means:
:
: 1. Bad data on the floppy. Most common. Re-download and write another.

Have done.  Several times from several sources using several floppies.
I've also used several tools (including DOS format, Win95 format,
Norton Utilities Format, Scandisk, chkdsk, and other Norton stuff) to
verify that the floppies are in good condition before I write to them.
I am using dd under Linux 1.2.8 to write the floppies.  I am also using
the same floppy drive to format & write the disks, and to boot.

I think I've covered this aspect.
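
One further check, sketched here and not something reported above, is to
read the written floppy back and compare it with the image file.  The
sketch below hashes the image and the same number of bytes from the
device.  The function names and both paths are hypothetical, and reading
a device node normally requires suitable permissions.

```python
import hashlib
import os


def sha256_prefix(path, length, chunk_size=64 * 1024):
    """Hash up to `length` bytes from `path`; return (hex digest, bytes read)."""
    digest = hashlib.sha256()
    remaining = length
    with open(path, "rb") as handle:
        while remaining > 0:
            block = handle.read(min(chunk_size, remaining))
            if not block:
                break  # the file or device ended before `length` bytes
            digest.update(block)
            remaining -= len(block)
    return digest.hexdigest(), length - remaining


def floppy_matches_image(image_path, device_path):
    """Return True if the device starts with exactly the image's bytes.

    Only the first len(image) bytes of the device are compared, because
    the floppy is usually larger than the image written to it.
    """
    size = os.path.getsize(image_path)
    image_hash, image_read = sha256_prefix(image_path, size)
    device_hash, device_read = sha256_prefix(device_path, size)
    return image_read == size and device_read == size and image_hash == device_hash


# Hypothetical usage; "rescue.img" and "/dev/fd0" are assumed names:
# print(floppy_matches_image("rescue.img", "/dev/fd0"))
```

A matching read-back only shows that the drive returns what was written.
It says nothing about what happens to the data once it is in memory.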

: 2. "LFB" setting in your BIOS wrong. See the installation document. Rare.

There is no LFB setting in my BIOS setup.  I have read the installation
documentation.

: 3. Bad RAM or other hardware. Happens _rarely_, but has indeed happened.
:
: > try turning off the cache and see
: > if that fixes the problem. If it does, report it as a bug.
:
: It is best reported to linux-kernel@vger.rutgers.edu and copied to us.
: It's not really our job to fix the kernel - we just distribute it.

I disabled the internal cache and--*poof*--the problem went away.
I will send a report to linux-kernel@vger.rutgers.edu.

Based on my results, and the results of others (as posted to this
mailing list) I believe that this is *not* a kernel problem because, as
you pointed out when discussing the possibility of an APM problem, the
kernel is not yet loaded when the decompression error occurs.  The
problem, it appears, is in the decompression code.

  *QUERY*: those of you who have had this problem and gotten around it
  by turning off the internal cache, did you turn the cache back on
  once you had installed the system on your hard disk?  Did that work?

I'd like to know whether the decompression code in the floppy is the
same as that which gets loaded on the hard drive.  If so, does the
problem persist when the kernel is being decompressed from the hard
drive or does it only appear when booting from floppy?  Is the kernel
decompressed directly from the floppy, or is it transferred to the RAM
disk first?

: Sigh. I wish you'd spend some time supporting new users booting their
: systems for a while. We really do need the help. It might change your
: opinions, too.

It seems that you've overlooked a couple of points:

 1) I *was* providing support for a new user attempting to boot his
    system.  I explained, as you did, that APM probably wasn't the
    problem.  I also suggested turning off the cache.  In my own case,
    and in several others, this (unfortunately) worked.  (I say
    "unfortunately", because I believe that I should be able to use the
    internal cache, unless you're telling me that Linux is intended
    *not* to run with internal cache enabled.)

 2) I spent a *year* trying to figure out the SIGVEC problem, which was
    reported by several Linux users via newsgroups and mailing lists.
    The problem appeared on a variety of motherboards using a variety
    of CPU types, controller types, and memory configurations.  Those
    of us who experienced this problem tried to combine our resources
    to solve it, but we did not have the technical expertise, or the
    support, or the stable platform we needed in order to accomplish
    much.  Think about it: how am I supposed to test a fix to the
    kernel if I can't compile a new kernel because the system crashes
    whenever I try?  (The SIGVEC problem was random, but typically
    occurred when an attempt was made to compile anything large.  No,
    it did *not* always appear at the same point in the compilation.)

    At first, I got a great deal of helpful advice regarding the SIGVEC
    problem.  Unfortunately, all of the suggestions that I received
    failed to correct it.  Eventually, people started telling me and
    the others that the problem was in our hardware.  When I explained
    that other operating systems (including DOS, Taos, and older
    versions of Linux) worked fine, I was told that those systems did
    not "exercise" the hardware as much as Linux does.  Great.

: > Now I'm being told that I can't install Debian with the 2.0.x
: > kernel because my hardware is incompatible?  This just doesn't make
: > sense!
:
: Huh? What hardware? Who said it was incompatible?

    So I've finally saved enough money to buy a new motherboard.  Linux
    1.2.8 now runs well.  I've never figured out why the old one caused
    random SIGVEC errors, which is a shame, because that problem is
    probably still biting others out there (several of whom may have
    given up in disgust).

    So, now that 1.2.x is stable, I've decided to upgrade to 2.0.x.
    Unfortunately, a new "hardware" bug has appeared:  I can't
    decompress the 2.0.x kernel from the installation disk because,
    based on what I have been told (via this mailing list) and have
    since tested, the decompression code is incompatible with the
    internal cache on my new motherboard.  If I turn off internal
    cache, the kernel decompresses.

    You might say "So?  Disable the cache and be done with it."  I
    would reply that internal cache is a fairly standard feature these
    days, and I'm pretty sure that Linux runs on other systems that
    have it.  I believe that the fact that the decompression fails on a
    small subset of systems is sufficient to justify investigation of a
    possible bug.

: Pardon me for saying so, but please try to be a bit more
: constructive. This is an all volunteer unpaid project.

That was low.

It implies that I've never contributed, never tried to help anyone
else, never volunteered myself, and never spent any time trying to fix
problems for myself or anyone else on the net.  All false.  Perhaps I'm
not the kernel-hacker guru that you are, but I do what I can.

In this case, the message to which you replied included my best attempt
to help someone else with whom I shared a common problem.  It also
included a warning regarding obscure hardware incompatibility bugs.  I
believe that the kernel decompression problem is such a bug, though I'm
certainly willing to be proven incorrect.

My warning, just to be clear, is this:

These problems do exist, but it is unlikely that they will be resolved
unless the people who *don't* experience them take them seriously.  In
most cases hardware will be blamed, even though the problem appears on
a variety of memory/CPU/motherboard/add-on configurations, and the same
configurations can run other OSs (*including* older versions of Linux)
without the same failures.

In effect, it's a warning that we *have* to be involved or we may find
ourselves cut out, as I was.  Quite the opposite of your accusation.

I hope that that clarifies my position.

Sincerely,


Chris Hertel -)-----


-- 
Christopher R. Hertel -)-----                   University of Minnesota
crh@nts.umn.edu              Networking and Telecommunications Services


