
E250 crashing hard with 2.4.16 and 2.4.17



Hello,

I once had hardware failures on my E250 server that I could also
reproduce under Solaris, which allowed me to involve Sun support.
Now I am in trouble because the current problem occurs only under
Linux :-(.

After the crash I had to switch on the machine manually.  I can't do
anything via the serial line (seyon).  The box does not react to any
signal (at least 'Break' does not work, and I have not found any other
signal that revives it).

The test program is fairly simple:

    #!/bin/bash
    # Read the whole partition 1000 times, printing the pass number first.
    i=0
    while [ "$i" -lt 1000 ] ; do
       echo "$i"
       i=$((i + 1))
       dd if=/dev/sda4 of=/dev/null
    done

This is the output (sorry for the German locale; translated:
     dd: reading »/dev/sda4«: Input/output error
     "Records ein" / "Records aus" = records in / records out)

0
dd: Lesen von »/dev/sda4«: Eingabe-/Ausgabefehler
4191936+0 Records ein
4191936+0 Records aus
1
dd: Lesen von »/dev/sda4«: Eingabe-/Ausgabefehler
4191936+0 Records ein
4191936+0 Records aus
2
dd: Lesen von »/dev/sda4«: Eingabe-/Ausgabefehler
4191936+0 Records ein
4191936+0 Records aus
3
dd: Lesen von »/dev/sda4«: Eingabe-/Ausgabefehler
4191936+0 Records ein
4191936+0 Records aus
4
dd: Lesen von »/dev/sda4«: Eingabe-/Ausgabefehler
4191936+0 Records ein
4191936+0 Records aus
5

This was the last message from my box.

Here is the relevant part of /var/log/syslog

Feb 28 13:00:53 bse kernel: attempt to access beyond end of device
Feb 28 13:00:53 bse kernel: 08:04: rw=0, want=2095972, limit=2095969
Feb 28 13:02:01 bse /USR/SBIN/CRON[1311]: (root) CMD (test -x /usr/sbin/logcheck && nice -n10 /usr/sbin/logcheck)
Feb 28 13:03:21 bse kernel: attempt to access beyond end of device
Feb 28 13:03:21 bse kernel: 08:04: rw=0, want=2095972, limit=2095969
Feb 28 13:05:48 bse kernel: attempt to access beyond end of device
Feb 28 13:05:48 bse kernel: 08:04: rw=0, want=2095972, limit=2095969
Feb 28 13:08:01 bse /USR/SBIN/CRON[2279]: (mail) CMD (  if [ -x /usr/sbin/exim -a -f /etc/exim/exim.conf ]; then /usr/sbin/exim -q ; fi)
Feb 28 13:08:16 bse kernel: attempt to access beyond end of device
Feb 28 13:08:16 bse kernel: 08:04: rw=0, want=2095972, limit=2095969
Feb 28 13:10:43 bse kernel: attempt to access beyond end of device
Feb 28 13:10:43 bse kernel: 08:04: rw=0, want=2095972, limit=2095969
Feb 28 14:43:50 bse syslogd 1.4.1#10: restart.
Feb 28 14:43:50 bse kernel: klogd 1.4.1#10, log source = /proc/kmsg started.
Feb 28 14:43:50 bse kernel: Inspecting /boot/System.map-2.4.17
Feb 28 14:43:50 bse rpc.statd[178]: Version 1.0 Starting
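
The numbers in the dd output and in the kernel messages can be checked against each other. The following is only a sketch under two assumptions that are not stated above: that dd used its default record size of 512 bytes (no bs= was given), and that want= and limit= in the kernel message are counted in 1 KiB units.

```shell
#!/bin/bash
# Figures copied from the dd output and the syslog excerpt above.
records=4191936      # full records dd read before the error
record_size=512      # assumption: dd default, since no bs= was given
want=2095972         # from the kernel message
limit=2095969        # from the kernel message

# Assumption: want= and limit= are in 1 KiB units.
read_kib=$(( records * record_size / 1024 ))
request_kib=$(( want - read_kib ))
left_kib=$(( limit - read_kib ))

echo "dd read ${read_kib} KiB before the error"
echo "the failing request spans ${request_kib} KiB"
echo "the partition has ${left_kib} KiB left at that offset"
```

Under these assumptions dd had read 2095968 KiB, the failing request covered 4 KiB, and only 1 KiB of the partition remained.  That would be consistent with the partition size not being a multiple of the block size used for the read, which would explain the I/O error at the very end of each pass.  It does not by itself explain the crash.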


I'm running a self-compiled kernel with ext3 support.  The crash happens
with at least 2.4.16 and 2.4.17.  Whatever problem /dev/sda4 has, the
machine should not crash, should it?

I tried the analogous script on /dev/sdd, where I have a Solaris test
partition.  I used a partition on /dev/sdd only because I do not know
the Solaris syntax for addressing exactly the physical /dev/sda4.

Any ideas to get the machine working again?
What additional information is needed to track down the problem?

Kind regards

         Andreas.


