Re: Hearsay: gcc-2.95.4 producing bogus kernels...



On Tue, Dec 25, 2001 at 11:54:54AM +0100, Matthias Klose wrote:

    searching on http://www.uwsg.indiana.edu/hypermail/linux/kernel/
    I found only the old 2.95.2 problems. Please could you point to some
    messages (URLs)?

Nothing specific that will help find or fix the bug as such, I suspect,
but there are reports of at least one code path being executed that
shouldn't be reachable (DaveM is IMO pretty reliable in this instance).

Attached is a thread from lk.


  --cw
--- Begin Message ---
Hi,

For some time now, running the ht://dig indexing cron job has locked my
machine hard (SMP, 2x866 PIII with 1G memory, Highmem(4G) enabled);
only SysRq was still working. Yesterday I enabled it again
and, surprise surprise, it survived the night.

But this message appears in the logs:

    Dec 22 00:01:25 obelix kernel: KERNEL: assertion (tp->copied_seq ==
    tp->rcv_nxt || (flags&(MSG_PEEK|MSG_TRUNC))) failed at
    tcp.c(1520):tcp_recvmsg

    Dec 22 00:01:53 obelix last message repeated 14 times

Note that this cronjob is started at midnight.
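
For reference, the condition in the logged assertion can be restated as a
small predicate. This is only a sketch of the boolean expression exactly as
it appears in the log line, not the kernel code itself; the function name
and the sample sequence numbers are made up for illustration, and the flag
values are hard-coded from the Linux socket headers.

```python
# Flag values as defined in the Linux socket headers (hard-coded so the
# sketch does not depend on the platform's socket module).
MSG_PEEK = 0x02
MSG_TRUNC = 0x20


def assertion_holds(copied_seq, rcv_nxt, flags):
    """Mirror of the logged condition:
    tp->copied_seq == tp->rcv_nxt || (flags & (MSG_PEEK | MSG_TRUNC))
    """
    return copied_seq == rcv_nxt or bool(flags & (MSG_PEEK | MSG_TRUNC))


# Hypothetical sequence numbers, chosen only to exercise each branch.
print(assertion_holds(1000, 1000, 0))         # sequence numbers agree
print(assertion_holds(1000, 2000, MSG_PEEK))  # mismatch tolerated when peeking
print(assertion_holds(1000, 2000, 0))         # the case that produces the log line
```

The message is logged only in the third case: the two sequence numbers
differ and neither MSG_PEEK nor MSG_TRUNC was passed.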

I'm running kernel 2.4.17rc1 on a Debian Sid/Unstable box.
The web server (running on the same machine!) is Debian's apache 1.3.22-2 package.

Here's the output of lspci:

00:00.0 Host bridge: VIA Technologies, Inc. VT82C693A/694x [Apollo PRO133x] (rev c4)
        Subsystem: Asustek Computer, Inc.: Unknown device 8038
        Flags: bus master, medium devsel, latency 0
        Memory at fd000000 (32-bit, prefetchable) [size=16M]
        Capabilities: [a0] AGP version 2.0
        Capabilities: [c0] Power Management version 2

00:01.0 PCI bridge: VIA Technologies, Inc. VT82C598/694x [Apollo MVP3/Pro133x AGP] (prog-if 00 [Normal decode])
        Flags: bus master, 66Mhz, medium devsel, latency 0
        Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
        Capabilities: [80] Power Management version 2

00:04.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
        Subsystem: Asustek Computer, Inc.: Unknown device 8038
        Flags: bus master, stepping, medium devsel, latency 0
        Capabilities: [c0] Power Management version 2

00:04.1 IDE interface: VIA Technologies, Inc. Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
        Flags: medium devsel
        I/O ports at d800 [size=16]
        Capabilities: [c0] Power Management version 2

00:04.2 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16) (prog-if 00 [UHCI])
        Subsystem: Unknown device 0925:1234
        Flags: bus master, medium devsel, latency 32, IRQ 12
        I/O ports at d400 [size=32]
        Capabilities: [80] Power Management version 2

00:04.3 USB Controller: VIA Technologies, Inc. UHCI USB (rev 16) (prog-if 00 [UHCI])
        Subsystem: Unknown device 0925:1234
        Flags: bus master, medium devsel, latency 32, IRQ 12
        I/O ports at d000 [size=32]
        Capabilities: [80] Power Management version 2

00:04.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
        Flags: medium devsel
        Capabilities: [68] Power Management version 2

00:0b.0 SCSI storage controller: LSI Logic / Symbios Logic (formerly NCR) 53c895 (rev 01)
        Subsystem: Tekram Technology Co.,Ltd.: Unknown device 3907
        Flags: bus master, medium devsel, latency 32, IRQ 10
        I/O ports at b800 [size=256]
        Memory at fc000000 (32-bit, non-prefetchable) [size=256]
        Memory at fb800000 (32-bit, non-prefetchable) [size=4K]
        Expansion ROM at <unassigned> [disabled] [size=64K]

00:0c.0 VGA compatible controller: S3 Inc. 86c764/765 [Trio32/64/64V+] (rev 40) (prog-if 00 [VGA])
        Flags: medium devsel, IRQ 11
        Memory at f4000000 (32-bit, non-prefetchable) [size=64M]
        Expansion ROM at 000c0000 [disabled] [size=64K]

00:0d.0 Ethernet controller: Realtek Semiconductor Co., Ltd. RTL-8139 (rev 10)
        Subsystem: Realtek Semiconductor Co., Ltd. RT8139
        Flags: bus master, medium devsel, latency 32, IRQ 15
        I/O ports at b400 [size=256]
        Memory at f3800000 (32-bit, non-prefetchable) [size=256]
        Capabilities: [50] Power Management version 2
 
I'm using the 8139too driver built into the kernel for the Realtek card.

These modules are usually loaded:

obelix:~# lsmod
Module                  Size  Used by    Tainted: P
ide-probe-mod           8224   0  (autoclean)
ide-mod                58596   0  (autoclean) [ide-probe-mod]
ipt_TOS                 1344  13  (autoclean)
iptable_mangle          2016   0  (autoclean) (unused)
ipt_REDIRECT            1088   1  (autoclean)
ipt_MASQUERADE          1664   1  (autoclean)
iptable_nat            16756   0  (autoclean) [ipt_REDIRECT ipt_MASQUERADE]
ipt_state                928   2  (autoclean)
ip_conntrack           16652   2  (autoclean) [ipt_REDIRECT ipt_MASQUERADE iptable_nat ipt_state]
ipt_LOG                 3584   5  (autoclean)
ipt_limit               1344  12  (autoclean)
iptable_filter          2048   0  (autoclean) (unused)
ip_tables              11328  11  [ipt_TOS iptable_mangle ipt_REDIRECT ipt_MASQUERADE iptable_nat ipt_state ipt_LOG ipt_limit iptable_filter]
ospm_processor          5984   0  (unused)
ospm_button             3264   0  (unused)
ospm_system             6028   0  (unused)
ospm_busmgr            11904   0  [ospm_processor ospm_button ospm_system]
rtc                     6456   0  (autoclean)

I will try the official 2.4.17 kernel and see how it goes.

Happy Christmas to all of you!

		Stefan

-- 
Misery loves company, but company does not reciprocate.
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

--- End Message ---
--- Begin Message ---
What compiler are you using to build these kernels?  To be honest
the assertion you have triggered ought to be impossible and this is
the first report I've ever seen of it triggering.

--- End Message ---
--- Begin Message ---
On Sat, Dec 22, 2001 at 03:57:13PM -0800, David S. Miller wrote:
> What compiler are you using to build these kernels?  To be honest
> the assertion you have triggered ought to be impossible and this is
> the first report I've ever seen of it triggering.

IIRC he said he (or another guy with the same problem) was using gcc
3.0.something available in Red Hat rawhide.

- Arnaldo

--- End Message ---
--- Begin Message ---
Arnaldo Carvalho de Melo (acme@conectiva.com.br) said: 
> On Sat, Dec 22, 2001 at 03:57:13PM -0800, David S. Miller wrote:
> > What compiler are you using to build these kernels?  To be honest
> > the assertion you have triggered ought to be impossible and this is
> > the first report I've ever seen of it triggering.
> 
> IIRC he said he (or another guy with the same problem) was using gcc
> 3.0.something available in Red Hat rawhide.

If it's the one in rawhide, it's 3.1-0.10, off gcc HEAD. I have yet
to see a kernel compiled by this compiler boot successfully, but
YMMV. :)

Bill

--- End Message ---
--- Begin Message ---
Hi David!

On Sat, 22 Dec 2001, David S. Miller wrote:

> 
> What compiler are you using to build these kernels?  To be honest
> the assertion you have triggered ought to be impossible and this is
> the first report I've ever seen of it triggering.


OK, I switched to kernel 2.4.17 and it happened again tonight.

Here's the output of ver_linux:


asterix:/usr/src/linux/scripts# ./ver_linux 
If some fields are empty or look unusual you may have an old version.
Compare to the current minimal requirements in Documentation/Changes.
 
 Linux asterix 2.4.17-a1 #1 Sam Dez 22 13:39:51 CET 2001 i686 unknown
  
  Gnu C                  2.95.4
  Gnu make               3.79.1
  util-linux             2.11n
  mount                  2.11n
  modutils               2.4.11
  e2fsprogs              1.25
  PPP                    2.4.1
  Linux C Library        2.2.4
  Dynamic linker (ldd)   2.2.4
  Procps                 2.0.7
  Net-tools              1.60
  Console-tools          0.2.3
  Sh-utils               2.0.11
  Modules Loaded         nfs lockd sunrpc parport_pc lp parport matroxfb_maven matroxfb_crtc2 cs46xx ac97_codec soundcore i2c-matroxfb i2c-algo-bit i2c-core apm ide-scsi rtc

asterix:/usr/src/linux/scripts# gcc -v
Reading specs from /usr/lib/gcc-lib/i386-linux/2.95.4/specs
gcc version 2.95.4 20011006 (Debian prerelease)


Note that the kernel was compiled on asterix (another machine).

I also installed memtest86 and ran it a few days ago.

I still need to read up on its documentation, but here is one error that
came up after HOURS of running memtest86.

	Address:	190fe988; 400.90MiB
	Good:		00100000
	Bad:		00000000
	Error-Bits:	00100000

Does it mean there's a one-bit error in the first (400.90<512) 512MB module?
(This machine has 1GB memory in 2x512MB modules.)
If so, I will remove this module and try again with only 512 MiB of
memory.
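
A quick way to check the "one-bit" reading is to XOR the good and bad
patterns and list the bits that differ. The sketch below does only that,
using the values from the report above; the helper name is made up.
Whether an address below 512 MiB really belongs to the first module depends
on how the chipset maps the slots, which is an assumption here.

```python
def differing_bits(good, bad):
    """Return the bit positions (0 = least significant) where the words differ."""
    diff = good ^ bad
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# Values copied from the memtest86 report.
address = 0x190FE988
good = 0x00100000
bad = 0x00000000

bits = differing_bits(good, bad)
# Assumption: the first 512 MiB of physical addresses map to one module.
below_512_mib = address < 512 * 1024 * 1024

print("differing bit positions:", bits)
print("address below 512 MiB:", below_512_mib)
```

A single entry in the list means exactly one bit was flipped, which matches
the Error-Bits line in the report.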


    Bye, Stefan


--- End Message ---
--- Begin Message ---
Hi,

OK, here's a follow-up to my own mail. I just replaced
all the memory with the modules from my workstation (256MB + 128MB)
and htdig's cron job still locks up the machine, so although
the 512MB module might have a bit error, it's NOT the cause of the
problem here. My workstation has been running for about a year now with
this memory and has NEVER locked up like this.

So IMHO the problem lies somewhere else. 

Any suggestions? I'm happy to provide more information if it helps.

TIA for your effort!

    Bye, Stefan

PS: the filesystems are all ext3, in case it matters

--- End Message ---
--- Begin Message ---
   From: Stefan Frank <sfr@gmx.net>
   Date: Sun, 23 Dec 2001 13:33:20 +0100

   Any suggestions? I'm happy to provide more information if it helps.

Try a different compiler, the one you are using is known to generate
bogus kernels.

--- End Message ---
--- Begin Message ---
Try a different compiler; as others in this thread have noted, the
compiler you are using is flaky at best.

--- End Message ---