>>>>> "Graham" == Graham Whaley <graham.whaley@gmail.com> writes:

> On 4 November 2013 07:18, David Kuehling <dvdkhlng@posteo.de> wrote:
>>>>>> "Andreas" == Andreas Barth <aba@ayous.org> writes:
>> * David Kuehling (dvdkhlng@posteo.de) [131103 16:00]:
>>> Since then I've encountered system deadlocks every one-two days (the
>>> system in question is running continuously 24/7). Deadlock meaning,
>>> that the system does seems completely dead, even num-lock LED cannot
>>> be toggled any more (but fan is still spinning etc.).
>>>
>>> I never had stability problems on kernel 2.6.39. I did have a
>>> single deadlock when testing the debian-backports kernel package for
>>> kernel 3.2.0 on debian squeeze (but I ran that kernel only for about
>>> 2 days before upgrading to Wheezy).
>> Can you try the old kernel if it happens with the old kernel and new
>> userland?
[..]
> As to the instabilities - it occurs to me if this may be connected to
> the Loongson 2f 'issues', as documented at [1] I believe (and btw,
> would love if somebody could confirm and point me at any archive
> links) that Debian-mips moved from MIPSI to MIPSII ISA when it when
> from Squeeze to Wheezy. I'm wondering if maybe that change in code
> layout may have bought one of these issues to the surface? Or maybe
> that your kernel or RFS needs to be built with the options listed in
> the link, and you've been "lucky" so far? As far as I can find out,
> there is no easy way (apart from maybe looking at the top of the chip
> :-( ) to tell if you have a 2F01, 2F02 or 2F03 version of the 2F SoC,
> and only the 2F03 is 'fixed' :-( Anybody know for sure? I'm sure this
> has probably been discussed before in the past.
> Please feel free to educate me on if any of these 2F fixes are turned
> on by default for upstream Debian. I doubt they are? And sorry if I've
> missed some subtlety here?
Hi Graham,

I had the same thought - the lockups certainly look similar to what I
experienced when running a Linux kernel compiled from source without
the Loongson2f instruction fixes enabled in the kernel config (which
would indicate that I have one of the older SoCs).

Looking at the output of objdump -D libc.so, it looks to me like the
correct "fixed" NOP sequence is used (shown by the disassembler as
"move at,at", which is synonymous with "or at,at,zero"). So Debian
userspace looks like it's loongson compatible.

Running objdump -D on the 3.2 kernel image (that's a gzip compressed
image, so I guess the code I see is only a small bootstrap sequence for
ungzipping the rest) I can see the right NOP sequence plus the extra
code in front of indirect jumps (e.g. function return statements).
This looks like it was compiled with -mfix-loongson2f-jump plus
-mfix-loongson2f-nop. So far this looks good. Hopefully we're not
hitting new, undocumented CPU bugs here.

After finally getting update-initramfs to build a working image for my
2.6.39.4 kernel, I'm now back to running the same kernel I used for a
long time with squeeze. If no lockups happen by the end of the week,
we'll have another data point.

BTW, with 2.6.39.4 I'm not using the loongson2f optimized libc from
package libc6-loongson2f (only newer kernels seem to supply the right
hwcap info for ld.so to choose the loongson2f optimized versions of
libraries). I'll have to run another test to see whether the
loongson2f libc has anything to do with the lockups.

I noticed that going from linux 2.6.39 to 3.2, the process scheduling
improved dramatically. On 2.6.39 'nice' values seem to be ignored, and
the output of 'top' often looks wrong, like multiple processes using
exactly the same amount of CPU, without any variation. Maybe newer
loongson2f kernels changed to using a more accurate clock source for
process CPU usage accounting. These changes could also be a source of
deadlocks.
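The objdump check of the NOP sequence described above could also be
automated. The following is only a rough sketch, not something taken
from this thread: it assumes the raw bytes of the .text section have
already been extracted into a buffer (for example with objcopy), and
the function name count_nops is made up for illustration. It counts
plain MIPS NOP words (all zero) against the "or at,at,zero" encoding
(0x00200825). Zero words used as padding would also be counted as
plain NOPs, so a non-zero count is a hint to look closer, not proof of
an unfixed binary.

```python
import struct

# Encodings of the two instruction words of interest.
PLAIN_NOP = 0x00000000       # "sll zero,zero,0", the standard MIPS nop
LOONGSON2F_NOP = 0x00200825  # "or at,at,zero", shown by objdump as "move at,at"


def count_nops(text_bytes, little_endian=True):
    """Return (plain, fixed) NOP word counts for raw .text bytes.

    Hypothetical helper: text_bytes is assumed to hold only code,
    not data; little_endian=True matches a mipsel userland.
    """
    fmt = "<I" if little_endian else ">I"
    plain = fixed = 0
    # MIPS instructions are 4 bytes wide; ignore a trailing partial word.
    usable = len(text_bytes) - len(text_bytes) % 4
    for (word,) in struct.iter_unpack(fmt, text_bytes[:usable]):
        if word == PLAIN_NOP:
            plain += 1
        elif word == LOONGSON2F_NOP:
            fixed += 1
    return plain, fixed
```

A result with many "fixed" words and few or no "plain" ones would be
consistent with code assembled with -mfix-loongson2f-nop.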
Hopefully I'll not have to bisect all linux versions before 3.2 to
finally solve the issue.

cheers,

David
--
GnuPG public key: http://dvdkhlng.users.sourceforge.net/dk2.gpg
Fingerprint: B63B 6AF2 4EEB F033 46F7 7F1D 935E 6F08 E457 205F