
Re: 5.10.0-4-sparc64-smp #1 Debian 5.10.19-1 crashes on T2000



Hi,

Can anyone possibly give a list of known-stable kernel versions for SPARC machines? (Is a distinction necessary between architectures, i.e. older vs. newer machines, sun4u vs. sun4v?)

Also, does this instability show up as the machine crashing under high workload (halting? rebooting?)?

I ask because I have been seeing a weird effect with Debian on three different SPARC machines:
I start a heavy compile load that runs for several days (7-10), during which the machines run fine, with no apparent errors in dmesg or anywhere else.
Then, when I power them off and on again, the filesystem is corrupt and sometimes impossible to repair without reinstalling.

This seems to happen only after the machines have run for a long time under high workload, and apparently not when I simply power them off overnight without a heavy load.

Regards,
Connor


On Tue, Mar 23, 2021 at 4:46 PM Frank Scheiner <frank.scheiner@web.de> wrote:
Hi Jan,

On 23.03.21 16:36, Jan Engelhardt wrote:
> On Tuesday 2021-03-23 16:29, Frank Scheiner wrote:
>> ```
>> [...]
>> Begin: Retrying nfs mount ... [   41.753937] NFS: mount program didn't pass remote address
>> mount: Invalid argument
>
> I seem to recall that NFS is one of those filesystems that (a) makes use of
> filesystem-specific data, i.e. mount(2)'s 5th argument, and (b) a mount helper,
> /usr/sbin/mount.nfs.
>
> Now, with the change in Linux kernel 028abd9222df0cf5855dab5014a5ebaf06f90565,
> I am postulating the hypothesis that the fs/nfs/ code for parsing this
> binary blob is no longer aware that it is being invoked in a compat32 context.

That sounds interesting. Can you perhaps post your hypothesis also in
this thread:

https://marc.info/?t=161644900600003&r=1&w=2

Maybe this gives the kernel developers some ideas.

> Since T2 systems were said to be fine and T1 not, perhaps the T1 systems in
> question were all on NFS mounts and the T2 one wasn't?

No, the T5220 was also running diskless, actually using the same root FS
as the T1000 (in the form of a btrfs subvolume snapshot), plus an identical
kernel and initramfs:

```
root@nfs:/srv/tftp# ls -la $( host2hex t5220 )*
lrwxrwxrwx 1 root root 35 Feb 28  2018 AC10026E -> boot/grub/sparc64-ieee1275/core.img
lrwxrwxrwx 1 root root 38 Mar 15 18:16 AC10026E.initrd.img -> initrd.img.5.10.0-4.debian.sid.sparc64
lrwxrwxrwx 1 root root 36 Mar 15 18:16 AC10026E.vmlinuz -> linux.mp.5.10.0-4.debian.sid.sparc64
```

Cheers,
Frank

