
Bug#993948: kernel/amd64: system hang on HPE ProLiant BL460c Gen9



On Fri, 10 Sep 2021 09:40:41 +0800 YunQiang Su <wzssyqa@gmail.com> wrote:
> Yunqiang Su <wzssyqa@gmail.com> wrote on Thu, 9 Sep 2021 at 11:11:
> >
> >
> > On Wed, 8 Sep 2021 20:53:27 +0800 YunQiang Su <wzssyqa@gmail.com> wrote:
> > > Package: src:linux
> > > Version: 5.10
> > >
> > > After upgrading to bullseye's kernel, the system always hangs after about
> > > 10 minutes, with this error in the IML log:
> > >
> > > An Unrecoverable System Error (NMI) has occurred (Service Information:
> > > 0x00000008, 0x89480000)
> > >
> > > Kernel 5.14 from experimental also has this problem.
> > > Kernel 4.19 works fine.
> > > Fedora 34 seems to be working well.
> >
> > This is the output of dmesg and lspci from both Fedora 34 and Debian bullseye.
> > I hope it is useful.
> >
> 
> We finally found the problem:
> 
> https://github.com/torvalds/linux/commit/8343b1f8b97ac016150c8303f95b63b20b98edf8
> https://github.com/torvalds/linux/commit/65161c35554f7135e6656b3df1ce2c500ca0bdcf
> 
> In the first patch:
>    They assumed `err' was not used at all and removed it.
> In the second patch:
>    They added it back, but with a wrong value, "-EINVAL".
> 
> Better KPI got.
> 

The NICs are detected now, but the machine still hangs.
4.19.y works fine, while 5.10 and 5.14 do not.

I think we need to dig further.
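
To make the `err' problem described in the quoted text easier to follow, here is a
minimal sketch of the pattern. It is not the kernel code from either commit: the
struct, the function names and the "optional feature absent" condition are all
hypothetical, chosen only to show how reintroducing `err' with -EINVAL can turn a
harmless early exit into a failure.

```c
#include <errno.h>

/* Hypothetical stand-in for a device; not a real kernel structure. */
struct demo_dev {
    int optional_units; /* 0 means the optional feature is simply absent */
};

/* Assumed original behaviour: an absent optional feature is not an error. */
int demo_init_old(const struct demo_dev *dev)
{
    int err = 0;

    if (dev->optional_units == 0)
        goto out; /* nothing to set up, err stays 0 */

    /* ... real setup would go here ... */
out:
    return err;
}

/* Assumed behaviour once `err' comes back with a wrong value. */
int demo_init_new(const struct demo_dev *dev)
{
    int err = 0;

    if (dev->optional_units == 0) {
        err = -EINVAL; /* the benign case is now reported as a failure */
        goto out;
    }

    /* ... real setup would go here ... */
out:
    return err;
}
```

With optional_units set to 0, demo_init_old() returns 0 and demo_init_new()
returns a negative errno, so a caller that checks the return value would abandon
a device that used to initialise fine.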

> > >
> > > --
> > > YunQiang Su
> > >
> > >
> 
> 
> 
> -- 
> YunQiang Su
> 
> 

