
Re: rv-manda-01 might have hardware issues?



On 2023-08-03 12:07, Aurelien Jarno wrote:
> On 2023-08-03 13:01, Adrian Bunk wrote:
> > On Thu, Aug 03, 2023 at 11:12:49AM +0200, Aurelien Jarno wrote:
> > > On 2023-08-02 18:32, Adrian Bunk wrote:
> > > > Hi,
> > > > 
> > > > while there is a (rare)
> > > >   semop(1): encountered an error: Invalid argument
> > > > error that happens on all buildds, rv-manda-01 seems
> > > > to have issues unique to this buildd:
> > > > https://buildd.debian.org/status/fetch.php?pkg=softhsm2&arch=riscv64&ver=2.6.1-2.1&stamp=1690878571&raw=0
> > > > https://buildd.debian.org/status/fetch.php?pkg=mmseqs2&arch=riscv64&ver=14-7e284%2Bds-2&stamp=1690917698&raw=0
> > > > https://buildd.debian.org/status/fetch.php?pkg=ocaml-dune&arch=riscv64&ver=3.9.1-1%2Bb1&stamp=1690975265&raw=0
> > > > https://buildd.debian.org/status/fetch.php?pkg=libint&arch=riscv64&ver=1.2.1-6&stamp=1690989462&raw=0
> > > > 
> > > > This happens only on rv-manda-01, and my guess would be that this might 
> > > > be a hardware problem (e.g. a nonworking fan).
> > > > 
> > > 
> > > This is unfortunately not limited to rv-manda-01 and also appeared on
> > > the other buildds, so I really doubt it's a hardware issue:
> > > 
> > > https://buildd.debian.org/status/fetch.php?pkg=freewnn&arch=riscv64&ver=1.1.1%7Ea021%2Bcvs20130302-7&stamp=1690541230&raw=0
> > > https://buildd.debian.org/status/fetch.php?pkg=vnlog&arch=riscv64&ver=1.36-2&stamp=1690741628&raw=0
> > > https://buildd.debian.org/status/fetch.php?pkg=audit&arch=riscv64&ver=1%3A3.1.1-1%2Bb1&stamp=1690705512&raw=0
> > > https://buildd.debian.org/status/fetch.php?pkg=libnl3&arch=riscv64&ver=3.7.0-0.2&stamp=1690668687&raw=0
> > > https://buildd.debian.org/status/fetch.php?pkg=globus-authz&arch=riscv64&ver=4.6-2&stamp=1690813082&raw=0
> > > https://buildd.debian.org/status/fetch.php?pkg=krb5&arch=riscv64&ver=1.20.1-2%2Bb1&stamp=1690796233&raw=0
> > >...
> > 
> > These are the semop(1) issue, which as I said happens on all buildds.
> > 
> > 
> > rv-manda-01 had mysterious FTBFS that did not appear when the package 
> > was retried:
> > 
> > ocaml-dune:
> > ...
> > cd _boot && /usr/bin/ocamlopt.opt -c -g -no-alias-deps -w -49-6 -alert -unstable -I +threads dune_rules__Coq_stanza.mli
> > Fatal error: exception Failure("lexing: empty token")
> > ...
> > 
> > proj:
> > ...
> > In file included from /usr/include/features.h:490,
> >                  from /usr/include/errno.h:25,
> >                  from /<<PKGBUILDDIR>>/src/projections/wag3.cpp:3:
> > /usr/include/riscv64-linux-gnu/bits/stdio2.h:244:14: error: expected string-literal before ‘^=’ token
> >   244 | extern char *__REDIRECT (__fgets_unlocked_alias,
> >       |              ^~~~~~~~~~
> > ...
> > 
> > 
> > rv-manda-01 also had several cases of the kind of gcc ICEs that are
> > clear buildd problems:
> > 
> > mmseqs2 (similar in softhsm2 and libint):
> > ...
> > /usr/include/c++/13/bits/stl_algo.h:1830:5: internal compiler error: in add_regs_to_insn_regno_info, at lra.cc:1502
> > ...
> > The bug is not reproducible, so it is likely a hardware or OS problem.
> > ...
> 
> Ok, I am afraid that we just have to shut down this buildd and wait for
> new hardware to be available.

Alternatively, I wonder if it could be the following issue, which never
got solved and can appear or disappear depending on the random values
used by the kernel:

https://yhbt.net/lore/all/20200710191250.GA2242132@aurel32.net/T/

At least the cause value (0x0d) in the kernel logs matches:

2023-07-27T10:54:28.413428+00:00 rv-manda-01 kernel: [141145.816839] do_trap: 1 callbacks suppressed
2023-07-27T10:54:28.414303+00:00 rv-manda-01 kernel: [141145.816858] clang++[341018]: unhandled signal 11 code 0x1 at 0x0000004a98f80000 in libgcc_s.so.1[3fb7629000+19000]
2023-07-27T10:54:28.424752+00:00 rv-manda-01 kernel: [141145.830866] CPU: 2 PID: 341018 Comm: clang++ Not tainted 6.3.0-1-riscv64 #1  Debian 6.3.7-1
2023-07-27T10:54:28.424805+00:00 rv-manda-01 kernel: [141145.839264] Hardware name: SiFive HiFive Unmatched A00 (DT)
2023-07-27T10:54:28.437684+00:00 rv-manda-01 kernel: [141145.844901] epc : 0000003fb76388f4 ra : 0000003fb7638606 sp : 0000003fe98b2910
2023-07-27T10:54:28.437756+00:00 rv-manda-01 kernel: [141145.852192]  gp : 0000002ab7b67e60 tp : 0000003fb74f4c40 t0 : 0000003fb7644000
2023-07-27T10:54:28.444988+00:00 rv-manda-01 kernel: [141145.859500]  t1 : 0000002af7d44980 t2 : 0000000000000009 s0 : 0000003fe98b2e50
2023-07-27T10:54:28.459569+00:00 rv-manda-01 kernel: [141145.866799]  s1 : 0000003fe98b2970 a0 : 0000000000000000 a1 : 0000003fe98b2840
2023-07-27T10:54:28.459627+00:00 rv-manda-01 kernel: [141145.874079]  a2 : 0000002af7d44968 a3 : 0000000000000000 a4 : 0000004a98f80000
2023-07-27T10:54:28.474985+00:00 rv-manda-01 kernel: [141145.881391]  a5 : 0000000000000893 a6 : 0000003fb79d6140 a7 : 0000000000000001
2023-07-27T10:54:28.475051+00:00 rv-manda-01 kernel: [141145.888730]  s2 : 0000003fe98b3260 s3 : 0000000000000000 s4 : 0000003fb75b6bb8
2023-07-27T10:54:28.481939+00:00 rv-manda-01 kernel: [141145.895998]  s5 : 0000003fe98b3418 s6 : 0000003fe98b2340 s7 : 0000000000000005
2023-07-27T10:54:28.495767+00:00 rv-manda-01 kernel: [141145.903288]  s8 : 0000000000000004 s9 : 0000003fb7643270 s10: 0000000000000022
2023-07-27T10:54:28.496121+00:00 rv-manda-01 kernel: [141145.910571]  s11: 0000003fe98b3ca8 t3 : 0000003fb75e2a66 t4 : 000000000007a056
2023-07-27T10:54:28.516263+00:00 rv-manda-01 kernel: [141145.917872]  t5 : 00000000000006f0 t6 : 0000000000724eee
2023-07-27T10:54:28.516333+00:00 rv-manda-01 kernel: [141145.923257] status: 0000000200004020 badaddr: 0000004a98f80000 cause: 000000000000000d
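
For anyone who wants to check other buildd logs for the same pattern, here is
a minimal sketch of decoding such a trap line. The helper names (decode_trap,
in_mapping) are made up for illustration. The cause names are the standard
RISC-V page-fault exception codes, and the mapping format is the
"name[base+size]" form printed in the "unhandled signal" line above.

```python
import re

# Standard RISC-V exception codes for page faults (privileged spec).
SCAUSE_NAMES = {
    0xC: "instruction page fault",
    0xD: "load page fault",
    0xF: "store/AMO page fault",
}


def decode_trap(line):
    """Return (badaddr, cause, cause_name) from a 'badaddr: ... cause: ...' line.

    Hypothetical helper: returns None if the line has no such fields.
    """
    m = re.search(r"badaddr: ([0-9a-f]+) cause: ([0-9a-f]+)", line)
    if m is None:
        return None
    badaddr = int(m.group(1), 16)
    cause = int(m.group(2), 16)
    return badaddr, cause, SCAUSE_NAMES.get(cause, "other")


def in_mapping(addr, mapping):
    """Check whether addr lies inside a 'name[base+size]' mapping (hex values)."""
    m = re.search(r"\[([0-9a-f]+)\+([0-9a-f]+)\]", mapping)
    if m is None:
        return False
    base = int(m.group(1), 16)
    size = int(m.group(2), 16)
    return base <= addr < base + size


if __name__ == "__main__":
    status = ("status: 0000000200004020 badaddr: 0000004a98f80000 "
              "cause: 000000000000000d")
    lib = "libgcc_s.so.1[3fb7629000+19000]"
    badaddr, cause, name = decode_trap(status)
    print(hex(badaddr), hex(cause), name)
    # epc from the register dump above, and the faulting address itself.
    print(in_mapping(0x3FB76388F4, lib), in_mapping(badaddr, lib))
```

With the values from the log above, the cause decodes as a load page fault,
the epc lies inside the libgcc_s.so.1 mapping, and the faulting address (the
value in a4) lies outside it.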

Cheers
Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                     http://aurel32.net

