
Re: Bug#976922: systemd-bootchart: FTBFS on ppc64el: dh_auto_test: error: make -j160 check VERBOSE=1 returned exit code 2



On 11/12/20 at 09:09 +0100, Lucas Nussbaum wrote:
> On 10/12/20 at 22:12 +0100, Michael Biebl wrote:
> > Am 10.12.20 um 22:10 schrieb John Paul Adrian Glaubitz:
> > > Hi Michael!
> > > 
> > > On 12/10/20 8:42 PM, Michael Biebl wrote:
> > > > ============================================================================
> > > > Testsuite summary for systemd-bootchart 233
> > > > ============================================================================
> > > > # TOTAL: 1
> > > > # PASS:  1
> > > > # SKIP:  0
> > > > # XFAIL: 0
> > > > # FAIL:  0
> > > > # XPASS: 0
> > > > # ERROR: 0
> > > > ============================================================================
> > > 
> > > Did the test machine you used actually have that many cores?
> > 
> > No idea
> 
> I tried building with SMT off (so the machine only has 20 visible
> cores). I could reproduce the failure. (I disabled SMT at runtime using
> ppc64_cpu --smt=off)
> 
> It also fails when running 'make check VERBOSE=1' (so it's not caused by
> parallelism).
> 
> It crashes with:
> (gdb) r -o /tmp/tmp.k64Np1I2cr -n 10 -r -p
> Starting program: /root/systemd-bootchart-233/systemd-bootchart -o /tmp/tmp.k64Np1I2cr -n 10 -r -p
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib/powerpc64le-linux-gnu/libthread_db.so.1".
> 
> Program received signal SIGSEGV, Segmentation fault.
> 0x0000000100004d04 in svg_ps_bars (interval=<optimized out>, graph_start=2775.3698308829998, ps_first=0x1000505d0, n_cpus=1, 
>     n_samples=10, head=0x100050770, of=0x1000503f0) at src/svg.c:1187
> 1187	        i = ps->sample->next->sampledata->counter;
> (gdb) bt
> #0  0x0000000100004d04 in svg_ps_bars (interval=<optimized out>, graph_start=2775.3698308829998, ps_first=0x1000505d0, 
>     n_cpus=1, n_samples=10, head=0x100050770, of=0x1000503f0) at src/svg.c:1187
> #1  svg_do (overrun=0, interval=<optimized out>, log_start=2775.3698308829998, graph_start=2775.3698308829998, n_cpus=1, 
>     pscount=11940, n_samples=<optimized out>, ps_first=<optimized out>, head=0x100050770, 
>     build=0x1000509e0 "Debian GNU/Linux bullseye/sid", of=0x1000503f0) at src/svg.c:1371
> #2  main (argc=<optimized out>, argv=<optimized out>) at src/bootchart.c:497
> 
> The segfault can be reproduced on Debian stable outside the chroot, just
> running '/lib/systemd/systemd-bootchart -n 1'.
> 
> At the same time, 'systemd-analyze plot' works fine (I'm attaching its
> output in case it hints at something).

I tried to reproduce this on another system (ARM64, 256 visible cores
because 2 x ThunderX2, 32 cores/cpu, 4 threads/core) and it also
segfaults.

I tried to reproduce on yet another system (x86_64, 128 visible cores
because 4x Intel Xeon Gold 6130, 16 cores/cpu, 2 threads/core), and it
also segfaults.

My guess is that the test result depends on the machine it runs on (the
test just generates a chart for the local system), and that something
about machines with a large number of cores triggers the crash. In any
case, nothing here is specific to ppc64el.
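
For reference, the crashing statement at src/svg.c:1187 dereferences a
chain of pointers (ps->sample->next->sampledata->counter). The backtrace
does not show which link is invalid, so what follows is only a minimal
sketch of a defensive check, not a proposed patch. The struct layouts and
the helper name second_sample_counter are hypothetical stand-ins; the real
definitions in systemd-bootchart differ.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the bootchart structures.
 * Only the fields touched by the crashing statement are modelled. */
struct sample_data {
    int counter;
};

struct ps_sample {
    struct ps_sample *next;
    struct sample_data *sampledata;
};

struct ps {
    struct ps_sample *sample;
};

/* Stores the counter of the second sample in *out and returns 0,
 * or returns -1 if any pointer along the chain is NULL.
 * (Assumption: the SIGSEGV comes from one of these being NULL.) */
static int second_sample_counter(const struct ps *ps, int *out)
{
    if (ps == NULL || out == NULL)
        return -1;
    if (ps->sample == NULL || ps->sample->next == NULL)
        return -1;
    if (ps->sample->next->sampledata == NULL)
        return -1;

    *out = ps->sample->next->sampledata->counter;
    return 0;
}
```

A caller would skip the process entry when the helper returns -1 instead
of reading the counter directly. Whether skipping is the right behaviour
depends on why the chain is incomplete, which still needs to be found.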

Lucas

