
Re: Bug#858405: xmlto: intermittent Segmentation fault when building manpages for libreswan on mips64el



reassign 858405 xsltproc
forcemerge 750593 858405
retitle 750593 xsltproc: bus error on some arches with linux < 4.1
thanks

Hi,

On 22/03/17 21:01, Daniel Kahn Gillmor wrote:
> On Wed 2017-03-22 06:22:41 -0400, James Cowgill wrote:
>> On 22/03/17 01:29, Daniel Kahn Gillmor wrote:
>>> For debian revisions of 3.20, failures happened on:
>>>
>>>   mipsel-manda-02
>>>   eberlin
>>>
>>> Also for revisions of 3.20, successes happened on:
>>>
>>>   mipsel-sil-01
>>>   mipsel-manda-03
>>>   mipsel-manda-01
>>
>> This is a known issue and it only affects Loongson buildds.
>> Interestingly mipsel-manda-01 is Loongson and didn't fail there so there
>> may be a random element involved here. I don't think anyone's tracked
>> down the underlying issue though.
> 
> thanks, is there a public reference for the known issue that we can
> point to?

I think #750593 looks a lot like the bug here.

After some investigation, it seems I was being a bit unfair to Loongson.
This is arguably a non-MIPS-specific bug in Linux < 4.1. It just so
happens that all the Loongson buildds run jessie's 3.16 kernel and all
the other buildds run >= 4.7 from backports.

In #750593 there was lots of talk about stack overflows causing this, but
there is actually more to it. Indeed, if I reduced the stack size with
ulimit, the segfaults became more frequent, but increasing the stack
size didn't help at all. After looking at the mappings for a failing
process, I saw this (taken just after starting xsltproc):

[...]
> fff7f50000-fff7f5c000 ---p 00004000 fd:00 1060250                        /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
> fff7f5c000-fff7f60000 rw-p 00000000 fd:00 1060250                        /usr/lib/mips64el-linux-gnuabi64/libeatmydata.so.1.1.2
> fff7f60000-fff7f88000 r-xp 00000000 fd:00 1060375                        /lib/mips64el-linux-gnuabi64/ld-2.24.so
> fff7f94000-fff7f98000 rw-p 00024000 fd:00 1060375                        /lib/mips64el-linux-gnuabi64/ld-2.24.so
> fff7f98000-fff7fa0000 r-xp 00000000 fd:00 947544                         /usr/bin/xsltproc
> fff7fa4000-fff7fac000 rw-p 00000000 00:00 0
> fff7fac000-fff7fb0000 rw-p 00004000 fd:00 947544                         /usr/bin/xsltproc
> ffff1d4000-ffff384000 rwxp 00000000 00:00 0                              [heap]
> ffff9e0000-ffffa04000 rwxp 00000000 00:00 0                              [stack]
> ffffffc000-10000000000 r-xp 00000000 00:00 0                             [vdso]

Notice that there is a very small gap between the heap and the stack
here (at least compared to working xsltproc runs). I think that the heap
is growing to a point where it limits the maximum size of the stack and
so increasing the stack size with ulimit doesn't help.
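To put a number on the gap, here is a minimal Python sketch (the function names are made up for illustration) that reads mappings in /proc/<pid>/maps format and measures how far the stack could grow before reaching the current end of the heap. For the [heap] and [stack] lines in the dump above this works out to 0x680000 bytes (6.5 MiB), which is below the common 8 MiB default for `ulimit -s`. That is consistent with raising the limit making no difference. The sketch ignores any guard gap the kernel keeps below the stack and any further heap growth, so the real room is, if anything, smaller.

```python
def parse_maps(text):
    """Map each named region (e.g. "[heap]", "[stack]") to (start, end)."""
    regions = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 6:
            continue  # anonymous mapping without a pathname column
        start, end = (int(x, 16) for x in fields[0].split("-"))
        regions[fields[5]] = (start, end)
    return regions


def max_stack_room(maps_text):
    """Bytes between the top of [stack] and the current end of [heap].

    The stack grows downwards, so this bounds how large it can become
    before running into the heap, whatever RLIMIT_STACK says.
    """
    regions = parse_maps(maps_text)
    heap_end = regions["[heap]"][1]
    stack_end = regions["[stack]"][1]
    return stack_end - heap_end
```

On a live process, pass it the contents of /proc/<pid>/maps.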

The reason the program and the heap are at these very high addresses is
that xsltproc is built with PIE and the kernel is treating the
executable like a mmap and grouping it with all the other libraries. In
d1fd836dcf00 ("mm: split ET_DYN ASLR from mmap ASLR") the behavior
changed: the program and its heap are now mapped at a lower address,
so the bug does not affect newer kernels. Using "setarch -L" or
"setarch -R" is another workaround for this bug because that moves the
program so that there is a much larger gap between the heap and the stack.
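The kernel's ELF loader tells a PIE apart from a conventional executable by the e_type field in the ELF header: ET_DYN (3) rather than ET_EXEC (2). A minimal Python sketch for checking which kind a binary is (helper names are illustrative, not from any existing tool):

```python
import struct

ET_EXEC = 2  # fixed-address executable (non-PIE)
ET_DYN = 3   # shared object; PIE executables use this type too


def elf_type(header):
    """Return e_type from the first 18 bytes of an ELF file."""
    if len(header) < 18 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF header")
    # e_ident[EI_DATA] at offset 5: 1 = little-endian, 2 = big-endian
    byte_order = {1: "<", 2: ">"}.get(header[5])
    if byte_order is None:
        raise ValueError("unknown ELF data encoding")
    # e_type is a 16-bit field directly after the 16-byte e_ident
    (e_type,) = struct.unpack_from(byte_order + "H", header, 16)
    return e_type


def is_et_dyn(path):
    """True for PIE executables (and shared libraries), False for ET_EXEC."""
    with open(path, "rb") as f:
        return elf_type(f.read(18)) == ET_DYN
```

Shared libraries are ET_DYN as well, so this answers "position independent" rather than "PIE executable" specifically; `readelf -h` reports the same field on its Type: line.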

This might affect other applications as well. Effectively it means that
PIE executables which use lots of stack space might not work properly
with jessie's kernel. The chance of hitting the bug seems to vary
between arches, however (depending on what each arch does in
arch_pick_mmap_layout and arch_randomize_brk); mips64el seems to be hit
pretty frequently. In xsltproc's case, PIE was enabled some time ago,
which is why this bug is quite old.

I believe any of the following will fix this (though I have not tested all of them):
- Reduce the stack usage in xsltproc (the upstream bug)
- Upgrade the relevant buildds to Linux >= 4.1
- Apply d1fd836dcf00 to jessie's kernel
- Disable PIE in xsltproc
- Run xsltproc inside setarch -L / setarch -R
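`setarch -R` and `setarch -L` work by turning on the ADDR_NO_RANDOMIZE and ADDR_COMPAT_LAYOUT personality flags and then exec'ing the program; the personality is inherited across execve(), so the kernel lays out the new program with those flags in effect. A rough Python equivalent of that last workaround, assuming Linux and glibc's personality() reached through ctypes (the function names and how you invoke the script are hypothetical):

```python
import ctypes
import os
import sys

# Flag values from <linux/personality.h>.
ADDR_NO_RANDOMIZE = 0x0040000   # what `setarch -R` turns on
ADDR_COMPAT_LAYOUT = 0x0200000  # what `setarch -L` turns on
QUERY_PERSONA = 0xFFFFFFFF      # personality(0xffffffff) only reads the value


def with_layout_flags(persona, no_randomize=True, compat_layout=False):
    """Return `persona` with the requested address-layout flags set."""
    if no_randomize:
        persona |= ADDR_NO_RANDOMIZE
    if compat_layout:
        persona |= ADDR_COMPAT_LAYOUT
    return persona


def exec_with_layout(argv, no_randomize=True, compat_layout=False):
    """Set the flags on this process, then replace it with argv (Linux only)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.personality.argtypes = [ctypes.c_ulong]
    libc.personality.restype = ctypes.c_int
    current = libc.personality(QUERY_PERSONA)
    if current == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    wanted = with_layout_flags(current, no_randomize, compat_layout)
    if libc.personality(wanted) == -1:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    # The personality survives execve(), so xsltproc starts with the new layout.
    os.execvp(argv[0], argv)


if __name__ == "__main__" and len(sys.argv) > 1:
    exec_with_layout(sys.argv[1:])
```

In practice `setarch` itself is the simpler choice; the sketch only spells out what it does.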

>> For the moment, I'll rebuild libreswan again and hope a good buildd is
>> picked.
> 
> i see 5 mips64el rebuilds now at
> https://buildd.debian.org/status/logs.php?pkg=libreswan&ver=3.20-6&suite=sid,
> but none of them have succeeded yet :/
> 
> 3 of the builds are from mipsel-manda-02, 1 is from eberlin, and one
> additional new "bad" builder is:
> 
>  	mipsel-aql-01

There are 3 non-Loongson buildds: mipsel-aql-03, mipsel-manda-03 and
mipsel-sil-01. I expect libreswan will only build on one of those
buildds at the moment.

Thanks,
James
