Bug#469058: DF and signal handlers
Nikodemus Siivola wrote:
> On 3/5/08, Aurelien Jarno <aurelien at aurel32.net> wrote:
>> Nikodemus Siivola wrote:
>>> On 3/5/08, Debian Bug Tracking System <owner at bugs.debian.org> wrote:
>> >> tag 469058 + patch
>> >> Bug#469058: sbcl doesn't reset direction flag upon exit
>> >> There were no tags set.
>> >> Tags added: patch
>> > Thanks for the patch, but... while I agree that it is good to change
>> > SBCL to reset the direction flag every time it is diddled, instead of
>> > just before calling C, I don't think SBCL is actually at fault here.
>> > 1. SBCL does actually reset DF before any call to foreign (GCC generated) code.
>> > See line 236 in src/compiler/x86/c-call.lisp, and line 125 in
>> > src/runtime/x86-assem.S.
>> > (It is possible I'm missing out a call-path here, but even so, read on and
>> > see if my fears are unfounded or not.)
>> > 2. If the problem was due to a foreign call, it should be deterministic.
>> > 3. If the problem was due to _returning_ to main(), it should be deterministic.
>> Looks correct.
>> > What I suspect is actually going on (especially considering your
>> > statement that compiling signals/ with 4.2 avoided the issue) is that
>> > a signal handler is entered while DF is set.
>> What I am sure of is that sigemptyset() from glibc is called with the
>> direction flag set, and that should not happen.
> I'm about to merge a patch to SBCL based on yours, which moves all DF
> resets to the immediate vicinity of STDs for easier auditing, and removes
> the then-unnecessary CLD instructions from foreign call sequences.
> This will fix the symptoms, and be good for SBCL, but I think the
> underlying problem is still there in signal handling. :/
>> > If this is the case, then clearing it right after each REP loop where
>> > SBCL uses it just makes seeing the bug much more unlikely -- but not
>> > impossible in the presence of async signals.
>> Seems correct, though I have made half a dozen builds here without
>> any problem.
> That is not too surprising: there are normally no async signals
> delivered during the build, but SIGSEGV is a regular occurrence (it is
> used by the GC), so SIGSEGV handlers may have been seeing the DF set.
> What _is_ strange is that this appears to have been random. (At least
> all the reporters seemed to characterize it as semirandom behaviour.)
> Multiple builds from the same source with the same host compiler
> should have essentially identical GC characteristics.
Well, it may depend on the kernel. On one machine, it was hanging
randomly. On another machine, I get an error from the GC at the very
beginning of the build.
>> > If so, this may also explain some _very_ hard to reproduce faults we
>> > have seen over the years: using a libc compiled with a pre-4.3 GCC, a
>> > signal at an inopportune moment in the middle of a REP loop could clear DF!
>> > Yikes!
>> > I'm not sure what is The Right Thing here, though. Should SBCL (and
>> > _any_ program that ever sets DF!) save, clear, and restore DF in its
>> > signal handlers? Should libc/kernel do that? Should signals be blocked
>> I currently have no idea about that.
> I'll see if I can cook up a small test-case using async signals. (One
> that doesn't need SBCL so that it can be passed to upstream libc /
> kernel people if necessary without too much friction.)
A GCC developer says it is the kernel's job. I doubt glibc can do
anything here, since it is the kernel that calls the signal handler.
.''`. Aurelien Jarno | GPG: 1024D/F1BCDB73
: :' : Debian developer | Electrical Engineer
`. `' aurel32 at debian.org | aurelien at aurel32.net
`- people.debian.org/~aurel32 | www.aurel32.net