
RE: mprotect-4 hang on Debian (and other?) implementations



>Running the binary test suite on Debian gives a reproducible hang in
>LSB.os/mprotect/mprotect_P/T.mprotect_P-4 -- it leaves two processes
>busy looping that'll only die (for me, anyway) with a SIGKILL.

I see this hang when running the ia32 Sample Implementation
inside a chroot. On the same platform, running natively,
there is no problem, which has been horribly confusing to me.
I can't see any kernel or library issues that account for this,
so maybe it's just "by chance" depending on what existing
mmaps are set up, or what addresses things happen to fall on,
or whatever...  I'd certainly be glad to see this explain
away the failure, then I'd be down to one :-)

These tests are a little unusual; usually it seems that all this
kind of work is done in a child process to protect against
interactions of one test with another.  This set does not seem
to follow that scheme.

> The problem seems to be the calls to mmap and munmap in test3():
> vsrt_pgsz bytes are mmap'ed, then 3*vsrt_pgsz bytes are munmap'ed --
> and thus anything mmap'ed in the 2*v_p bytes after our block of memory
> gets munmap'ed too, and segfaults and suchlike result on accesses to
> it. Presumably the sig{set,long}jmp used to check the behaviour thus
> causes infinite loops. test4(), by contrast, mmap's more than 
> it munmap's. No idea if this actually causes problems.

Let me add that I was rooting around in this set of tests
tracking down Itanium-related problems.  Those were caused by
a different set of assumptions, but at the same time I was
confused by the code that figures out how many vsrt_pgsz pages
are going to be used.  The calculation I'm confused by appears
in a utility routine vsrt_create_exec_file in SRC/common/vsrtlib/mkmmap.c
which just happens to be the routine mprotect test4() calls
to get its world set up.

It *looks* like the algorithm tries to make sure it's using the 
page the code is in, the next page (in case the function code 
overflows into it, I suppose), and if possible, the page before 
(thus three pages instead of two). I think this calculation of 
the "previous page" is questionable. Paraphrasing:

address = base of page containing target function
if address > 0
   if address > pagesize
       use 3 pages
       address -= pagesize
   else
       use 2 pages
else
   use 2 pages

On a platform where the text segment is loaded at a non-zero
base address (anything above one page), both of those tests
will always be true. Yet if the target function happened to
fall in the first page of that segment, decrementing would
leave address pointing before the beginning of the text
segment, leading to a segfault.  This is not impossible on
implementations with 64k or even larger pagesizes.

I suspect the second test should be

   if address > (base of segment) + pagesize

and maybe the first should be if > base ....
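
To make the difference concrete, here is a rough C sketch of the
calculation as I paraphrased it next to the variant I'm suggesting.
This is not the code from mkmmap.c: the struct, the function names
and the seg_base parameter are all made up for illustration, and I'm
assuming the pagesize is a power of two.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical result type: start of the region and number of pages. */
struct page_span {
    uintptr_t start;
    unsigned  pages;
};

/* The calculation as paraphrased: compares only against 0 and pagesize. */
struct page_span span_as_paraphrased(uintptr_t func, uintptr_t pagesize)
{
    /* Assumption: pagesize is a non-zero power of two. */
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

    /* Base of the page containing the target function. */
    struct page_span s = { func & ~(pagesize - 1), 2 };

    if (s.start > 0 && s.start > pagesize) {
        s.pages = 3;
        s.start -= pagesize;    /* may land before the text segment */
    }
    return s;
}

/* Suggested variant: step back only if the previous page is known to
 * lie inside the segment.  seg_base is a hypothetical parameter; how
 * the real code would learn the segment base is left open. */
struct page_span span_with_segment_check(uintptr_t func,
                                         uintptr_t seg_base,
                                         uintptr_t pagesize)
{
    assert(pagesize != 0 && (pagesize & (pagesize - 1)) == 0);

    struct page_span s = { func & ~(pagesize - 1), 2 };

    if (s.start > seg_base + pagesize) {
        s.pages = 3;
        s.start -= pagesize;
    }
    return s;
}
```

With a 64k pagesize, a (made-up) segment base of 0x400000 and a
function at 0x400123, the first version returns three pages starting
at 0x3F0000, one page before the segment; the second returns two
pages starting at 0x400000.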


