Re: mips64 assembler

To: David Daney <ddaney@caviumnetworks.com>
Cc: debian-mips@lists.debian.org, Frederick Isaac <freddyisaac@gmail.com>, gcl-devel@gnu.org, eis@blocksatpc02.upc.es, hector.oron@gmail.com
Subject: Re: mips64 assembler
From: Camm Maguire <camm@maguirefamily.org>
Date: Wed, 22 Sep 2010 17:40:42 -0400
Message-id: <87lj6te9t1.fsf@maguirefamily.org>
In-reply-to: <4C97D9A1.7050102@caviumnetworks.com> (David Daney's message of "Mon, 20 Sep 2010 15:01:05 -0700")
References: <E1OwbkA-0006gv-Bi@localhost.m.enhanced.com> <4C93993E.7030008@caviumnetworks.com> <8762y49k1k.fsf@maguirefamily.org> <4C93D86D.5090201@caviumnetworks.com> <87fwx4dwu5.fsf@maguirefamily.org> <4C97D9A1.7050102@caviumnetworks.com>

Greetings!

David Daney <ddaney@caviumnetworks.com> writes:

> On 09/20/2010 12:44 PM, Camm Maguire wrote:
>> David Daney<ddaney@caviumnetworks.com>  writes:
>>
>>> PLT support works with the n32 ABI (with new toolchains).  Can you use that?
>>
>> -mabi=n32 -mplt still seems to generate a .MIPS.stubs section
>>   requiring canonical gp register setting (gcc 4.4.5).  Am I missing
>>   something?
>>
>
> You may also have to specify -mno-shared.  It looks like the GCC
> documentation is foobar for this option.  At some point it started
> following -fPIC, but the documentation doesn't indicate this.
>

I still get a .MIPS.stubs section and no .plt section.  Am I looking for
the wrong thing?

>
>>>
>>> I am missing part of the puzzle.  ld.so handles all of this, why can't
>>> you let it do its job?
>>>
>>
>> The general setting is that there is a fully linked executable which
>> when run, has the ability to load, relocate, and execute new code in
>> .o files.
>
> dlopen() works.  Why can't you use it?
>

1) Thousands of loads typically occur in a given session, and dlopen
consumes one file descriptor for each loaded file.

2) I cannot specify the destination address for the loaded code, which
makes it very difficult or perhaps impossible to preserve these
loads across unexec.

3) The loaded files typically aren't kept with the saved binary as it
is moved among machines.  

cvs gcl has a mechanism to preserve calls to symbols like 'sin' found
in libm using dlopen.  This is done by making the call through a C
pointer which is reset at startup time.  But in that case only a very
limited number of libraries are opened, the libraries are ubiquitous,
and the code called does not need to be saved with unexec.
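
As a rough illustration of the pointer mechanism just described, here
is a minimal sketch.  It is not the actual gcl code: the names
sin_ptr, reset_sin_pointer and call_sin are made up for the example,
and the soname "libm.so.6" is an assumption that holds on typical
glibc systems.

```c
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not gcl's actual code. */
typedef double (*unary_fn)(double);

/* The pointer lives in the image's data, so unexec saves it.  The value
   it holds is stale after a restart and must be refreshed. */
static unary_fn sin_ptr = NULL;

/* Re-resolve the pointer; meant to be called once at every image startup.
   Returns 0 on success, -1 if the library or the symbol is missing. */
int reset_sin_pointer(void)
{
    void *lib = dlopen("libm.so.6", RTLD_LAZY);  /* soname assumed (glibc) */
    if (lib == NULL)
        return -1;
    /* Idiom from the POSIX dlsym examples for storing into a function pointer. */
    *(void **)(&sin_ptr) = dlsym(lib, "sin");
    return sin_ptr != NULL ? 0 : -1;
}

/* Loaded code calls through the pointer instead of binding to libm's address. */
double call_sin(double x)
{
    return sin_ptr(x);
}
```

The handle is deliberately never closed, so this costs one descriptor
per library rather than one per loaded file.  On older glibc the
program would need to be linked with -ldl.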

>> Furthermore, the running program can be saved to disk via
>> unexec and reexecuted later, possibly on a different machine. Calls in
>> the .o files to be loaded to symbols in shared libraries cannot be set
>> to the current address of the symbol, as this might not be persistent
>> across image saves and reexecution.  Relocating instead to a
>> preexisting stub in the base executable takes advantage of ld.so's
>> lazy relocation on first execution, and, as the target address lies in
>> the image itself, is persistent across image saves.
>>
>
> unexec is very tricky indeed.  I haven't tried to build an n32 version
> of emacs.  I should try it.  The last time I looked emacs used unexec.
>

This is working on mips32 for gcl/acl2/maxima/hol88/axiom.

>>
>>>
>>>> This seems to indicate to me that I will need to craft my own lazy
>>>> relocation stub for each call to a shared lib symbol at the end of
>>>> each loaded block of code.  Then I can move the gp pointer to a local
>>>> .got table as well.  This is unfortunate, but can be done.  Two
>>>> questions remain:
>>>>
>>>> 1) Is there an alternative, e.g. some flag like -mplt to generate a
>>>> genuine .plt section in the base executable, or other way out?
>>>>
>>>
>>> You haven't specified at a high level what problem you are trying to solve.
>>>
>>
>> 1) If I am to make use of the base executable stub to say _setjmp, I
>> have to leave the gp pointer in its canonical position in the newly
>> loaded code, because the format of the .MIPS.stubs (in contrast to the
>> .plt stub elsewhere) requires this.
>>
>> 2) Therefore all .got references in the newly loaded code have to
>> exist in the .got table of the base executable, thereby excluding
>> addresses in the newly loaded code.
>>
>
> This I don't understand.  Each function conceptually has its own GOT
> although in practice many of them are merged together.  So in a
> running program there will be several GOTs  (a minimum of one for the
> executable and one for each shared library loaded)  The function
> prolog loads the gp if it will use it.  The use of -mplt may slightly
> change the mechanism (I haven't looked at it for quite a while), but
> really I think the notion of a canonical gp
>
>

Were you trying to finish a sentence here?  (I'd love to know all your
thoughts on this matter!).  I might get your gist (see below).

>
>> 3) On mips64, in contrast to mips32, I cannot overwrite .got
>> references to addresses in the newly loaded code to be immediate
>> address references instead, as it takes too many instructions.
>>
>
> The GOT is just a bunch of pointers.  If you can overwrite them in the
> o32 ABI, I don't understand why you cannot do the same for n32/n64.
>

I meant overwriting the instructions that load a register from the
.got with instructions that load the register with an immediate value.
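
To make the instruction-count point concrete, here is a small C model
of the usual immediate-load sequences.  It is only a sketch: the
function names are invented, and the 64-bit case assumes the common
lui/ori/dsll/ori/dsll/ori expansion, which an assembler may shorten
for particular constants.

```c
#include <assert.h>
#include <stdint.h>

/* Model of "lui; ori" for a 32-bit address: two instructions. */
uint32_t load_imm32(uint32_t addr, int *ninsn)
{
    uint32_t r = (addr >> 16) << 16;            /* lui  r, bits 31..16    */
    r |= addr & 0xffffu;                        /* ori  r, r, bits 15..0  */
    *ninsn = 2;
    return r;
}

/* Model of the general 64-bit expansion: six instructions. */
uint64_t load_imm64(uint64_t addr, int *ninsn)
{
    /* lui r, bits 63..48: the 32-bit result is sign-extended to 64 bits */
    uint64_t r = ((addr >> 48) & 0xffffu) << 16;
    if (r & 0x80000000u)
        r |= 0xffffffff00000000ull;
    r |= (addr >> 32) & 0xffffu;                /* ori  r, r, bits 47..32 */
    r <<= 16;                                   /* dsll r, r, 16          */
    r |= (addr >> 16) & 0xffffu;                /* ori  r, r, bits 31..16 */
    r <<= 16;                                   /* dsll r, r, 16          */
    r |= addr & 0xffffu;                        /* ori  r, r, bits 15..0  */
    *ninsn = 6;
    return r;
}
```

A single got load occupies one instruction slot, so the six-instruction
sequence cannot simply be patched over it in place.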

> Also if you run with LD_BIND_NOW the lazy binding stubs are never
> used, the GOT will be fully populated by ld.so when the program
> starts.

This actually is a very useful piece of info -- thanks!  

This obviously frees me from having to worry about the stub, but I'm
not sure if it allows me to escape from using the .got for the base
executable.  This .got is guaranteed to be handled by ld.so on
startup, either immediately or lazily.  Any .got I craft and append to
my loaded code will not, unless I can point the executable header to
this region somehow.

For example, say I load code that calls _setjmp.  If I use the
existing .got, even if populated immediately, I know that if my code
is dumped, and the binary moved to another machine, and restarted, the
new _setjmp address will be handled properly.

>
>
>
>> 4) It appears that I have three broad options:
>>
>>     a) Make my own .got table at the end of the newly loaded code, and
>>     append with my own lazy stub when necessary.  For example, on
>>     alpha, we create our own .got in this manner due to the 64bit
>>     issue, but we don't have to make our own stub as the alpha has a
>>     callable .plt stub making no gp register value assumptions.
>>
>>     b) Do a) above but get a working .plt with some compiler flag
>>     settings, obviating the need for a local stub.
>
>
>>
>>     c) find some other way, perhaps with compiler flags, to eliminate
>>     .got references to local addresses in the newly loaded code.  In
>>     other words, if I could instruct gcc to write accesses to the .data
>>     section of the newly loaded code as a 32bit offset from the .text
>>     section address, instead of a .got load and offset, I'd be set.
>
>
> Not possible.  There is no pc relative addressing mode.
>

Good to know, thanks!

>>
>> [ e.g.
>>
>> 0000000000000000<init_code>:
>>     0:	67bdffe0 	daddiu	sp,sp,-32
>>     4:	ffbf0010 	sd	ra,16(sp)
>>     8:	ffbe0008 	sd	s8,8(sp)
>>     c:	ffbc0000 	sd	gp,0(sp)
>>    10:	03a0f02d 	move	s8,sp
>>    14:	3c1c0000 	lui	gp,0x0
>>    18:	0399e02d 	daddu	gp,gp,t9
>>    1c:	679c0000 	daddiu	gp,gp,0
>>    20:	df820000 	ld	v0,0(gp)<-- data address page load, cannot be written as lui on 64bit
>
> No it cannot, but why can't you populate the GOT/PLT with the address
> as the standard ABIs do?  I know I have asked this in several
> different forms, so please be patient...
>

So far, gp is at its canonical value.  This lets me use the _setjmp
entry of the base executable handled transparently by ld.so.  The
address I need is in the code to be loaded (the top of its .data
section).  It obviously cannot be in the .got of the base executable,
as it is *new* code.

On other machines with a .plt (e.g. alpha), I don't leave the gp at
its 'canonical' value, but rather set it to a mini-table I craft at
the end of the code to be loaded.  I then populate this .got
accordingly.  The _setjmp address I use is the address of the .plt
entry.  This will always call the .plt entry and never reset the new
.got slot, as the .plt is designed to set the .got slot of the base
executable.  So the call is somewhat inefficient, but it works and is
stable.  On mips, if I move the gp pointer to my mini-table, it will
no longer be correct in the stub, where it is used to look up the lazy
relocator of ld.so in the .got of the base executable.
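
For what it is worth, the mini-table bookkeeping described above can
be sketched in C roughly as follows.  This is an illustration only,
not the gcl loader: struct mini_got, mini_got_offset and the fixed
capacity are all hypothetical, and the offset returned is relative to
the table start rather than to any particular gp bias.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MINI_GOT_SLOTS 32   /* arbitrary capacity for the sketch */

/* Hypothetical table placed at the end of the loaded code. */
struct mini_got {
    uintptr_t slot[MINI_GOT_SLOTS];
    size_t used;
};

/* Return the byte offset of addr's slot from the table start, adding a
   slot if the address is not present yet; -1 when the table is full. */
long mini_got_offset(struct mini_got *g, uintptr_t addr)
{
    size_t i;
    for (i = 0; i < g->used; i++)
        if (g->slot[i] == addr)
            return (long)(i * sizeof g->slot[0]);
    if (g->used == MINI_GOT_SLOTS)
        return -1;
    g->slot[g->used] = addr;
    return (long)(g->used++ * sizeof g->slot[0]);
}
```

Each got relocation in the loaded code would then be resolved to the
offset returned here.  For an external symbol the address stored would
be that of the .plt entry in the base executable.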

So in sum, it seems that if I can get a .plt, all I need is a local
.got.  Otherwise, I need a local .got, plus a stub for each call to an
external symbol like _setjmp, plus some means of resetting the new
.got entry to the lazy relocator at each image execution.  Right?

Actually, a better idea has just come to mind.  The new _setjmp stub
should rather reload the old (canonical) gp pointer, then do a .got
call to the _setjmp entry in the .got table of the base executable.
This is cumbersome, but at least then I don't have to deal with issues
regarding image startup and ld.so.  This is akin, of course, to making
my own .plt for the symbol.

Thoughts?

Take care,

-- 
Camm Maguire			     		    camm@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah
