
Re: mips64 assembler



David Daney <ddaney@caviumnetworks.com> writes:

> On 09/17/2010 01:44 PM, Camm Maguire wrote:
>> Greetings!
>>
>> David Daney<ddaney@caviumnetworks.com>  writes:
>>
>>> On 09/17/2010 07:16 AM, Camm Maguire wrote:
>>>> Greetings!  Is there any way to load a known 64bit number into a given
>>>> register in two instructions?
>>>
>>> Not in the general case where the value of the 64-bit number is
>>> unconstrained...
>>>
>>>> Said number is guaranteed to be within
>>>> 32bits of the current value of another register.
>>>
>>> In other words, you want to add an arbitrary 32-bit constant to the
>>> value in a register.  You would need three instructions to do this.
>>> Two to generate the 32-bit constant and another to do the addition.
>>>
>>> David Daney.
>>>
>>
>> Alas, this was as I had expected.  Perhaps you can suggest a course of
>> action.
>>
>> On mips only, there is no plt support -- executables instead have
>> .MIPS.stubs entries for lazy relocations to external symbols.  Problem
>> is, these are only callable if the gp register is left at its
>> canonical position.  I need to load, relocate, and execute code which
>> might call these functions, which I currently redirect to the stub.
>> This means that any .got references to addresses in the code to be
>> relocated, which will of course not be in the global .got table, have
>> to be patched to immediate addresses, which on mips32 is easy
>> enough -- ld v0,oooo(gp) ->  lui v0,hhhh.  This won't work on mips64.
>>
>
> PLT support works with the n32 ABI (with new toolchains).  Can you use that?

-mabi=n32 -mplt still seems to generate a .MIPS.stubs section
 requiring canonical gp register setting (gcc 4.4.5).  Am I missing
 something? 

>
> I am missing part of the puzzle.  ld.so handles all of this, why can't
> you let it do its job?
>

The general setting is that there is a fully linked executable which
when run, has the ability to load, relocate, and execute new code in
.o files.  Furthermore, the running program can be saved to disk via
unexec and reexecuted later, possibly on a different machine. Calls in
the .o files to be loaded to symbols in shared libraries cannot be set
to the current address of the symbol, as this might not be persistent
across image saves and reexecution.  Relocating instead to a
preexisting stub in the base executable takes advantage of ld.so's
lazy relocation on first execution, and, as the target address lies in
the image itself, is persistent across image saves.


>
>> This seems to indicate to me that I will need to craft my own lazy
>> relocation stub for each call to a shared lib symbol at the end of
>> each loaded block of code.  Then I can move the gp pointer to a local
>> .got table as well.  This is unfortunate, but can be done.  Two
>> questions remain:
>>
>> 1) Is there an alternative, e.g. some flag like -mplt to generate a
>> genuine .plt section in the base executable, or other way out?
>>
>
> You haven't specified at a high level what problem you are trying to solve.
>

1) If I am to make use of the base executable stub to, say, _setjmp, I
have to leave the gp pointer in its canonical position in the newly
loaded code, because the format of the .MIPS.stubs entry (in contrast
to the .plt stub elsewhere) requires this.

2) Therefore all .got references in the newly loaded code have to
exist in the .got table of the base executable, thereby excluding
addresses in the newly loaded code.

3) On mips64, in contrast to mips32, I cannot overwrite .got
references to addresses in the newly loaded code to be immediate
address references instead, as it takes too many instructions.

4) It appears that I have three broad options:

   a) Make my own .got table at the end of the newly loaded code, and
   append with my own lazy stub when necessary.  For example, on
   alpha, we create our own .got in this manner due to the 64bit
   issue, but we don't have to make our own stub as the alpha has a
   callable .plt stub making no gp register value assumptions.

   b) Do a) above but get a working .plt with some compiler flag
   settings, obviating the need for a local stub.

   c) Find some other way, perhaps with compiler flags, to eliminate
   .got references to local addresses in the newly loaded code.  In
   other words, if I could instruct gcc to write accesses to the .data
   section of the newly loaded code as a 32bit offset from the .text
   section address, instead of a .got load and offset, I'd be set.

[ e.g.

0000000000000000 <init_code>:
   0:	67bdffe0 	daddiu	sp,sp,-32  
   4:	ffbf0010 	sd	ra,16(sp)
   8:	ffbe0008 	sd	s8,8(sp)
   c:	ffbc0000 	sd	gp,0(sp)
  10:	03a0f02d 	move	s8,sp
  14:	3c1c0000 	lui	gp,0x0
  18:	0399e02d 	daddu	gp,gp,t9
  1c:	679c0000 	daddiu	gp,gp,0
  20:	df820000 	ld	v0,0(gp)    <-- data address page load, cannot be written as lui on 64bit
  24:	64420000 	daddiu	v0,v0,0     <-- data address offset
  28:	0040202d 	move	a0,v0
  2c:	df990000 	ld	t9,0(gp)
  30:	0320f809 	jalr	t9
  34:	00000000 	nop
  38:	03c0e82d 	move	sp,s8
  3c:	dfbf0010 	ld	ra,16(sp)
  40:	dfbe0008 	ld	s8,8(sp)
  44:	dfbc0000 	ld	gp,0(sp)
  48:	67bd0020 	daddiu	sp,sp,32
  4c:	03e00008 	jr	ra
  50:	00000000 	nop

]
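To make the instruction count concrete, here is a rough Python model of
the arithmetic (a sketch only, not loader code).  The helper names and
the base value 0x120000000 are made up for illustration; the target
0x12010e090 is the stub address quoted further down.  It shows that a
lui/daddiu pair only reaches sign-extended 32bit addresses, while an
address within 32 bits of a base register needs the three-instruction
lui/daddu/daddiu form, as in the gp setup at 14..1c above.

```python
MASK64 = (1 << 64) - 1


def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value & (1 << (bits - 1)) else value


def split_hi_lo(value32):
    """Split a 32-bit value into (hi, lo) halves for lui + daddiu.

    daddiu sign-extends its immediate, so hi is rounded up by 0x8000
    to compensate when bit 15 of the low half is set.
    """
    lo = value32 & 0xFFFF
    hi = ((value32 + 0x8000) >> 16) & 0xFFFF
    return hi, lo


def lui(imm16):
    """mips64 lui: imm16 << 16, sign-extended from 32 to 64 bits."""
    return sign_extend(imm16 << 16, 32) & MASK64


def daddiu(reg, imm16):
    """64-bit add of a sign-extended 16-bit immediate."""
    return (reg + sign_extend(imm16, 16)) & MASK64


def daddu(a, b):
    """64-bit register add."""
    return (a + b) & MASK64


def absolute_two_insn(addr):
    """Value produced by lui + daddiu when patched with the low 32 bits
    of addr.  Equals addr only if addr is reachable this way."""
    hi, lo = split_hi_lo(addr & 0xFFFFFFFF)
    return daddiu(lui(hi), lo)


def base_plus_offset(base, target):
    """Value produced by lui + daddu + daddiu (three instructions).

    Assumes target is within 32 bits of base.  Corner case not handled:
    offsets 0x7fff8000..0x7fffffff fail because lui sign-extends.
    """
    hi, lo = split_hi_lo((target - base) & 0xFFFFFFFF)
    return daddiu(daddu(lui(hi), base), lo)


if __name__ == "__main__":
    base, target = 0x120000000, 0x12010E090  # base is hypothetical
    print(hex(absolute_two_insn(target)))        # upper bits are lost
    print(hex(base_plus_offset(base, target)))   # reaches the target
```

The two-instruction form is what the mips32 patch relies on; the
three-instruction form does not fit in the two slots at 20 and 24.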


It looks like a) is the best, though it will require mips-only
modifications to the generic elf loading code, which is very
unfortunate.


>> 2) I don't completely understand the stub:
>>
>> ->    12010e090:	df998010 	ld	t9,-32752(gp)
>>       12010e094:	03e0782d 	move	t3,ra
>>       12010e098:	0320f809 	jalr	t9
>>       12010e09c:	641807c6 	daddiu	t8,zero,1990
>> ->    12010e0a0:	df998010 	ld	t9,-32752(gp)
>>       12010e0a4:	03e0782d 	move	t3,ra
>>       12010e0a8:	0320f809 	jalr	t9
>>       12010e0ac:	641807c5 	daddiu	t8,zero,1989
>>
>> ->  denotes stub entry points.  How does the add ever get called?  This
>> add contains the only reference to the .got entry of the external
>> symbol.  It appears that it should be called before the jump.
>
> On MIPS the instruction after a branch or jump is executed as part of
> the control transfer instruction.  This is called the Delay Slot.
>
> t9 is loaded with the address of the lazy resolver.  Return address
> saved into t3, symbol index loaded into t8, make the call to the lazy
> resolver via t9 ...
>

Thank you!  This was especially helpful!
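For anyone else reading the stub, a small Python sketch for checking
this reading against the raw words quoted above.  The function names
are made up for illustration; only the instruction words come from the
listing.  It decodes the I-type fields and pulls the symbol index out of
the delay slot following the jalr.

```python
OP_SPECIAL = 0x00   # R-type group (jalr, daddu, ...)
OP_DADDIU = 0x19
OP_LD = 0x37
FUNCT_JALR = 0x09


def decode_itype(word):
    """Return (opcode, rs, rt, signed 16-bit immediate) of an I-type word."""
    opcode = word >> 26
    rs = (word >> 21) & 0x1F
    rt = (word >> 16) & 0x1F
    imm = word & 0xFFFF
    if imm & 0x8000:
        imm -= 0x10000
    return opcode, rs, rt, imm


def is_jalr(word):
    """True for a jalr instruction (SPECIAL opcode, funct 0x09)."""
    return (word >> 26) == OP_SPECIAL and (word & 0x3F) == FUNCT_JALR


def stub_symbol_index(words):
    """Find the jalr and read the immediate of the daddiu in its delay
    slot.  The delay-slot instruction executes before control arrives
    at the jump target, so t8 is set by the time the resolver runs."""
    for i, word in enumerate(words[:-1]):
        if is_jalr(word):
            opcode, rs, rt, imm = decode_itype(words[i + 1])
            if opcode != OP_DADDIU:
                raise ValueError("delay slot is not daddiu")
            return imm
    raise ValueError("no jalr found")


# The two stubs from the listing, four words each.
STUB_A = [0xDF998010, 0x03E0782D, 0x0320F809, 0x641807C6]
STUB_B = [0xDF998010, 0x03E0782D, 0x0320F809, 0x641807C5]

if __name__ == "__main__":
    print(decode_itype(STUB_A[0]))     # ld t9,-32752(gp): rs=28 (gp), rt=25 (t9)
    print(stub_symbol_index(STUB_A))
    print(stub_symbol_index(STUB_B))
```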

Take care,

>
>>
>> Thanks so much.
>
>
>
>
>

-- 
Camm Maguire			     		    camm@maguirefamily.org
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah

