
Re: mips64 assembler

On 09/20/2010 12:44 PM, Camm Maguire wrote:
David Daney<ddaney@caviumnetworks.com>  writes:

On 09/17/2010 01:44 PM, Camm Maguire wrote:

David Daney<ddaney@caviumnetworks.com>   writes:

On 09/17/2010 07:16 AM, Camm Maguire wrote:
Greetings!  Is there any way to load a known 64-bit number into a given
register in two instructions?

Not in the general case where the value of the 64-bit number is

Said number is guaranteed to be within
32 bits of the current value of another register.

In other words, you want to add an arbitrary 32-bit constant to the
value in a register.  You would need three instructions to do this.
Two to generate the 32-bit constant and another to do the addition.
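
To make the three-instruction count concrete, here is a small C model of one possible sequence (lui, ori, daddu). It is only a sketch: the function names are made up for illustration, and it assumes the usual MIPS64 semantics, where lui sign-extends its shifted immediate to 64 bits and ori zero-extends its immediate.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative models of three MIPS64 instructions; names are hypothetical. */

/* lui rt,imm: rt = sign_extend_64(imm << 16) */
static uint64_t lui(uint16_t imm)
{
    return (uint64_t)(int64_t)(int32_t)((uint32_t)imm << 16);
}

/* ori rt,rs,imm: rt = rs | zero_extend(imm) */
static uint64_t ori(uint64_t rs, uint16_t imm)
{
    return rs | (uint64_t)imm;
}

/* daddu rd,rs,rt: 64-bit add, no overflow trap */
static uint64_t daddu(uint64_t rs, uint64_t rt)
{
    return rs + rt;
}

/* Compute base + k with the same three steps the hardware would take. */
uint64_t add_const32(uint64_t base, int32_t k)
{
    uint16_t hi = (uint16_t)((uint32_t)k >> 16);
    uint16_t lo = (uint16_t)((uint32_t)k & 0xffffu);
    uint64_t tmp = ori(lui(hi), lo);        /* lui at,hi ; ori at,at,lo */
    assert(tmp == (uint64_t)(int64_t)k);    /* constant arrives sign-extended */
    return daddu(base, tmp);                /* daddu rd,base,at */
}
```

Because the lui/ori pair yields a sign-extended 32-bit value, the same sequence covers targets both above and below the base register.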

David Daney.

Alas, this was as I had expected.  Perhaps you can suggest a course of

On mips only, there is no plt support -- executables instead have
.MIPS.stubs entries for lazy relocations to external symbols.  Problem
is, these are only callable if the gp register is left at its
canonical position.  I need to load, relocate, and execute code which
might call these functions, which I currently redirect to the stub.
This means that any .got references to addresses in the code to be
relocated, which will of course not be in the global .got table, have
to be patched to immediate addresses, which on mips32 is easy
enough -- ld v0,oooo(gp) ->   lui v0,hhhh.  This won't work on mips64.
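
The mips32 patch described above can be sketched in C as follows. This is an illustration under assumptions, not the loader's actual code: the function names are hypothetical, the GOT load is assumed to be a gp-relative lw (on o32 the load would be lw rather than ld), and the instruction after it is assumed to add the low 16 bits as a sign-extended immediate, which is why the high half is rounded with 0x8000.

```c
#include <assert.h>
#include <stdint.h>

#define MIPS_OP_LUI 0x0fu   /* primary opcode of lui */
#define MIPS_REG_GP 28u     /* register number of gp */

/* High half for a lui that is followed by a sign-extending addiu. */
uint16_t hi16(uint32_t addr)
{
    return (uint16_t)((addr + 0x8000u) >> 16);
}

/* Low half as consumed by the following addiu. */
uint16_t lo16(uint32_t addr)
{
    return (uint16_t)(addr & 0xffffu);
}

/* Rewrite "lw rt,off(gp)" as "lui rt,%hi(addr)", keeping the rt field. */
uint32_t got_load_to_lui(uint32_t insn, uint32_t addr)
{
    uint32_t base = (insn >> 21) & 0x1fu;
    uint32_t rt   = (insn >> 16) & 0x1fu;
    assert(base == MIPS_REG_GP);   /* only gp-relative loads are expected */
    return (MIPS_OP_LUI << 26) | (rt << 16) | hi16(addr);
}
```

For example, with these assumptions lw v0,0(gp) (0x8f820000) patched for address 0x10008000 becomes lui v0,0x1001 (0x3c021001), and the following addiu would carry 0x8000, which sign-extends to -0x8000. On mips64 a single lui cannot reach an arbitrary 64-bit address, which is the problem being described.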

PLT support works with the n32 ABI (with new toolchains).  Can you use that?

-mabi=n32 -mplt still seems to generate a .MIPS.stubs section
  requiring canonical gp register setting (gcc 4.4.5).  Am I missing

You may also have to specify -mno-shared. It looks like the GCC documentation is foobar for this option. At some point it started following -fPIC, but the documentation doesn't indicate this.

I am missing part of the puzzle.  ld.so handles all of this, why can't
you let it do its job?

The general setting is that there is a fully linked executable which
when run, has the ability to load, relocate, and execute new code in
.o files.

dlopen() works.  Why can't you use it?

Furthermore, the running program can be saved to disk via
unexec and reexecuted later, possibly on a different machine. Calls in
the .o files to be loaded to symbols in shared libraries cannot be set
to the current address of the symbol, as this might not be persistent
across image saves and reexecution.  Relocating instead to a
preexisting stub in the base executable takes advantage of ld.so's
lazy relocation on first execution, and, as the target address lies in
the image itself, is persistent across image saves.

unexec is very tricky indeed. I haven't tried to build an n32 version of emacs. I should try it. The last time I looked emacs used unexec.

This seems to indicate to me that I will need to craft my own lazy
relocation stub for each call to a shared lib symbol at the end of
each loaded block of code.  Then I can move the gp pointer to a local
.got table as well.  This is unfortunate, but can be done.  Two
questions remain:

1) Is there an alternative, e.g. some flag like -mplt to generate a
genuine .plt section in the base executable, or other way out?

You haven't specified at a high level what problem you are trying to solve.

1) If I am to make use of the base executable stub to say _setjmp, I
have to leave the gp pointer in its canonical position in the newly
loaded code, because the format of the .MIPS.stubs entry (in contrast to the
.plt stub elsewhere) requires this.

2) Therefore all .got references in the newly loaded code have to
exist in the .got table of the base executable, thereby excluding
addresses in the newly loaded code.

This I don't understand. Each function conceptually has its own GOT, although in practice many of them are merged together. So in a running program there will be several GOTs (a minimum of one for the executable and one for each shared library loaded). The function prolog loads the gp if it will use it. The use of -mplt may slightly change the mechanism (I haven't looked at it for quite a while), but really I think the notion of a canonical gp

3) On mips64, in contrast to mips32, I cannot overwrite .got
references to addresses in the newly loaded code to be immediate
address references instead, as it takes too many instructions.

The GOT is just a bunch of pointers. If you can overwrite them in the o32 ABI, I don't understand why you cannot do the same for n32/n64.

Also, if you run with LD_BIND_NOW set, the lazy binding stubs are never used; the GOT will be fully populated by ld.so when the program starts.

4) It appears that I have three broad options:

    a) Make my own .got table at the end of the newly loaded code, and
    append with my own lazy stub when necessary.  For example, on
    alpha, we create our own .got in this manner due to the 64-bit
    issue, but we don't have to make our own stub as the alpha has a
    callable .plt stub making no gp register value assumptions.

    b) Do a) above but get a working .plt with some compiler flag
    settings, obviating the need for a local stub.

    c) find some other way, perhaps with compiler flags, to eliminate
    .got references to local addresses in the newly loaded code.  In
    other words, if I could instruct gcc to write accesses to the .data
    section of the newly loaded code as a 32-bit offset from the .text
    section address, instead of a .got load and offset, I'd be set.

Not possible.  There is no pc relative addressing mode.

[ e.g.

    0:	67bdffe0 	daddiu	sp,sp,-32
    4:	ffbf0010 	sd	ra,16(sp)
    8:	ffbe0008 	sd	s8,8(sp)
    c:	ffbc0000 	sd	gp,0(sp)
   10:	03a0f02d 	move	s8,sp
   14:	3c1c0000 	lui	gp,0x0
   18:	0399e02d 	daddu	gp,gp,t9
   1c:	679c0000 	daddiu	gp,gp,0
   20:	df820000 	ld	v0,0(gp)   <-- data address page load, cannot be written as lui on 64-bit

No it cannot, but why can't you populate the GOT/PLT with the address as the standard ABIs do? I know I have asked this in several different forms, so please be patient...

   24:	64420000 	daddiu	v0,v0,0   <-- data address offset
   28:	0040202d 	move	a0,v0
   2c:	df990000 	ld	t9,0(gp)
   30:	0320f809 	jalr	t9
   34:	00000000 	nop
   38:	03c0e82d 	move	sp,s8
   3c:	dfbf0010 	ld	ra,16(sp)
   40:	dfbe0008 	ld	s8,8(sp)
   44:	dfbc0000 	ld	gp,0(sp)
   48:	67bd0020 	daddiu	sp,sp,32
   4c:	03e00008 	jr	ra
   50:	00000000 	nop


It looks like a) is the best, though it will require mips only
modifications to the generic elf loading code, which is very

2) I don't completely understand the stub:

->     12010e090:	df998010 	ld	t9,-32752(gp)
       12010e094:	03e0782d 	move	t3,ra
       12010e098:	0320f809 	jalr	t9
       12010e09c:	641807c6 	daddiu	t8,zero,1990
->     12010e0a0:	df998010 	ld	t9,-32752(gp)
       12010e0a4:	03e0782d 	move	t3,ra
       12010e0a8:	0320f809 	jalr	t9
       12010e0ac:	641807c5 	daddiu	t8,zero,1989

->   denotes stub entry points.  How does the add ever get called?  This
add contains the only reference to the .got entry of the external
symbol.  It appears that it should be called before the jump.

On MIPS the instruction after a branch or jump is executed as part of
the control transfer instruction.  This is called the delay slot.

t9 is loaded with the address of the lazy resolver.  Return address
saved into t3, symbol index loaded into t8, make the call to the lazy
resolver via t9 ...
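
As a cross-check of the stub walk-through above, the instruction words from the listing can be decoded with a few lines of C. This is a sketch with hypothetical names; it assumes the standard MIPS I-type layout (6-bit opcode, 5-bit rs, 5-bit rt, 16-bit signed immediate) and that the fourth word of each stub is the daddiu sitting in the delay slot of the jalr.

```c
#include <assert.h>
#include <stdint.h>

/* Fields of a MIPS I-type instruction. */
struct itype {
    unsigned op;   /* primary opcode, bits 31..26 */
    unsigned rs;   /* source/base register, bits 25..21 */
    unsigned rt;   /* target register, bits 20..16 */
    int32_t imm;   /* sign-extended 16-bit immediate */
};

struct itype decode_itype(uint32_t insn)
{
    struct itype d;
    d.op  = insn >> 26;
    d.rs  = (insn >> 21) & 0x1fu;
    d.rt  = (insn >> 16) & 0x1fu;
    d.imm = (int16_t)(insn & 0xffffu);
    return d;
}

/* Symbol index that a four-word lazy stub passes to the resolver in t8. */
int stub_symbol_index(const uint32_t stub[4])
{
    /* stub[3] is the delay slot: it executes before the resolver runs. */
    struct itype slot = decode_itype(stub[3]);
    assert(slot.op == 0x19u);   /* daddiu */
    assert(slot.rs == 0u);      /* zero */
    assert(slot.rt == 24u);     /* t8 */
    return slot.imm;
}
```

Fed the first stub from the listing (df998010, 03e0782d, 0320f809, 641807c6), this yields index 1990, and decoding the first word gives a load into register 25 (t9) from offset -32752 off register 28 (gp), matching the disassembly.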

Thank you!  This was especially helpful!

Take care,

Thanks so much.
