Re: mips64 assembler
On 09/20/2010 12:44 PM, Camm Maguire wrote:
David Daney <ddaney@caviumnetworks.com> writes:
On 09/17/2010 01:44 PM, Camm Maguire wrote:
Greetings!
David Daney <ddaney@caviumnetworks.com> writes:
On 09/17/2010 07:16 AM, Camm Maguire wrote:
Greetings! Is there any way to load a known 64-bit number into a given
register in two instructions?
Not in the general case where the value of the 64-bit number is
unconstrained...
Said number is guaranteed to be within
32 bits of the current value of another register.
In other words, you want to add an arbitrary 32-bit constant to the
value in a register. You would need three instructions to do this.
Two to generate the 32-bit constant and another to do the addition.
David Daney.
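The three-instruction sequence described above can be modelled as follows. This is only a sketch of the instruction semantics, written in Python for illustration; the names tmp and dst and the helper names are hypothetical. It assumes the usual MIPS64 behaviour that lui sign-extends its 32-bit result to 64 bits and that ori zero-extends its immediate.

```python
MASK64 = (1 << 64) - 1

def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    return value - (1 << bits) if value >> (bits - 1) else value

def add_const32(base, const32):
    """Model lui/ori/daddu adding a signed 32-bit constant to a 64-bit register."""
    hi = (const32 >> 16) & 0xFFFF
    lo = const32 & 0xFFFF
    tmp = sign_extend(hi << 16, 32) & MASK64  # lui   tmp, hi      (sign-extended to 64 bits)
    tmp |= lo                                 # ori   tmp, tmp, lo (immediate zero-extended)
    return (base + tmp) & MASK64              # daddu dst, base, tmp
```

For example, add_const32(base, -4) yields base - 4 modulo 2**64, because the lui result is sign-extended before the addition.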
Alas, this was as I had expected. Perhaps you can suggest a course of
action.
On mips only, there is no plt support -- executables instead have
.MIPS.stubs entries for lazy relocations to external symbols. Problem
is, these are only callable if the gp register is left at its
canonical position. I need to load, relocate, and execute code which
might call these functions, which I currently redirect to the stub.
This means that any .got references to addresses in the code to be
relocated, which will of course not be in the global .got table, have
to be patched to immediate addresses, which on mips32 is easy
enough -- ld v0,oooo(gp) -> lui v0,hhhh. This won't work on mips64.
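The mips32 patch mentioned above (a GOT load rewritten as lui v0,hhhh) can be sketched like this. It is an illustration in Python, not the loader's actual code. It assumes the instruction following the patched one adds a sign-extended 16-bit low half, as an addiu would, so the high half carries the usual +0x8000 rounding. The function names and the example address are hypothetical.

```python
LUI_OPCODE = 0x0F    # MIPS opcode for lui
ADDIU_OPCODE = 0x09  # MIPS opcode for addiu

def split_hi_lo(addr):
    """Split a 32-bit address into (hi, lo) halves for a lui/addiu pair."""
    lo = addr & 0xFFFF
    # addiu sign-extends lo, so round hi up when bit 15 of lo is set.
    hi = ((addr + 0x8000) >> 16) & 0xFFFF
    return hi, lo

def encode_lui(rt, imm16):
    """Encode lui rt, imm16."""
    return (LUI_OPCODE << 26) | (rt << 16) | (imm16 & 0xFFFF)

def encode_addiu(rt, rs, imm16):
    """Encode addiu rt, rs, imm16."""
    return (ADDIU_OPCODE << 26) | (rs << 21) | (rt << 16) | (imm16 & 0xFFFF)

V0 = 2  # register number of v0
hi, lo = split_hi_lo(0x1000ABCD)  # example address, chosen arbitrarily
patched = [encode_lui(V0, hi), encode_addiu(V0, V0, lo)]
```

A single lui can supply only bits 16..31 of an address, which is why the same one-instruction patch has no equivalent when the address needs more than 32 bits.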
PLT support works with the n32 ABI (with new toolchains). Can you use that?
-mabi=n32 -mplt still seems to generate a .MIPS.stubs section
requiring canonical gp register setting (gcc 4.4.5). Am I missing
something?
You may also have to specify -mno-shared. It looks like the GCC
documentation is foobar for this option. At some point it started
following -fPIC, but the documentation doesn't indicate this.
I am missing part of the puzzle. ld.so handles all of this, why can't
you let it do its job?
The general setting is that there is a fully linked executable which
when run, has the ability to load, relocate, and execute new code in
.o files.
dlopen() works. Why can't you use it?
Furthermore, the running program can be saved to disk via
unexec and reexecuted later, possibly on a different machine. Calls in
the .o files to be loaded to symbols in shared libraries cannot be set
to the current address of the symbol, as this might not be persistent
across image saves and reexecution. Relocating instead to a
preexisting stub in the base executable takes advantage of ld.so's
lazy relocation on first execution, and, as the target address lies in
the image itself, is persistent across image saves.
unexec is very tricky indeed. I haven't tried to build an n32 version
of emacs. I should try it. The last time I looked emacs used unexec.
This seems to indicate to me that I will need to craft my own lazy
relocation stub for each call to a shared lib symbol at the end of
each loaded block of code. Then I can move the gp pointer to a local
.got table as well. This is unfortunate, but can be done. Two
questions remain:
1) Is there an alternative, e.g. some flag like -mplt to generate a
genuine .plt section in the base executable, or other way out?
You haven't specified at a high level what problem you are trying to solve.
1) If I am to make use of the base executable stub to say _setjmp, I
have to leave the gp pointer in its canonical position in the newly
loaded code, because the format of the .MIPS.stubs (in contrast to the
.plt stub elsewhere) requires this.
2) Therefore all .got references in the newly loaded code have to
exist in the .got table of the base executable, thereby excluding
addresses in the newly loaded code.
This I don't understand. Each function conceptually has its own GOT
although in practice many of them are merged together. So in a running
program there will be several GOTs (a minimum of one for the executable
and one for each shared library loaded). The function prolog loads the
gp if it will use it. The use of -mplt may slightly change the
mechanism (I haven't looked at it for quite a while), but really I think
the notion of a canonical gp
3) On mips64, in contrast to mips32, I cannot overwrite .got
references to addresses in the newly loaded code to be immediate
address references instead, as it takes too many instructions.
The GOT is just a bunch of pointers. If you can overwrite them in the
o32 ABI, I don't understand why you cannot do the same for n32/n64.
Also if you run with LD_BIND_NOW the lazy binding stubs are never used,
the GOT will be fully populated by ld.so when the program starts.
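The difference between lazy binding and LD_BIND_NOW can be pictured with a toy model. This is not how ld.so is implemented. It only treats the GOT as a table of pointers that is either filled on first call or filled up front. The class, the stand-in "library" and all names are hypothetical.

```python
class ToyGot:
    """Toy model of a GOT: a table of pointers, filled lazily or up front."""

    def __init__(self, symbols, resolve, bind_now=False):
        self.resolve = resolve   # stands in for the dynamic linker's symbol lookup
        self.resolver_calls = 0  # how often the lazy path was taken
        # None stands in for "slot still points at the lazy-binding stub".
        self.slots = {name: None for name in symbols}
        if bind_now:
            for name in symbols:
                self.slots[name] = resolve(name)

    def call(self, name, *args):
        """Call through the table, resolving the slot first if needed."""
        if self.slots[name] is None:
            self.resolver_calls += 1
            self.slots[name] = self.resolve(name)
        return self.slots[name](*args)

# Hypothetical stand-in for a shared library's exported symbols.
library = {"strlen": len, "toupper": str.upper}
lazy = ToyGot(library, library.__getitem__)
eager = ToyGot(library, library.__getitem__, bind_now=True)
```

In the lazy table the resolver runs once per symbol, on first call. In the eager table every slot is already a plain pointer, so the stub path is never taken.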
4) It appears that I have three broad options:
a) Make my own .got table at the end of the newly loaded code, and
append with my own lazy stub when necessary. For example, on
alpha, we create our own .got in this manner due to the 64bit
issue, but we don't have to make our own stub as the alpha has a
callable .plt stub making no gp register value assumptions.
b) Do a) above but get a working .plt with some compiler flag
settings, obviating the need for a local stub.
c) Find some other way, perhaps with compiler flags, to eliminate
.got references to local addresses in the newly loaded code. In
other words, if I could instruct gcc to write accesses to the .data
section of the newly loaded code as a 32bit offset from the .text
section address, instead of a .got load and offset, I'd be set.
Not possible. There is no pc relative addressing mode.
[ e.g.
0000000000000000 <init_code>:
0: 67bdffe0 daddiu sp,sp,-32
4: ffbf0010 sd ra,16(sp)
8: ffbe0008 sd s8,8(sp)
c: ffbc0000 sd gp,0(sp)
10: 03a0f02d move s8,sp
14: 3c1c0000 lui gp,0x0
18: 0399e02d daddu gp,gp,t9
1c: 679c0000 daddiu gp,gp,0
20: df820000 ld v0,0(gp) <-- data address page load, cannot be written as lui on 64bit
No it cannot, but why can't you populate the GOT/PLT with the address as
the standard ABIs do? I know I have asked this in several different
forms, so please be patient...
24: 64420000 daddiu v0,v0,0 <-- data address offset
28: 0040202d move a0,v0
2c: df990000 ld t9,0(gp)
30: 0320f809 jalr t9
34: 00000000 nop
38: 03c0e82d move sp,s8
3c: dfbf0010 ld ra,16(sp)
40: dfbe0008 ld s8,8(sp)
44: dfbc0000 ld gp,0(sp)
48: 67bd0020 daddiu sp,sp,32
4c: 03e00008 jr ra
50: 00000000 nop
]
gpr-names=
It looks like a) is the best, though it will require mips only
modifications to the generic elf loading code, which is very
unfortunate.
2) I don't completely understand the stub:
-> 12010e090: df998010 ld t9,-32752(gp)
12010e094: 03e0782d move t3,ra
12010e098: 0320f809 jalr t9
12010e09c: 641807c6 daddiu t8,zero,1990
-> 12010e0a0: df998010 ld t9,-32752(gp)
12010e0a4: 03e0782d move t3,ra
12010e0a8: 0320f809 jalr t9
12010e0ac: 641807c5 daddiu t8,zero,1989
-> denotes stub entry points. How does the add ever get called? This
add contains the only reference to the .got entry of the external
symbol. It appears that it should be called before the jump.
On MIPS, the instruction after a branch or jump is executed before the
control transfer takes effect. This is called the delay slot.
t9 is loaded with the address of the lazy resolver. Return address
saved into t3, symbol index loaded into t8, make the call to the lazy
resolver via t9 ...
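To make the stub concrete, the two I-type words from the first stub entry can be decoded by hand, and the delay-slot ordering modelled in a few lines. This is a sketch in Python for illustration. It assumes the standard MIPS I-type field layout and n64 register names, and the helper names are hypothetical.

```python
# n64 register names, indexed by register number.
N64_REGS = ("zero at v0 v1 a0 a1 a2 a3 a4 a5 a6 a7 t0 t1 t2 t3 "
            "s0 s1 s2 s3 s4 s5 s6 s7 t8 t9 k0 k1 gp sp s8 ra").split()

def decode_itype(word):
    """Split an I-type instruction word into (opcode, rs, rt, signed imm)."""
    opcode = word >> 26
    rs = N64_REGS[(word >> 21) & 0x1F]
    rt = N64_REGS[(word >> 16) & 0x1F]
    imm = word & 0xFFFF
    if imm & 0x8000:          # the 16-bit immediate is signed
        imm -= 0x10000
    return opcode, rs, rt, imm

def run_stub(listing):
    """Return the stub's instructions in the order their effects happen."""
    events = []
    target = None
    for text in listing:
        events.append(text)
        if target is not None:
            # The delay-slot instruction has now run; only then does
            # control arrive at the jump target.
            events.append("enter target in " + target)
            break
        if text.startswith("jalr"):
            target = text.split()[1]
    return events

stub = ["ld t9,-32752(gp)", "move t3,ra", "jalr t9", "daddiu t8,zero,1990"]
events = run_stub(stub)
```

decode_itype(0xdf998010) gives opcode 0x37 (ld) with rs=gp, rt=t9 and offset -32752, and decode_itype(0x641807c6) gives opcode 0x19 (daddiu) with rs=zero, rt=t8 and immediate 1990. In the event list the daddiu appears before the resolver is entered, so t8 already holds the symbol index when the resolver starts.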
Thank you! This was especially helpful!
Take care,
Thanks so much.