Re: mips64 assembler

To: Camm Maguire <camm@maguirefamily.org>
Cc: debian-mips@lists.debian.org, Frederick Isaac <freddyisaac@gmail.com>, gcl-devel@gnu.org, eis@blocksatpc02.upc.es, hector.oron@gmail.com
Subject: Re: mips64 assembler
From: David Daney <ddaney@caviumnetworks.com>
Date: Wed, 22 Sep 2010 16:05:45 -0700
Message-id: <4C9A8BC9.1020605@caviumnetworks.com>
In-reply-to: <87lj6te9t1.fsf@maguirefamily.org>
References: <E1OwbkA-0006gv-Bi@localhost.m.enhanced.com> <4C93993E.7030008@caviumnetworks.com> <8762y49k1k.fsf@maguirefamily.org> <4C93D86D.5090201@caviumnetworks.com> <87fwx4dwu5.fsf@maguirefamily.org> <4C97D9A1.7050102@caviumnetworks.com> <87lj6te9t1.fsf@maguirefamily.org>

On 09/22/2010 02:40 PM, Camm Maguire wrote:

Greetings!

David Daney <ddaney@caviumnetworks.com> writes:

On 09/20/2010 12:44 PM, Camm Maguire wrote:

David Daney <ddaney@caviumnetworks.com> writes:

PLT support works with the n32 ABI (with new toolchains).  Can you use that?


-mabi=n32 -mplt still seems to generate a .MIPS.stubs section
   requiring canonical gp register setting (gcc 4.4.5).  Am I missing
   something?


You may also have to specify -mno-shared.  It looks like the GCC
documentation is foobar for this option.  At some point it started
following -fPIC, but the documentation doesn't indicate this.


Still have a .MIPS.stubs section and no .plt section.  Am I looking for
the wrong thing?

You have to have a compatible toolchain.  That means gcc, Binutils and
glibc all have to support it.  MIPS PLT support was added recently, so
old tools do not have the support.



$ mips64-linux-gcc --version
mips64-linux-gcc (GCC) 4.5.1
Copyright (C) 2010 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ mips64-linux-ld --version
GNU ld (GNU Binutils) 2.20
Copyright 2009 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.

$ cat hello.c
#include <stdio.h>

int j;

void hello(void)
{
  printf("Hello world %d.\n", j++);
}


int main(int argc, char *argv[])
{

  hello();

  return 0;
}
$ mips64-linux-gcc -mplt -mabi=n32 -o hello -O2 hello.c
$ readelf -e hello
ELF Header:
  Magic:   7f 45 4c 46 01 02 01 00 01 00 00 00 00 00 00 00
  Class:                             ELF32
  Data:                              2's complement, big endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       1
  Type:                              EXEC (Executable file)
  Machine:                           MIPS R3000
  Version:                           0x1
  Entry point address:               0x10000510
  Start of program headers:          52 (bytes into file)
  Start of section headers:          5252 (bytes into file)
  Flags:                             0x808b0025, noreorder, cpic, abi2, octeon, mips64r2
  Size of this header:               52 (bytes)
  Size of program headers:           32 (bytes)
  Number of program headers:         8
  Size of section headers:           40 (bytes)
  Number of section headers:         42
  Section header string table index: 39

Section Headers:
  [Nr] Name              Type            Addr     Off    Size   ES Flg Lk Inf Al
  [ 0]                   NULL            00000000 000000 000000 00      0   0  0
  [ 1] .interp           PROGBITS        10000134 000134 00000f 00   A  0   0  1
  [ 2] .note.ABI-tag     NOTE            10000144 000144 000020 00   A  0   0  4
  [ 3] .reginfo          MIPS_REGINFO    10000168 000168 000018 18   A  0   0  8
  [ 4] .dynamic          DYNAMIC         10000180 000180 0000f8 08   A  7   0  4
  [ 5] .hash             HASH            10000278 000278 000044 04   A  6   0  4
  [ 6] .dynsym           DYNSYM          100002bc 0002bc 0000c0 10   A  7   1  4
  [ 7] .dynstr           STRTAB          1000037c 00037c 00009b 00   A  0   0  1
  [ 8] .gnu.version      VERSYM          10000418 000418 000018 02   A  6   0  2
  [ 9] .gnu.version_r    VERNEED         10000430 000430 000020 00   A  7   1  4
  [10] .rel.plt          REL             10000450 000450 000008 08   A  6  12  4
  [11] .init             PROGBITS        10000458 000458 000078 00  AX  0   0  8
  [12] .plt              PROGBITS        100004e0 0004e0 000030 00  AX  0   0 32
  [13] .text             PROGBITS        10000510 000510 0002c0 00  AX  0   0 16
  [14] .MIPS.stubs       PROGBITS        100007d0 0007d0 000020 00  AX  0   0  4
  [15] .fini             PROGBITS        100007f0 0007f0 000048 00  AX  0   0  8
  [16] .rodata           PROGBITS        10000838 000838 000020 00   A  0   0  8
  [17] .eh_frame         PROGBITS        10000858 000858 000004 00   A  0   0  4
  [18] .ctors            PROGBITS        1001085c 00085c 000008 00  WA  0   0  4
  [19] .dtors            PROGBITS        10010864 000864 000008 00  WA  0   0  4
  [20] .jcr              PROGBITS        1001086c 00086c 000004 00  WA  0   0  4
  [21] .data             PROGBITS        10010870 000870 000010 00  WA  0   0 16
  [22] .rld_map          PROGBITS        10010880 000880 000004 00  WA  0   0  4
  [23] .got.plt          PROGBITS        10010884 000884 00000c 00  WA  0   0  4
  [24] .got              PROGBITS        10010890 000890 00003c 04 WAp  0   0 16
  [25] .sdata            PROGBITS        100108cc 0008cc 000004 00 WAp  0   0  4
  [26] .sbss             NOBITS          100108d0 0008d0 000004 00 WAp  0   0  4
  [27] .bss              NOBITS          100108e0 0008d0 000010 00  WA  0   0 16
  [28] .pdr              PROGBITS        00000000 0008d0 000080 00      0   0  4
  [29] .comment          PROGBITS        00000000 000950 000105 00      0   0  1
  [30] .debug_aranges    MIPS_DWARF      00000000 000a58 000078 00      0   0  8
  [31] .debug_pubnames   MIPS_DWARF      00000000 000ad0 00005f 00      0   0  1
  [32] .debug_info       MIPS_DWARF      00000000 000b2f 000258 00      0   0  1
  [33] .debug_abbrev     MIPS_DWARF      00000000 000d87 000139 00      0   0  1
  [34] .debug_line       MIPS_DWARF      00000000 000ec0 000233 00      0   0  1
  [35] .debug_frame      MIPS_DWARF      00000000 0010f4 000048 00      0   0  4
  [36] .debug_str        MIPS_DWARF      00000000 00113c 000132 01  MS  0   0  1
  [37] .debug_loc        MIPS_DWARF      00000000 00126e 00008d 00      0   0  1
  [38] .gnu.attributes   LOOS+ffffff5    00000000 0012fb 000010 00      0   0  1
  [39] .shstrtab         STRTAB          00000000 00130b 000176 00      0   0  1
  [40] .symtab           SYMTAB          00000000 001b14 0005f0 10     41  67  4
  [41] .strtab           STRTAB          00000000 002104 000322 00      0   0  1

Key to Flags:
  W (write), A (alloc), X (execute), M (merge), S (strings)
  I (info), L (link order), G (group), x (unknown)
  O (extra OS processing required) o (OS specific), p (processor specific)

Program Headers:
  Type           Offset   VirtAddr   PhysAddr   FileSiz MemSiz  Flg Align
  PHDR           0x000034 0x10000034 0x10000034 0x00100 0x00100 R E 0x4
  INTERP         0x000134 0x10000134 0x10000134 0x0000f 0x0000f R   0x1
      [Requesting program interpreter: /lib32/ld.so.1]
  REGINFO        0x000168 0x10000168 0x10000168 0x00018 0x00018 R   0x8
  LOAD           0x000000 0x10000000 0x10000000 0x0085c 0x0085c R E 0x10000
  LOAD           0x00085c 0x1001085c 0x1001085c 0x00074 0x00094 RW  0x10000
  DYNAMIC        0x000180 0x10000180 0x10000180 0x000f8 0x000f8 RWE 0x4
  NOTE           0x000144 0x10000144 0x10000144 0x00020 0x00020 R   0x4
  NULL           0x000000 0x00000000 0x00000000 0x00000 0x00000     0x4

 Section to Segment mapping:
  Segment Sections...
   00
   01     .interp
   02     .reginfo
   03     .interp .note.ABI-tag .reginfo .dynamic .hash .dynsym .dynstr .gnu.version .gnu.version_r .rel.plt .init .plt .text .MIPS.stubs .fini .rodata .eh_frame
   04     .ctors .dtors .jcr .data .rld_map .got.plt .got .sdata .sbss .bss
   05     .dynamic
   06     .note.ABI-tag
   07


Look there, the PLT is section 12.


I am missing part of the puzzle.  ld.so handles all of this, why can't
you let it do its job?


The general setting is that there is a fully linked executable which
when run, has the ability to load, relocate, and execute new code in
.o files.


dlopen() works.  Why can't you use it?


1) Thousands of loads typically occur in a given session, and dlopen
consumes one file descriptor for each loaded file.

2) I cannot specify the destination address for the located code, so
that it is very difficult or perhaps impossible to preserve these
loads across unexec.

3) The loaded files typically aren't kept with the saved binary as it
is moved among machines.

cvs gcl has a mechanism to preserve calls to symbols like 'sin' found
in libm using dlopen.  This is done by making a call through a C
pointer which is reset at startup time.  But here, there are a very
limited number of libraries opened, the libraries are ubiquitous, and
the code called does not need to be saved with unexec.
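
As a rough illustration of the pointer-reset scheme just described,
here is a minimal sketch.  It is not the gcl code: the names sin_ptr,
reset_sin_ptr and call_sin are hypothetical, and the soname
"libm.so.6" is an assumption about the host system.

```c
#include <dlfcn.h>
#include <stddef.h>

/* Hypothetical name: the pointer that loaded code calls through. */
static double (*sin_ptr)(double) = NULL;

/*
 * Re-resolve the pointer.  Meant to run once at every image startup,
 * since an address saved by unexec may be stale in the new process.
 * Returns 0 on success, -1 if the library or symbol is unavailable.
 */
static int reset_sin_ptr(void)
{
    /* "libm.so.6" is an assumption about the host's libm soname. */
    void *handle = dlopen("libm.so.6", RTLD_LAZY);
    if (handle == NULL)
        return -1;

    /* Storing through void ** avoids a direct object-to-function cast. */
    *(void **)(&sin_ptr) = dlsym(handle, "sin");
    return sin_ptr != NULL ? 0 : -1;
}

/* Call sites go through the pointer, never to a fixed libm address. */
static double call_sin(double x)
{
    return sin_ptr(x);
}
```

A startup hook would call reset_sin_ptr() before any loaded code runs;
after that, call_sin(x) reaches whatever address libm has in the
current process.  On older glibc versions this needs -ldl at link time.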

Furthermore, the running program can be saved to disk via
unexec and reexecuted later, possibly on a different machine. Calls in
the .o files to be loaded to symbols in shared libraries cannot be set
to the current address of the symbol, as this might not be persistent
across image saves and reexecution.  Relocating instead to a
preexisting stub in the base executable takes advantage of ld.so's
lazy relocation on first execution, and, as the target address lies in
the image itself, is persistent across image saves.


unexec is very tricky indeed.  I haven't tried to build an n32 version
of emacs.  I should try it.  The last time I looked emacs used unexec.


This is working on mips32 for gcl/acl2/maxima/hol88/axiom.

This seems to indicate to me that I will need to craft my own lazy
relocation stub for each call to a shared lib symbol at the end of
each loaded block of code.  Then I can move the gp pointer to a local
.got table as well.  This is unfortunate, but can be done.  Two
questions remain:

1) Is there an alternative, e.g. some flag like -mplt to generate a
genuine .plt section in the base executable, or other way out?


You haven't specified at a high level what problem you are trying to solve.


1) If I am to make use of the base executable stub to say _setjmp, I
have to leave the gp pointer in its canonical position in the newly
loaded code, because the format of the .MIPS.stubs (in contrast to the
.plt stub elsewhere) requires this.

2) Therefore all .got references in the newly loaded code have to
exist in the .got table of the base executable, thereby excluding
addresses in the newly loaded code.


This I don't understand.  Each function conceptually has its own GOT
although in practice many of them are merged together.  So in a
running program there will be several GOTs  (a minimum of one for the
executable and one for each shared library loaded)  The function
prolog loads the gp if it will use it.  The use of -mplt may slightly
change the mechanism (I haven't looked at it for quite a while), but
really I think the notion of a canonical gp


Were you trying to finish a sentence here?  (I'd love to know all your
thoughts on this matter!).  I might get your gist (see below).


Yes.

... I think the notion of a canonical gp does not exist for MIPS.  The
notion of gp can vary from instruction to instruction (but in practice
only changes at function boundaries).

3) On mips64, in contrast to mips32, I cannot overwrite .got
references to addresses in the newly loaded code to be immediate
address references instead, as it takes too many instructions.


The GOT is just a bunch of pointers.  If you can overwrite them in the
o32 ABI, I don't understand why you cannot do the same for n32/n64.


I meant overwrite register loads from the got with register loads from
an immediate value.
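
To make the kind of rewrite meant here concrete, the following is a
minimal sketch of recognizing a gp-relative load and replacing it with
a lui of the upper half of a 32-bit address.  The helper names are
hypothetical, and the +0x8000 rounding assumes that the next
instruction adds a sign-extended low half.  The field layout is the
standard MIPS I-type encoding, and the opcodes can be checked against
the disassembly in this thread (3c1c0000 is lui gp,0x0 and df820000 is
ld v0,0(gp)).

```c
#include <stdint.h>

/* MIPS I-type layout: opcode(6) rs(5) rt(5) imm(16). */
enum { OP_LUI = 0x0f, OP_LW = 0x23, OP_LD = 0x37, REG_GP = 28 };

static unsigned insn_opcode(uint32_t insn) { return insn >> 26; }
static unsigned insn_rs(uint32_t insn)     { return (insn >> 21) & 0x1f; }
static unsigned insn_rt(uint32_t insn)     { return (insn >> 16) & 0x1f; }

/* True for "lw rt,off(gp)" and "ld rt,off(gp)". */
static int is_gp_load(uint32_t insn)
{
    unsigned op = insn_opcode(insn);
    return (op == OP_LW || op == OP_LD) && insn_rs(insn) == REG_GP;
}

/* Encode "lui rt,imm". */
static uint32_t encode_lui(unsigned rt, uint16_t imm)
{
    return ((uint32_t)OP_LUI << 26) | ((uint32_t)(rt & 0x1f) << 16) | imm;
}

/*
 * Hypothetical patch step: turn a GOT load into "lui rt,%hi(addr)".
 * Only a 32-bit address fits; a full 64-bit constant would need a
 * longer instruction sequence, which is the problem described above.
 */
static uint32_t patch_load_to_lui(uint32_t insn, uint32_t addr)
{
    if (!is_gp_load(insn))
        return insn;  /* leave anything else untouched */
    return encode_lui(insn_rt(insn), (uint16_t)((addr + 0x8000u) >> 16));
}
```

For example, patching the word df820000 with the address 0x10010870
yields 3c021001, that is, lui v0,0x1001.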

Also if you run with LD_BIND_NOW the lazy binding stubs are never
used, the GOT will be fully populated by ld.so when the program
starts.
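
If the program should not rely on the user setting the variable, one
possible approach is to check for it at startup and re-execute with it
set.  This is only a sketch under assumptions: the function names are
hypothetical, and re-executing via /proc/self/exe is Linux-specific.

```c
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>
#include <unistd.h>

/* ld.so treats LD_BIND_NOW as enabled when it is set and non-empty. */
static int bind_now_requested(void)
{
    const char *v = getenv("LD_BIND_NOW");
    return v != NULL && v[0] != '\0';
}

/*
 * Hypothetical startup helper.  Returns 0 if eager binding is already
 * in effect for this process.  Otherwise it sets the variable and
 * re-executes the running binary, so it returns -1 only on failure.
 */
static int reexec_with_bind_now(char *const argv[])
{
    if (bind_now_requested())
        return 0;
    if (setenv("LD_BIND_NOW", "1", 1) != 0)
        return -1;
    execv("/proc/self/exe", argv);  /* Linux-specific path (assumption) */
    return -1;                      /* reached only if execv failed */
}
```

A main() would call reexec_with_bind_now(argv) first thing; on the
second pass the variable is already set and execution simply continues.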


This actually is a very useful piece of info -- thanks!

This obviously frees me from having to worry about the stub, but I'm
not sure if it allows me to escape from using the .got for the base
executable.  This .got is guaranteed to be handled by ld.so on
startup, either immediately or lazily.  Any .got I craft and append to
my loaded code will not, unless I can point the executable header to
this region somehow.

For example, say I load code that calls _setjmp.  If I use the
existing .got, even if populated immediately, I know that if my code
is dumped, and the binary moved to another machine, and restarted, the
new _setjmp address will be handled properly.

The jmp_buf has to contain the necessary state so that it can resume
when you call longjmp.  Some of the state (like the gp) may be
regenerated by magic code emitted by GCC at the jump target.  But this
is no different than for other architectures.  So you cannot relocate
the GOT after an unexec; it would probably have to remain in the same
location.

I would however note that there may be GCC bugs related to
setjmp/longjmp and nested functions for the n32/n64 ABIs.  I couldn't
find any bugzilla entries for this though.

4) It appears that I have three broad options:

     a) Make my own .got table at the end of the newly loaded code, and
     append with my own lazy stub when necessary.  For example, on
     alpha, we create our own .got in this manner due to the 64bit
     issue, but we don't have to make our own stub as the alpha has a
     callable .plt stub making no gp register value assumptions.

     b) Do a) above but get a working .plt with some compiler flag
     settings, obviating the need for a local stub.


     c) find some other way, perhaps with compiler flags, to eliminate
     .got references to local addresses in the newly loaded code.  In
     other words, if I could instruct gcc to write accesses to the .data
     section of the newly loaded code as a 32bit offset from the .text
     section address, instead of a .got load and offset, I'd be set.



Not possible.  There is no pc relative addressing mode.


Good to know, thanks!


[ e.g.

0000000000000000 <init_code>:
     0:	67bdffe0 	daddiu	sp,sp,-32
     4:	ffbf0010 	sd	ra,16(sp)
     8:	ffbe0008 	sd	s8,8(sp)
     c:	ffbc0000 	sd	gp,0(sp)
    10:	03a0f02d 	move	s8,sp
    14:	3c1c0000 	lui	gp,0x0
    18:	0399e02d 	daddu	gp,gp,t9
    1c:	679c0000 	daddiu	gp,gp,0
    20:	df820000 	ld	v0,0(gp)   <-- data address page load, cannot be written as lui on 64bit


No it cannot, but why can't you populate the GOT/PLT with the address
as the standard ABIs do?  I know I have asked this in several
different forms, so please be patient...


So far, gp is at its canonical value.  This lets me use the _setjmp
entry of the base executable handled transparently by ld.so.  The
address I need is in the code to be loaded (the top of its .data
section).  It obviously cannot be in the .got of the base executable,
as it is *new* code.

Each function has its own GOT.  Each GOT is at a fixed offset from the
function entry point.  This offset is calculated by the linker.  That
is how MIPS position independent code works.  You can move a function
around, but the GOT moves with it, always with the same offset.  The
address of the function entry point is always passed in register t9
($25) so that the function can calculate the gp in two instructions.
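
The arithmetic behind that prologue can be sketched in C.  This models
the lui / daddu / daddiu sequence from the init_code listing in this
thread once the linker has filled in the %hi/%lo halves of the offset.
The function names are hypothetical and the addresses in the usage
note are only for illustration.

```c
#include <stdint.h>

/* %hi: upper 16 bits, rounded so that adding the signed %lo restores off. */
static uint16_t hi16(int32_t off)
{
    return (uint16_t)(((uint32_t)off + 0x8000u) >> 16);
}

/* %lo: lower 16 bits, treated as signed by (d)addiu. */
static int16_t lo16(int32_t off)
{
    return (int16_t)(uint16_t)((uint32_t)off & 0xffffu);
}

/* gp = t9 + off, computed the way the three prologue instructions do. */
static uint64_t compute_gp(uint64_t t9, int32_t off)
{
    /* lui gp,%hi(off): result is sign-extended to 64 bits */
    uint64_t gp = (uint64_t)(int64_t)(int32_t)((uint32_t)hi16(off) << 16);
    gp += t9;                              /* daddu  gp,gp,t9       */
    gp += (uint64_t)(int64_t)lo16(off);    /* daddiu gp,gp,%lo(off) */
    return gp;
}
```

With an entry point of 0x10000510 and an offset of 0x18010 this gives
0x10018520.  Moving the function by some delta moves the computed gp
by the same delta, which is why the GOT has to travel with the code.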


On other machines with a .plt (e.g. alpha), I don't leave the gp at
its 'canonical' value, but rather set it to a mini-table I craft at
the end of the code to be loaded.  I then populate this .got
accordingly.  The _setjmp address I use is the address of the .plt
entry.  This will always call the .plt entry and never reset the new
.got slot, as the .plt is designed to set the .got slot of the base
executable.  So the call is somewhat inefficient, but it works and is
stable.   On mips, if I move the gp pointer to my mini-table, it will
no longer be correct in the stub where it is used to lookup the lazy
relocator of libdl in the .got of the base executable.

The entry point of the lazy resolver is always at a fixed location in
the GOT.  So if you are creating a GOT, just make sure you reserve the
slots that are used by ld.so.

The stubs are a little unique in this manner.  They rely on a valid gp
at entry.  All other functions calculate their own gp and don't care
about the value of gp passed on entry.
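
A hand-built GOT that keeps those slots free might look like the
following sketch.  Everything here is hypothetical naming.  The layout
assumptions are that slot 0 holds the lazy resolver and slot 1 is used
by GNU ld.so for its module pointer, and that gp points 0x7ff0 bytes
past the GOT base.  How the reserved slots get filled in is outside
the scope of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

enum {
    GOT_RESERVED_SLOTS = 2,      /* assumption: resolver + module pointer */
    GOT_GP_BIAS        = 0x7ff0  /* assumption: gp = GOT base + 0x7ff0    */
};

/* Hypothetical mini-GOT appended to a block of loaded code. */
struct mini_got {
    uintptr_t *slots;
    size_t     capacity;
    size_t     used;
};

static void mini_got_init(struct mini_got *g, uintptr_t *storage, size_t capacity)
{
    size_t i;

    g->slots = storage;
    g->capacity = capacity;
    for (i = 0; i < capacity; i++)
        storage[i] = 0;
    g->used = GOT_RESERVED_SLOTS;  /* never hand out the reserved slots */
}

/* Store an address in the next free slot; returns its index, or -1 if full. */
static int mini_got_add(struct mini_got *g, uintptr_t addr)
{
    if (g->used >= g->capacity)
        return -1;
    g->slots[g->used] = addr;
    return (int)g->used++;
}

/* Displacement to use in "ld rt,disp(gp)" for a given slot index. */
static long mini_got_gp_offset(int index)
{
    return (long)index * (long)sizeof(uintptr_t) - GOT_GP_BIAS;
}
```

The first call to mini_got_add() returns index 2, so the first two
entries stay untouched for the dynamic linker's use.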

So in sum, it seems that if I can get a .plt, all I need is a local
.got.  Otherwise, I need a local .got, plus a stub for each call to an
external symbol like _setjmp, plus some means of resetting the new
.got entry to the lazy relocator at each image execution.  Right?

Actually, one better idea has just come to mind.  The new _setjmp stub
should rather reload the old (canonical) gp pointer, then do a .got
call to the _setjmp entry in the .got table of the base executable.
This is cumbersome, but at least then I don't have to mess with issues
regarding image startup and ld.so.  This is akin of course to making
my own .plt for the symbol.

Thoughts?

Take care,
