Re: Draft TLS/NPTL ABI for m68k and ColdFire, version 0.2
On Fri, 30 Nov 2007, Joseph S. Myers wrote:
> Kernel helpers
> This TLS ABI defines a function __m68k_read_tp, provided by libc.
> This returns the thread pointer in register a0 (not d0) and may
> clobber other call-clobbered registers. The compiler will generate
> calls to this function for the initial exec and local exec models.
> To implement this function and other requirements for NPTL, four
> kernel helpers are to be provided in a vDSO (as provided by the kernel
> on Power and other architectures). The symbols indicated are exported
> at symbol version LINUX_2.6. Full DWARF unwind information for all
> these functions must be included in the vDSO, as thread cancellation
> may need to unwind from any point in any of these functions. The
> kernel informs glibc of the location of the vDSO by putting an
> AT_SYSINFO_EHDR entry in the auxiliary vector passed to each process.
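For illustration only, here is a minimal sketch of how userspace can find the vDSO from the auxiliary vector. It assumes a Linux system with glibc 2.16 or later (for getauxval); the function names vdso_base and looks_like_elf are hypothetical and not part of the draft ABI, and glibc's own startup code reads the auxiliary vector directly rather than through this interface.

```c
#include <elf.h>       /* AT_SYSINFO_EHDR */
#include <string.h>    /* memcmp */
#include <sys/auxv.h>  /* getauxval (glibc 2.16+) */

/* Hypothetical helper: base address of the vDSO image as reported by
   the kernel, or 0 if no AT_SYSINFO_EHDR entry was passed. */
unsigned long vdso_base(void)
{
    return getauxval(AT_SYSINFO_EHDR);
}

/* Hypothetical helper: sanity check that the address points at an
   ELF header, which is what AT_SYSINFO_EHDR is meant to reference. */
int looks_like_elf(unsigned long base)
{
    return base != 0 && memcmp((const void *)base, "\177ELF", 4) == 0;
}
```

A caller would check looks_like_elf(vdso_base()) before walking the image's dynamic symbol table to look up the helper symbols.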
> If glibc is configured for a subset of processors where the necessary
> operations do not require a kernel helper, then it does not need to
> use the kernel helper (for example, glibc configured only for m68k
> processors with a cas instruction does not need to use the
> compare-and-exchange helper), but the kernel must provide all these
> helpers on all m68k and ColdFire processors so that
> lowest-common-denominator glibc binaries can work across all
> The helper __kernel_read_tp returns the thread pointer in register a0
> (not d0) and may clobber other call-clobbered registers. (Because it
> is only called from __m68k_read_tp, which is called through the PLT,
> and the resolver may clobber call-clobbered registers, there seems to
> be no advantage in restricting clobbers from this helper.)
Why is there a need for separate __kernel_read_tp/__m68k_read_tp? Wouldn't
this add one unnecessary indirection? Couldn't one of them just be an
alias for the other?
Personally I'd call them ..._get_tp/_set_tp (i.e. closer to what ARM is
> The helper __kernel_write_tp sets the thread pointer to the value in
> a0. It does not clobber any registers other than the condition codes.
This function is not really critical, so I'd keep clobber rules in line
> Offset length issues
> On ColdFire (and m68k before 68020), only 16-bit offsets can be used
> in memory addresses. On m68k (68020 and later), 32-bit offsets can be
> used; a ".w" assembly suffix is used for 16-bit offsets, and otherwise
> the offsets are 32 bits.
> The use of 16-bit offsets limits GOT size to 8192 entries (the
> toolchain does not use negative GOT offsets on m68k/ColdFire). On
> m68k (68020 and later), GCC uses 32-bit offsets with -fPIC and 16-bit
> offsets with -fpic (and does not need to use GOT accesses for non-PIC
> code at present).
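The 8192 figure follows from simple arithmetic, sketched below. The sketch assumes 4-byte GOT entries (32-bit addresses) and that only the non-negative half of the signed 16-bit offset range is used, as the draft states; the function name got_entries_16bit is made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Number of GOT entries reachable with a signed 16-bit offset when
   only non-negative offsets (0..32767) are used. */
unsigned got_entries_16bit(unsigned entry_size)
{
    assert(entry_size > 0);
    unsigned reachable_bytes = (unsigned)INT16_MAX + 1u; /* 32768 bytes */
    return reachable_bytes / entry_size;
}
```

With 4-byte entries, got_entries_16bit(4) gives the 8192 entries quoted in the draft.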
> The proposals here do not address GOT size limitations, although an
> example is given to illustrate a possible longer access sequence to
> avoid those limitations on ColdFire. The examples using offsets such
> as #x@TLSGD in GOT accesses are shown for ColdFire and use the 16-bit
> relocations shown. For m68k (68020 and later), either the syntax
> shown may be used, with a 32-bit relocation, or a ".w" suffix may be
> used, with a 16-bit relocation. It is proposed that the compiler, on
> m68k (68020 and later), will use ".w" for -fpic and the 32-bit offsets
> otherwise. (No specific option is proposed to choose between 16-bit
> and 32-bit offsets for the non-PIC, initial exec case, though such an
> option could be added later.)
> The same issue as for GOT accesses also applies to accesses to TLS
> data using the local dynamic and local exec models. The example code
> sequences determine the address of the variable, but typically it will
> be desired to read or write the variable and this may be done more
> efficiently using offset addressing. It is proposed that by default
> the compiler will require the relevant TLS area to be accessible using
> 16-bit offsets, and that an option -mxtls must be used when compiling
> objects that use the local dynamic or local exec models and will be
> linked into a module with too large a TLS area for 16-bit offset
Using 16-bit offsets has advantages on m68k too, as the extra 16 bits of
offset make the instruction 32 bits larger.
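A rough sketch of where the 32 bits come from, assuming the standard 68020 addressing-mode encodings: (d16,An) needs a single 16-bit displacement word, while a 32-bit base displacement needs the full-format extension word plus two displacement words. The function name disp_extension_bits is hypothetical.

```c
#include <assert.h>

/* Bits that follow the opcode word for an address-register-plus-
   displacement operand on 68020 and later:
     16-bit: (d16,An), one 16-bit displacement word
     32-bit: (bd,An),  16-bit full-format extension word
                       plus a 32-bit base displacement */
unsigned disp_extension_bits(unsigned disp_bits)
{
    assert(disp_bits == 16 || disp_bits == 32);
    return disp_bits == 16 ? 16u : 16u + 32u;
}
```

The difference between the two cases is 32 bits per memory operand.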
However, I'm not comfortable with forcing a specific model at the
ABI level; I'd rather leave the default to the system environment and
create two options to specifically select the model (e.g. FRV has
Otherwise the rest looks good; details will probably have to be dealt with
during implementation anyway.
I've already experimented with a vDSO implementation and tried a few
possibilities. There are subtle problems when that page is written to
(e.g. by the debugger via ptrace): care is needed so that at the next
context switch the correct thread value is written to the correct page...