[PATCH,RFC] nwfpe: fix issues related to big-endian
Hi,
Attached is a patch that fixes and re-enables extended precision support
for the in-kernel nwfpe floating point emulator on big-endian ARM
platforms. It is a first step towards unfscking nwfpe for big-endian.
As a reminder: nwfpe uses the FPA floating point format, which is a way
of representing IEEE 754 single-precision, double-precision and extended
double-precision numbers as arrays of one, two or three 32-bit words.
The FPA format uses native endian byte order but big-endian word order,
while nwfpe internally uses fully native 'long long' byte order,
resulting in a confusing mix of floating point formats:
    40 84 d0 00 00 00 00 00    666.0 in IEEE 754 double precision
    40 84 d0 00 00 00 00 00    666.0 in big-endian FPA byte order
    00 d0 84 40 00 00 00 00    666.0 in little-endian FPA byte order
    40 84 d0 00 00 00 00 00    666.0 in big-endian nwfpe internal byte order
    00 00 00 00 00 d0 84 40    666.0 in little-endian nwfpe internal byte order
Note that to convert from FPA to nwfpe byte order and vice versa:
- on little-endian ARM, you have to swap the two halves of the double.
- on big-endian ARM, you don't have to do anything as the formats are the same.
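To make the conversion rule above concrete, here is a minimal userspace
sketch (not kernel code, and not part of the patch). The helper names
store32, fpa_layout, nwfpe_layout and fpa_nwfpe_convert are made up for
illustration, and the endianness is passed in as an explicit flag instead
of being detected, so both cases can be tried on any host:

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit word into buf in the requested byte order. */
void store32(uint8_t *buf, uint32_t w, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        buf[i] = (uint8_t)(w >> shift);
    }
}

/* FPA layout: most significant word first, each word in native byte order. */
void fpa_layout(uint8_t out[8], uint64_t bits, int big_endian)
{
    store32(out, (uint32_t)(bits >> 32), big_endian);
    store32(out + 4, (uint32_t)bits, big_endian);
}

/* nwfpe internal layout: one native 64-bit 'long long'. */
void nwfpe_layout(uint8_t out[8], uint64_t bits, int big_endian)
{
    for (int i = 0; i < 8; i++) {
        int shift = big_endian ? 56 - 8 * i : 8 * i;
        out[i] = (uint8_t)(bits >> shift);
    }
}

/*
 * Convert a double between FPA and nwfpe layout, in place.
 * Little-endian: swap the two 32-bit halves.  Big-endian: nothing to do.
 * The operation is its own inverse.
 */
void fpa_nwfpe_convert(uint8_t buf[8], int big_endian)
{
    uint8_t tmp[4];

    if (big_endian)
        return;
    memcpy(tmp, buf, 4);
    memcpy(buf, buf + 4, 4);
    memcpy(buf + 4, tmp, 4);
}
```

Feeding 0x4084d00000000000 (666.0) through fpa_layout and nwfpe_layout
with either flag value should give the byte sequences listed in the table
above, and fpa_nwfpe_convert should map one onto the other.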
There are a couple more (ugly) issues remaining:
1) The extended precision format that nwfpe uses differs from the format
that is described in the FPA spec. In the spec, the sign bit is in bit
31 of the first word, while in nwfpe, the sign bit is in bit 15 of the
first word. What does actual FPA hardware do here? Maybe the spec is
just wrong?
2) Ralph Siemsen told me a while ago that, contrary to what the FPA spec
says, there are programs that depend on the exact format that is used
by the LFM/SFM instructions.
3) The GETFPREGS ptrace call dumps the internal nwfpe state buffer (eeew)
to userspace (i.e. in 'fully native long long byte order'), instead of
the FPA format or something else more sensible.
This causes many userland applications that have a need to mess with
this data (for example, gdb) to just blindly swap the upper and lower
words of the data returned from the kernel, assuming that that is the
right way to convert from nwfpe byte order to FPA byte order. As per
the above, this method of converting doubles is broken on big-endian.
[Also, there is another floating point emulator (fastfpe) in the kernel,
which uses a different internal state buffer format, so apart from the
fact that our GETFPREGS buffer format is nonsensical, it's not even
consistently so.]
[Another bug: ptrace_getfpregs in arch/arm/kernel/ptrace.c copies a
struct fp_state (35 words big if iWMMXt is not compiled in, 39 words
big if it is) to userland, but uses sizeof(struct user_fp), which is
only 29 words big, as the copy length.]
4) The ARM ELF core dump format uses yet another definition of the
floating point word format using bitfields (struct user_fp), which
isn't compatible with any of the other formats. When a core dump is
made, however, the kernel simply copies the same nwfpe local state
buffer into the core file (arch/arm/kernel/process.c:dump_fpu).
What to do:
1) Someone with actual FPA hardware should test this out, or someone
more knowledgeable about the FPA spec than me should throw in his 2ct.
2) Anyone have more info on this? (Or maybe he just meant GETFPREGS?)
3) Tricky. We can't really change the little-endian ARM behaviour
anymore since this is a user-visible ABI. The problem with deciding
how to make GETFPREGS behave on big-endian is that there isn't really
any kind of definition of the structure format. There are two ways we
can define this structure, which are each compatible with how things
are currently done on little-endian:
    1) Define GETFPREGS as storing doubles in 'reverse byte order'
       (i.e. from LSB to MSB.)
    2) Define GETFPREGS as storing doubles in 'native byte order but
       little-endian word order' (i.e. little-endian byte order
       and little-endian word order on little-endian systems,
       and big-endian byte order and little-endian word order
       on big-endian systems.)
Option 1) would make the big-endian format byte-wise compatible with
little-endian, but would require all userspace applications to check
if defined(__ARMEB__) and to conditionally byteswap each word (and swap
the two words) to convert back to native (FPA) word order if that is
the case. Because the float format in core files isn't byte-wise
identical anyway, I don't see much value in this option.
For applications that swap the two sub-words to convert from kernel
('nwfpe') order to native order, option 2) would have the advantage
of not requiring any additional userspace modifications to make those
apps work on big-endian.
4) Haven't looked into this too closely yet.
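For point 3), the difference between the two candidate GETFPREGS
definitions can be sketched in userspace as follows. This is only an
illustration of the two proposals, not what the kernel currently does on
big-endian, and all function names here (getfpregs_option1,
getfpregs_option2, swap_words, bswap_each_word, fpa_layout) are
hypothetical. Endianness is again an explicit flag so both cases can be
tried on any host:

```c
#include <stdint.h>
#include <string.h>

/* Store a 32-bit word into buf in the requested byte order. */
void store32(uint8_t *buf, uint32_t w, int big_endian)
{
    for (int i = 0; i < 4; i++) {
        int shift = big_endian ? 24 - 8 * i : 8 * i;
        buf[i] = (uint8_t)(w >> shift);
    }
}

/* FPA layout: most significant word first, native byte order. */
void fpa_layout(uint8_t out[8], uint64_t bits, int big_endian)
{
    store32(out, (uint32_t)(bits >> 32), big_endian);
    store32(out + 4, (uint32_t)bits, big_endian);
}

/* Option 1: bytes from LSB to MSB, regardless of host endianness. */
void getfpregs_option1(uint8_t out[8], uint64_t bits)
{
    for (int i = 0; i < 8; i++)
        out[i] = (uint8_t)(bits >> (8 * i));
}

/* Option 2: least significant word first, native byte order per word. */
void getfpregs_option2(uint8_t out[8], uint64_t bits, int big_endian)
{
    store32(out, (uint32_t)bits, big_endian);
    store32(out + 4, (uint32_t)(bits >> 32), big_endian);
}

/* What existing applications already do: swap the two 32-bit words. */
void swap_words(uint8_t buf[8])
{
    uint8_t tmp[4];

    memcpy(tmp, buf, 4);
    memcpy(buf, buf + 4, 4);
    memcpy(buf + 4, tmp, 4);
}

/* Extra step option 1 would need on big-endian: byteswap each word. */
void bswap_each_word(uint8_t buf[8])
{
    for (int w = 0; w < 8; w += 4) {
        uint8_t t0 = buf[w], t1 = buf[w + 1];

        buf[w] = buf[w + 3];
        buf[w + 1] = buf[w + 2];
        buf[w + 2] = t1;
        buf[w + 3] = t0;
    }
}
```

With option 2, swap_words alone turns the buffer into FPA layout for
either flag value. With option 1, the same is only true for
little-endian; on big-endian, bswap_each_word has to be applied as well.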
Any ideas?
cheers,
Lennert
diff -urN linux-2.6.14.commit/arch/arm/Kconfig linux-2.6.14.snap/arch/arm/Kconfig
--- linux-2.6.14.commit/arch/arm/Kconfig 2005-11-06 17:00:50.000000000 +0100
+++ linux-2.6.14.snap/arch/arm/Kconfig 2005-11-06 17:00:31.000000000 +0100
@@ -568,7 +568,7 @@
config FPE_NWFPE_XP
bool "Support extended precision"
- depends on FPE_NWFPE && !CPU_BIG_ENDIAN
+ depends on FPE_NWFPE
help
Say Y to include 80-bit support in the kernel floating-point
emulator. Otherwise, only 32 and 64-bit support is compiled in.
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/fpa11_cpdt.c linux-2.6.14.snap/arch/arm/nwfpe/fpa11_cpdt.c
--- linux-2.6.14.commit/arch/arm/nwfpe/fpa11_cpdt.c 2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14.snap/arch/arm/nwfpe/fpa11_cpdt.c 2005-11-06 12:24:34.000000000 +0100
@@ -59,8 +59,13 @@
p = (unsigned int *) &fpa11->fpreg[Fn].fExtended;
fpa11->fType[Fn] = typeExtended;
get_user(p[0], &pMem[0]); /* sign & exponent */
+#ifdef __ARMEB__
+ get_user(p[1], &pMem[1]); /* ms bits */
+ get_user(p[2], &pMem[2]); /* ls bits */
+#else
get_user(p[1], &pMem[2]); /* ls bits */
get_user(p[2], &pMem[1]); /* ms bits */
+#endif
}
#endif
@@ -78,6 +83,7 @@
case typeSingle:
case typeDouble:
{
+ /* @@@ big-endian */
get_user(p[0], &pMem[2]); /* Single */
get_user(p[1], &pMem[1]); /* double msw */
p[2] = 0; /* empty */
@@ -177,8 +183,13 @@
}
put_user(val.i[0], &pMem[0]); /* sign & exp */
+#ifdef __ARMEB__
+ put_user(val.i[1], &pMem[1]); /* msw */
+ put_user(val.i[2], &pMem[2]);
+#else
put_user(val.i[1], &pMem[2]);
put_user(val.i[2], &pMem[1]); /* msw */
+#endif
}
#endif
@@ -194,6 +205,7 @@
case typeSingle:
case typeDouble:
{
+ /* @@@ big-endian */
put_user(p[0], &pMem[2]); /* single */
put_user(p[1], &pMem[1]); /* double msw */
put_user(nType << 14, &pMem[0]);
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/fpa11.h linux-2.6.14.snap/arch/arm/nwfpe/fpa11.h
--- linux-2.6.14.commit/arch/arm/nwfpe/fpa11.h 2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14.snap/arch/arm/nwfpe/fpa11.h 2005-10-29 01:41:41.000000000 +0200
@@ -60,7 +60,7 @@
#ifdef CONFIG_FPE_NWFPE_XP
floatx80 fExtended;
#else
- int padding[3];
+ u32 padding[3];
#endif
} FPREG;
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/fpopcode.c linux-2.6.14.snap/arch/arm/nwfpe/fpopcode.c
--- linux-2.6.14.commit/arch/arm/nwfpe/fpopcode.c 2005-11-06 17:00:11.000000000 +0100
+++ linux-2.6.14.snap/arch/arm/nwfpe/fpopcode.c 2005-11-06 17:00:02.000000000 +0100
@@ -29,14 +29,14 @@
#ifdef CONFIG_FPE_NWFPE_XP
const floatx80 floatx80Constant[] = {
- {0x0000, 0x0000000000000000ULL}, /* extended 0.0 */
- {0x3fff, 0x8000000000000000ULL}, /* extended 1.0 */
- {0x4000, 0x8000000000000000ULL}, /* extended 2.0 */
- {0x4000, 0xc000000000000000ULL}, /* extended 3.0 */
- {0x4001, 0x8000000000000000ULL}, /* extended 4.0 */
- {0x4001, 0xa000000000000000ULL}, /* extended 5.0 */
- {0x3ffe, 0x8000000000000000ULL}, /* extended 0.5 */
- {0x4002, 0xa000000000000000ULL} /* extended 10.0 */
+ { .high = 0x0000, .low = 0x0000000000000000ULL},/* extended 0.0 */
+ { .high = 0x3fff, .low = 0x8000000000000000ULL},/* extended 1.0 */
+ { .high = 0x4000, .low = 0x8000000000000000ULL},/* extended 2.0 */
+ { .high = 0x4000, .low = 0xc000000000000000ULL},/* extended 3.0 */
+ { .high = 0x4001, .low = 0x8000000000000000ULL},/* extended 4.0 */
+ { .high = 0x4001, .low = 0xa000000000000000ULL},/* extended 5.0 */
+ { .high = 0x3ffe, .low = 0x8000000000000000ULL},/* extended 0.5 */
+ { .high = 0x4002, .low = 0xa000000000000000ULL},/* extended 10.0 */
};
#endif
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/softfloat.h linux-2.6.14.snap/arch/arm/nwfpe/softfloat.h
--- linux-2.6.14.commit/arch/arm/nwfpe/softfloat.h 2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14.snap/arch/arm/nwfpe/softfloat.h 2005-11-06 16:55:13.000000000 +0100
@@ -51,11 +51,17 @@
Software IEC/IEEE floating-point types.
-------------------------------------------------------------------------------
*/
-typedef unsigned long int float32;
-typedef unsigned long long float64;
+typedef u32 float32;
+typedef u64 float64;
typedef struct {
- unsigned short high;
- unsigned long long low;
+#ifdef __ARMEB__
+ u16 __padding;
+ u16 high;
+#else
+ u16 high;
+ u16 __padding;
+#endif
+ u64 low;
} floatx80;
/*