
[PATCH,RFC] nwfpe: fix issues related to big-endian



Hi,

Attached is a patch that fixes and re-enables extended precision support
for the in-kernel nwfpe floating point emulator on big-endian ARM
platforms.  It is a first step towards unfscking nwfpe for big-endian.

As a reminder: nwfpe uses the FPA floating point format, which is a way
of representing IEEE 754 single-precision, double-precision and extended
double-precision numbers as arrays of one, two or three 32-bit words.
The FPA format uses native endian byte order but big-endian word order,
while nwfpe internally uses fully native 'long long' byte order,
resulting in a confusing mix of floating point formats:

40 84 d0 00 00 00 00 00		666.0 in IEEE 754 double precision
40 84 d0 00 00 00 00 00		666.0 in big-endian FPA byte order
00 d0 84 40 00 00 00 00		666.0 in little-endian FPA byte order
40 84 d0 00 00 00 00 00		666.0 in big-endian nwfpe internal byte order
00 00 00 00 00 d0 84 40		666.0 in little-endian nwfpe internal byte order

Note that to convert from FPA to nwfpe byte order and vice versa:
- on little-endian ARM, you have to swap the two halves of the double.
- on big-endian ARM, you don't have to do anything as the formats are the same.
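
To make the conversion rule above concrete, here is a minimal sketch in
terms of values rather than memory images.  The function names are made up
for illustration and are not part of nwfpe; the nwfpe internal format is
modelled as a native 64-bit integer and the FPA format as two native 32-bit
words, most significant word first.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helpers, not part of nwfpe.  The nwfpe internal format is
 * modelled as a native 64-bit integer, the FPA format as two native 32-bit
 * words with the most significant word first.
 */
static void nwfpe_to_fpa_double(uint64_t in, uint32_t out[2])
{
	out[0] = (uint32_t)(in >> 32);	/* msw comes first in FPA order */
	out[1] = (uint32_t)in;		/* lsw */
}

static uint64_t fpa_to_nwfpe_double(const uint32_t in[2])
{
	return ((uint64_t)in[0] << 32) | in[1];
}

/* 666.0 is 0x4084d00000000000 as an IEEE 754 double. */
static void fpa_double_selftest(void)
{
	uint32_t fpa[2];

	nwfpe_to_fpa_double(0x4084d00000000000ULL, fpa);
	assert(fpa[0] == 0x4084d000 && fpa[1] == 0);
	assert(fpa_to_nwfpe_double(fpa) == 0x4084d00000000000ULL);
}
```

Written this way the code is the same for both endiannesses: on
little-endian it ends up swapping the two halves in memory, and on
big-endian the memory image does not change at all.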

There are a couple more (ugly) issues remaining:
1) The extended precision format that nwfpe uses differs from the format
   that is described in the FPA spec.  In the spec, the sign bit is in bit
   31 of the first word, while in nwfpe, the sign bit is in bit 15 of the
   first word.  What does actual FPA hardware do here?  Maybe the spec is
   just wrong?

2) Ralph Siemsen told me a while ago that, contrary to what the FPA spec
   says, there are programs that depend on the exact format that is used
   by the LFM/SFM instructions.

3) The GETFPREGS ptrace call dumps the internal nwfpe state buffer (eeew)
   to userspace (i.e. in 'fully native long long byte order'), instead of
   the FPA format or something else more sensible.

   This causes many userland applications that have a need to mess with
   this data (for example, gdb) to just blindly swap the upper and lower
   words of the data returned from the kernel, assuming that that is the
   right way to convert from nwfpe byte order to FPA byte order.  As per
   the above, this method of converting doubles is broken on big-endian.

   [Also, there is another floating point emulator (fastfpe) in the kernel,
   which uses a different internal state buffer format, so apart from the
   fact that our GETFPREGS buffer format is nonsensical, it's not even
   consistently so.]

   [Another bug: ptrace_getfpregs in arch/arm/kernel/ptrace.c copies a
   struct fp_state to userland, which is 35 words big if iWMMXt is not
   compiled in and 39 words big if it is, but it uses
   sizeof(struct user_fp), which is only 29 words big, as the length of
   the copy.]

4) The ARM ELF core dump format uses yet another definition of the
   floating point word format using bitfields (struct user_fp), which
   isn't compatible with any of the other formats.  When a core dump is
   made, however, the kernel simply copies the same nwfpe local state
   buffer into the core file (arch/arm/kernel/process.c:dump_fpu).
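
To illustrate issue 1) above, here is a rough sketch of what moving the
sign bit between the two positions in the first word would look like.  The
helper names are hypothetical, and the sketch assumes that the 15-bit
exponent sits in bits 0-14 of the first word in both layouts, which I have
not verified against real FPA hardware.

```c
#include <assert.h>
#include <stdint.h>

#define XP_SIGN_BIT_SPEC	31	/* sign position per the FPA spec */
#define XP_SIGN_BIT_NWFPE	15	/* sign position used by nwfpe */
#define XP_EXP_MASK		0x7fffu	/* assumption: exponent in bits 0-14 */

/* Hypothetical helper: nwfpe first word -> first word as per the spec. */
static uint32_t xp_word0_nwfpe_to_spec(uint32_t w)
{
	uint32_t sign = (w >> XP_SIGN_BIT_NWFPE) & 1;

	return (w & XP_EXP_MASK) | (sign << XP_SIGN_BIT_SPEC);
}

/* Hypothetical helper: first word as per the spec -> nwfpe first word. */
static uint32_t xp_word0_spec_to_nwfpe(uint32_t w)
{
	uint32_t sign = (w >> XP_SIGN_BIT_SPEC) & 1;

	return (w & XP_EXP_MASK) | (sign << XP_SIGN_BIT_NWFPE);
}

/* Extended -1.0 has sign 1 and exponent 0x3fff, i.e. 0xbfff in nwfpe. */
static void xp_word0_selftest(void)
{
	assert(xp_word0_nwfpe_to_spec(0xbfff) == 0x80003fffu);
	assert(xp_word0_spec_to_nwfpe(0x80003fffu) == 0xbfff);
}
```

Whether anything like this is needed at all depends on the answer to the
question in 1), i.e. on what real FPA hardware does.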


What to do:
1) Someone with actual FPA hardware should test this out, or someone
   more knowledgeable about the FPA spec than me should throw in his 2ct.

2) Anyone have more info on this?  (Or maybe he just meant GETFPREGS?)

3) Tricky.  We can't really change the little-endian ARM behaviour
   anymore since this is a user-visible ABI.  The problem with deciding
   how to make GETFPREGS behave on big-endian is that there isn't really
   any kind of definition of the structure format.  There are two ways we
   can define this structure, which are each compatible with how things
   are currently done on little-endian:

	1) Define GETFPREGS as storing doubles in 'reverse byte order'
		(i.e. from LSB to MSB.)

	2) Define GETFPREGS as storing doubles in 'native byte order but
		little-endian word order' (i.e. little-endian byte order
		and little-endian word order on little-endian systems,
		and big-endian byte order and little-endian word order
		on big-endian systems.)

   Option 1) would make the big-endian format byte-wise compatible with
   little-endian, but would require all userspace applications to check
   if defined(__ARMEB__) and to conditionally byteswap each word (and swap
   the two words) to convert back to native (FPA) word order if that is
   the case.  Because the float format in core files isn't byte-wise
   identical anyway, I don't see much value in this option.

   For applications that swap the two sub-words to convert from kernel
   ('nwfpe') order to native order, option 2) would have the advantage
   of not requiring any additional userspace modifications to make those
   apps work on big-endian.

4) Haven't looked into this too closely yet.
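
To compare the two GETFPREGS definitions from 3) above, the following
sketch computes the byte image of a double under each option for a given
CPU endianness.  All names here are invented for illustration, and the CPU
endianness is passed in as a parameter so that the code behaves the same on
any host.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum endian { LITTLE, BIG };

/* Store a 32-bit word into buf using the given byte order. */
static void store_word(uint8_t *buf, uint32_t w, enum endian e)
{
	for (int i = 0; i < 4; i++) {
		int shift = (e == LITTLE) ? 8 * i : 8 * (3 - i);

		buf[i] = (uint8_t)(w >> shift);
	}
}

/* Option 1: bytes from LSB to MSB, whatever the CPU endianness is. */
static void getfpregs_opt1(uint8_t buf[8], uint64_t d, enum endian cpu)
{
	(void)cpu;
	for (int i = 0; i < 8; i++)
		buf[i] = (uint8_t)(d >> (8 * i));
}

/* Option 2: native byte order in each word, least significant word first. */
static void getfpregs_opt2(uint8_t buf[8], uint64_t d, enum endian cpu)
{
	store_word(buf, (uint32_t)d, cpu);
	store_word(buf + 4, (uint32_t)(d >> 32), cpu);
}

/* Check both options against hand-computed images of 666.0. */
static void getfpregs_selftest(void)
{
	static const uint8_t le[8]   = { 0, 0, 0, 0, 0x00, 0xd0, 0x84, 0x40 };
	static const uint8_t be_2[8] = { 0, 0, 0, 0, 0x40, 0x84, 0xd0, 0x00 };
	uint8_t buf[8];

	getfpregs_opt1(buf, 0x4084d00000000000ULL, BIG);
	assert(memcmp(buf, le, 8) == 0);	/* same bytes as little-endian */
	getfpregs_opt2(buf, 0x4084d00000000000ULL, LITTLE);
	assert(memcmp(buf, le, 8) == 0);	/* matches current LE behaviour */
	getfpregs_opt2(buf, 0x4084d00000000000ULL, BIG);
	assert(memcmp(buf, be_2, 8) == 0);
}
```

With option 2) on big-endian, swapping the two words of the result gives
40 84 d0 00 00 00 00 00, which is the big-endian FPA byte order from the
table at the top, so apps that only swap the two words keep working.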


Any ideas?

 
cheers,
Lennert


diff -urN linux-2.6.14.commit/arch/arm/Kconfig linux-2.6.14.snap/arch/arm/Kconfig
--- linux-2.6.14.commit/arch/arm/Kconfig	2005-11-06 17:00:50.000000000 +0100
+++ linux-2.6.14.snap/arch/arm/Kconfig	2005-11-06 17:00:31.000000000 +0100
@@ -568,7 +568,7 @@
 
 config FPE_NWFPE_XP
 	bool "Support extended precision"
-	depends on FPE_NWFPE && !CPU_BIG_ENDIAN
+	depends on FPE_NWFPE
 	help
 	  Say Y to include 80-bit support in the kernel floating-point
 	  emulator.  Otherwise, only 32 and 64-bit support is compiled in.
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/fpa11_cpdt.c linux-2.6.14.snap/arch/arm/nwfpe/fpa11_cpdt.c
--- linux-2.6.14.commit/arch/arm/nwfpe/fpa11_cpdt.c	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14.snap/arch/arm/nwfpe/fpa11_cpdt.c	2005-11-06 12:24:34.000000000 +0100
@@ -59,8 +59,13 @@
 	p = (unsigned int *) &fpa11->fpreg[Fn].fExtended;
 	fpa11->fType[Fn] = typeExtended;
 	get_user(p[0], &pMem[0]);	/* sign & exponent */
+#ifdef __ARMEB__
+	get_user(p[1], &pMem[1]);	/* ms bits */
+	get_user(p[2], &pMem[2]);	/* ls bits */
+#else
 	get_user(p[1], &pMem[2]);	/* ls bits */
 	get_user(p[2], &pMem[1]);	/* ms bits */
+#endif
 }
 #endif
 
@@ -78,6 +83,7 @@
 	case typeSingle:
 	case typeDouble:
 		{
+			/* @@@ big-endian  */
 			get_user(p[0], &pMem[2]);	/* Single */
 			get_user(p[1], &pMem[1]);	/* double msw */
 			p[2] = 0;			/* empty */
@@ -177,8 +183,13 @@
 	}
 
 	put_user(val.i[0], &pMem[0]);	/* sign & exp */
+#ifdef __ARMEB__
+	put_user(val.i[1], &pMem[1]);	/* msw */
+	put_user(val.i[2], &pMem[2]);
+#else
 	put_user(val.i[1], &pMem[2]);
 	put_user(val.i[2], &pMem[1]);	/* msw */
+#endif
 }
 #endif
 
@@ -194,6 +205,7 @@
 	case typeSingle:
 	case typeDouble:
 		{
+			/* @@@ big-endian  */
 			put_user(p[0], &pMem[2]);	/* single */
 			put_user(p[1], &pMem[1]);	/* double msw */
 			put_user(nType << 14, &pMem[0]);
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/fpa11.h linux-2.6.14.snap/arch/arm/nwfpe/fpa11.h
--- linux-2.6.14.commit/arch/arm/nwfpe/fpa11.h	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14.snap/arch/arm/nwfpe/fpa11.h	2005-10-29 01:41:41.000000000 +0200
@@ -60,7 +60,7 @@
 #ifdef CONFIG_FPE_NWFPE_XP
 	floatx80 fExtended;
 #else
-	int padding[3];
+	u32 padding[3];
 #endif
 } FPREG;
 
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/fpopcode.c linux-2.6.14.snap/arch/arm/nwfpe/fpopcode.c
--- linux-2.6.14.commit/arch/arm/nwfpe/fpopcode.c	2005-11-06 17:00:11.000000000 +0100
+++ linux-2.6.14.snap/arch/arm/nwfpe/fpopcode.c	2005-11-06 17:00:02.000000000 +0100
@@ -29,14 +29,14 @@
 
 #ifdef CONFIG_FPE_NWFPE_XP
 const floatx80 floatx80Constant[] = {
-	{0x0000, 0x0000000000000000ULL},	/* extended 0.0 */
-	{0x3fff, 0x8000000000000000ULL},	/* extended 1.0 */
-	{0x4000, 0x8000000000000000ULL},	/* extended 2.0 */
-	{0x4000, 0xc000000000000000ULL},	/* extended 3.0 */
-	{0x4001, 0x8000000000000000ULL},	/* extended 4.0 */
-	{0x4001, 0xa000000000000000ULL},	/* extended 5.0 */
-	{0x3ffe, 0x8000000000000000ULL},	/* extended 0.5 */
-	{0x4002, 0xa000000000000000ULL}		/* extended 10.0 */
+	{ .high = 0x0000, .low = 0x0000000000000000ULL},/* extended 0.0 */
+	{ .high = 0x3fff, .low = 0x8000000000000000ULL},/* extended 1.0 */
+	{ .high = 0x4000, .low = 0x8000000000000000ULL},/* extended 2.0 */
+	{ .high = 0x4000, .low = 0xc000000000000000ULL},/* extended 3.0 */
+	{ .high = 0x4001, .low = 0x8000000000000000ULL},/* extended 4.0 */
+	{ .high = 0x4001, .low = 0xa000000000000000ULL},/* extended 5.0 */
+	{ .high = 0x3ffe, .low = 0x8000000000000000ULL},/* extended 0.5 */
+	{ .high = 0x4002, .low = 0xa000000000000000ULL},/* extended 10.0 */
 };
 #endif
 
diff -urN linux-2.6.14.commit/arch/arm/nwfpe/softfloat.h linux-2.6.14.snap/arch/arm/nwfpe/softfloat.h
--- linux-2.6.14.commit/arch/arm/nwfpe/softfloat.h	2005-10-28 02:02:08.000000000 +0200
+++ linux-2.6.14.snap/arch/arm/nwfpe/softfloat.h	2005-11-06 16:55:13.000000000 +0100
@@ -51,11 +51,17 @@
 Software IEC/IEEE floating-point types.
 -------------------------------------------------------------------------------
 */
-typedef unsigned long int float32;
-typedef unsigned long long float64;
+typedef u32 float32;
+typedef u64 float64;
 typedef struct {
-    unsigned short high;
-    unsigned long long low;
+#ifdef __ARMEB__
+    u16 __padding;
+    u16 high;
+#else
+    u16 high;
+    u16 __padding;
+#endif
+    u64 low;
 } floatx80;
 
 /*


