[Date Prev][Date Next] [Thread Prev][Thread Next] [Date Index] [Thread Index]

Re: Update on unaligned traps...

On 28 Aug 1997, Ian Willmott wrote:

> There's more going on here than is initially apparent. The proposed fix
> happens to work in this case, but will likely fail in other similar
> situations. I think a more reliable solution is needed.

Hmmm....This message was incredibly informative and may help me fix
another harder-to-find unaligned trap problem I'm having.  Since I hadn't
really delved too deeply into the problem and don't know a whole lot about
the kernel, I didn't really get too detailed in figuring out why this
wasn't working properly...

> More specifically, the addresses the array (actually called part_table)
> contains are not 32-bit aligned, because they reference the disk
> partition table held in a memory image of the boot sector.

That's really what I was trying to say, but lack of sleep seemed to be
eating my brain that night.  Thanks for making that clearer :)

> try this:
> struct partition {
>     u_char boot_ind; u_char head;     u_char sector;     u_char cyl;
>     u_char sys_ind;  u_char end_head; u_char end_sector; u_char end_cyl;
>     u_int start_sect; u_int nr_sects;
> };
> struct partition_packed {
>     u_char boot_ind; u_char head;     u_char sector;     u_char cyl;
>     u_char sys_ind;  u_char end_head; u_char end_sector; u_char end_cyl;
>     u_int start_sect; u_int nr_sects;
> } __attribute__((packed));
> printf("struct partition: %d  struct partition_packed: %d\n",
>        sizeof(struct partition),sizeof(struct partition_packed));
> The compiler reports the size of both structures as 16 bytes. Because
> the int members are preceded by eight char members, no padding is
> necessary, and declaring the structure to be packed does not change the
> layout at all. In this case, this is fortunate, because if the structure
> layout was changed, it would no longer match the arrangement of the disk
> boot sector which it is mapped into, and the program would not work.

Very true.  I was starting to think about this yesterday and was starting
to wonder about why exactly that happened to work....After tons of
thought, it just seemed like that the extension shouldn't have made that
much of a difference....

> The code for reading the packed structure member makes no assumptions
> about the alignment of that data; it reads it a byte at a time. (This is
> NOT the most efficient way of doing this; the architecture reference
> manual lists a six-instruction sequence.) The "ldq_u" instruction (Load
> Quadword Unaligned) avoids the alignment trap by clearing the three
> least significant bits of the effective address.

I haven't really studied the architecture manual as of yet, but is that
instruction (ldq_u) common even down to the UDB's Alpha?  I know the newer
processors added quite a few instructions to the manual, but don't know
which ones are supported where anymore (had a chart, but can't find it
here).  If the instruction is common throughout the Alpha family, then you
may want to notify Richard Henderson that this could be a target for
optimisation with gcc.

> This technique works in this case because of the fortuitous circumstance
> that the structure in question was already as packed as it could get; if
> this had not been so, the program would have malfunctioned, as discussed
> above. It therefore does not qualify as a general solution. Note that
> this is probably not an issue with kernel data structures; the kernel
> header files will contain the relevant structure declaration, and the
> appropriate code will be generated, whether or not the structure was
> declared as packed, because kernel and user code will have the same
> understanding of its layout and alignment.

Hmmm....well, I have been meaning to play with non-padded structs to see
just how well things are aligned and such.  I understand that, if the
struct hadn't been packed as-is, there would've been a more serious

> The problem is only likely to occur with structures which must be binary
> compatible across different architectures or which are mapped into
> binary images at computed addresses; fdisk meets both these criteria. I
> suggest another solution in such cases.
> int32_t read_int32_unaligned(char *);
> void   write_int32_unaligned(char *,u_int32_t);
> int64_t read_int64_unaligned(char *);
> void   write_int64_unaligned(char *,u_int64_t);
> These functions can be written in assembly language using the most
> efficient instruction sequence; on the Alpha, this appears to be about
> six to twelve instructions. On architectures where alignment does not
> matter, they can be coded as macros which simply convert and dereference
> the pointer:
> #define  read_int32_unaligned(p)     (*(int32_t *)(p))
> #define write_int32_unaligned(p,x) (*(u_int32_t *)(p)=(x))
> #define  read_int64_unaligned(p)     (*(int64_t *)(p))
> #define write_int64_unaligned(p,x) (*(u_int64_t *)(p)=(x))
> The typenames are platform-independent typedefs from glibc; they are
> used because the width of the built-in integer types is not guaranteed
> to be the same across different CPUs, or even different compilers. The
> return type and parameter are declared as signed and unsigned
> respectively, so that the same function will work for both signed and
> unsigned values without the compiler complaining about implicit unsigned
> to signed conversions. The compiler appears to assume that short
> pointers are 16-bit aligned, so word versions are probably required as
> well. The compiler makes no assumptions about the alignment of char
> pointers, so byte versions are unnecessary.
> Instead of a structure declaration, macros can be defined to access
> specific fields by name:
> #define    start_sect(p)    read_int32_unaligned((char *)(p)+8)
> #define wr_start_sect(p,x) write_int32_unaligned((char *)(p)+8,(x))
> #define    nr_sects(p)      read_int32_unaligned((char *)(p)+12)
> #define wr_nr_sects(p,x)   write_int32_unaligned((char *)(p)+12,(x))
> At the cost of slightly more complicated syntax, this solution should be
> portable across all architectures, regardless of how the native compiler
> lays out structures, and should completely avoid alignment traps on
> systems where this matters.

Sounds good to me.  FYI, my goal was originally to try to get fdisk to
work without unaligned trap warnings.  I happened upon the solution I gave
because of a suggestion that I was given.  I was trying hard to make
existing code work without major modification since I don't really have
too much time to rewrite things.  I do intend to rewrite fdisk and several
other things (fdisk should be rewritten more like cfdisk -- which is much
cleaner as far as access to the structs it needs), but not for at least
six months.  Plus, I'm still getting somewhat up to speed with Alpha
assembly language commands (I still am very rusty in as).  Basically, what
I'm saying is, please please please feel free to make the proper rewrites
and that way I can get a better idea of how exactly everything will
integrate into a finished programme.  Your explanation was excellent, but
since I'm on vacation, my brain has stopped working, so I'll need to see
code to understand :P

Thanks for your excellent comments and suggestions!  I intend on putting
that knowledge to use in the near future with my problems with mandb and
identd.  In the meantime, if you could, please forward that little
optimisation tip to Richard Henderson and maybe that'll boost our
performance slightly :)


 Christopher C. Chimelis          chris@classnet.med.miami.edu
 Systems Supervisor
 Division of Biomedical Communications
 University of Miami School of Medicine
 --> finger chris@classnet.med.miami.edu for PGP public key <--

TO UNSUBSCRIBE FROM THIS MAILING LIST: e-mail the word "unsubscribe" to
debian-alpha-request@lists.debian.org . 
Trouble?  e-mail to templin@bucknell.edu .

Reply to: