Bug#799905: gcc-4.7: generates broken SSE2 code for -ftree-vectorize/-O3 for unaligned dword access

Package: gcc-4.7
Version: 4.7.2-5
Severity: important

On x86 and x86-64, the platform explicitly supports unaligned access
for ordinary integer loads and stores, and in fact such access has been
heavily optimized on recent Intel and AMD processors.

A _lot_ of code takes advantage of this, as it is often extremely
painful (or slow) to read word- and dword-based structures byte by
byte from memory or file data with arbitrary alignment.  And gcc is
_not_ smart enough to always coalesce something like:

s += p[0] | (p[1] << 8) | ... | ((unsigned)p[3] << 24); p += 4;

into an unaligned dword read for x86 and x86-64 (which would be much
faster than four byte reads, three shifts and three ORs).
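The two ways of reading a dword from arbitrarily aligned data can be
sketched in C as follows.  This is a minimal illustration, not code from
the report: the helper names load_le32_bytes() and load_u32_memcpy() are
hypothetical, and the second helper only matches the first on a
little-endian host such as x86 or x86-64.  Going through memcpy() instead
of dereferencing a cast unsigned int pointer keeps the access well defined
in ISO C, and gcc commonly compiles a fixed 4-byte memcpy() into a single
unaligned load on x86.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise read: four byte loads, three shifts and three ORs.
   Correct for any alignment and any host byte order. */
static uint32_t load_le32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* Native-order read via memcpy: valid for any alignment in ISO C.
   Equals load_le32_bytes() only on little-endian hosts such as x86. */
static uint32_t load_u32_memcpy(const void *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

For the bytes 01 02 03 04 05, reading at offset 1 yields 0x05040302 with
load_le32_bytes() on any host.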

Unfortunately the auto-vectorization code in Wheezy's gcc can lose track
of whether a pointer is or is not guaranteed to be aligned, and can
generate SSE2 code that cannot deal with unaligned access.

This causes a program that works fine at -O2 to crash with a general
protection fault when compiled with -O3.
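Why the vectorized loop faults only for misaligned pointers can be
illustrated with a simplified model of loop peeling.  The assumption here
(a model, not taken from gcc's source) is that the vectorizer runs a few
scalar iterations until p reaches a 16-byte boundary and then switches to
aligned SSE2 loads such as movdqa, which raise a general protection fault
on a misaligned address.  Because the compiler may assume that an
unsigned int pointer is 4-byte aligned, the peel count is computed in
4-byte steps.  The helper names peel_count() and residue_after_peel() are
hypothetical.

```c
#include <stdint.h>

/* Hypothetical model: number of 4-byte scalar iterations to peel so that
   addr lands on a 16-byte boundary, assuming addr is already a multiple
   of 4 (what the compiler may assume for an unsigned int pointer). */
static unsigned peel_count(uintptr_t addr)
{
    return (unsigned)((((uintptr_t)0 - addr) & 15u) >> 2);
}

/* Residue modulo 16 of the pointer once the peeled iterations are done.
   A non-zero result means the first aligned SSE2 load would fault. */
static unsigned residue_after_peel(uintptr_t addr)
{
    return (unsigned)((addr + 4u * peel_count(addr)) & 15u);
}
```

With a hypothetical 16-byte-aligned base of 0x1000, residue_after_peel(0x1004)
is 0, so peeling repairs a 4-byte offset, while residue_after_peel(0x1002)
is 14: a pointer that is not a multiple of 4 (such as the 2-byte offset in
the reproducer) never reaches a 16-byte boundary by stepping 4 bytes at a
time.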

I have attached a small reproducer.  Tested on a 32-bit Pentium M as
well as on a 64-bit Core i5.  Run it without parameters, so that
argc == 1.

Observed results:
  CFLAGS -O2 : works
  CFLAGS -O2 -msse2 : works

  CFLAGS -O3 -msse2 : CRASH
  CFLAGS -O2 -msse2 -ftree-vectorize : CRASH

I am not sure whether this issue has already been fixed in newer
upstream versions of gcc.

-- System Information:
Debian Release: 7.9
  APT prefers oldstable
  APT policy: (990, 'oldstable'), (500, 'oldstable-updates'), (500, 'oldstable-proposed-updates')
Architecture: i386 (i686)

Kernel: Linux 3.10.89-t43+ (PREEMPT)
Locale: LANG=pt_BR.UTF-8, LC_CTYPE=pt_BR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash

Versions of packages gcc-4.7 depends on:
ii  binutils      2.22-8+deb7u2
ii  cpp-4.7       4.7.2-5
ii  gcc-4.7-base  4.7.2-5
ii  libc6         2.13-38+deb7u8
ii  libgcc1       1:4.7.2-5
ii  libgmp10      2:5.0.5+dfsg-2
ii  libgomp1      4.7.2-5
ii  libitm1       4.7.2-5
ii  libmpc2       0.9-4
ii  libmpfr4      3.1.0-5
ii  libquadmath0  4.7.2-5
ii  zlib1g        1:1.2.7.dfsg-13

Versions of packages gcc-4.7 recommends:
ii  libc6-dev  2.13-38+deb7u8

Versions of packages gcc-4.7 suggests:
pn  binutils-gold        <none>
ii  gcc-4.7-doc          4.7.2-2
pn  gcc-4.7-locales      <none>
pn  gcc-4.7-multilib     <none>
pn  libcloog-ppl0        <none>
pn  libgcc1-dbg          <none>
pn  libgomp1-dbg         <none>
pn  libitm1-dbg          <none>
pn  libmudflap0-4.7-dev  <none>
pn  libmudflap0-dbg      <none>
pn  libppl-c2            <none>
pn  libppl7              <none>
pn  libquadmath0-dbg     <none>

-- no debconf information

-- 
  "One disk to rule them all, One disk to find them. One disk to bring
  them all and in the darkness grind them. In the Land of Redmond
  where the shadows lie." -- The Silicon Valley Tarot
  Henrique Holschuh
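
int checksum(const char *data, unsigned int count)
{
    /* Reinterpret the byte buffer as dwords; when data is not 4-byte
       aligned, every load below is an unaligned dword access. */
    const unsigned int *p = (const unsigned int *)data;
    unsigned int s = 0;

    /* With -O3 -msse2 (or -O2 -msse2 -ftree-vectorize), gcc vectorizes
       this loop using SSE2. */
    while (count--) { s += *(p++); }

    return s;
}

int main(int argc, char **argv)
{
    /* 257 dwords, so that 256 dword reads starting 2 bytes past &d[0]
       stay inside the array. */
    unsigned int d[257] = {0};

    /* With no arguments argc == 1, so data starts argc + 1 == 2 bytes
       past &d[0] and is misaligned for unsigned int. */
    return (checksum((char *)(&d[0]) + argc + 1, 256)) ? argc & 2 : argc & 4;
}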
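A variant of checksum() that stays correct for any alignment can be
sketched by copying each dword out with memcpy().  This is an illustrative
workaround, not something proposed in the report, and checksum_unaligned()
is a hypothetical name.  It computes the same native-byte-order sum as the
reproducer's loop, but the compiler is no longer entitled to assume that
data is 4-byte aligned.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Sum count native-order dwords starting at data, which may have any
   alignment.  memcpy() makes each unaligned read well defined in ISO C. */
static uint32_t checksum_unaligned(const char *data, unsigned int count)
{
    uint32_t s = 0;

    for (unsigned int i = 0; i < count; i++) {
        uint32_t v;
        memcpy(&v, data + (size_t)i * sizeof v, sizeof v);
        s += v;
    }
    return s;
}
```

Called as checksum_unaligned((const char *)d + 2, 256) on the reproducer's
zeroed array, it returns 0 whatever the optimization level.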
