Bug#799905: gcc-4.7: generates broken SSE2 code for -ftree-vectorize/-O3 for unaligned dword access
Package: gcc-4.7
Version: 4.7.2-5
Severity: important
On x86 and x86-64, the platform explicitly supports unaligned access,
and in fact such access has been heavily optimized on the latest Intel
and AMD processors.
A _lot_ of code takes advantage of this, as it is often extremely
painful (or slow) to byte-read word and dword-based structures from
memory/file data with random alignment. And gcc is _not_ smart enough
to always coalesce something like:
s += (p[0] | (p[1] << 8) | ... | (p[3] << 24)); p += 4;
into an unaligned dword read for x86 and x86-64 (which would be much
faster than four byte reads, three shifts and three ORs).
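
To make the comparison concrete, here is a minimal sketch (not part of
the attached reproducer) of two ways to read a 32-bit value from a
buffer with arbitrary alignment. The helper names load_le32_bytes and
load_u32_unaligned are made up for illustration. Whether a given gcc
turns either of them into a single dword load depends on the compiler
version and flags, so treat this as an illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Byte-wise little-endian load: four byte reads, three shifts and
 * three ORs. Valid for any alignment and any host byte order. */
static uint32_t load_le32_bytes(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}

/* memcpy-based load: makes no alignment assumption about p.
 * The result is in host byte order, so it matches the byte-wise
 * version only on little-endian hosts such as x86 and x86-64. */
static uint32_t load_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

On x86 and x86-64 both helpers return the same value for the same
input bytes.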
Unfortunately the auto-vectorization code in Wheezy's gcc can lose track
of whether a pointer is or is not guaranteed to be aligned, and can
generate SSE2 code that cannot deal with unaligned access.
This causes a program that works fine at -O2 to crash with a general
protection fault when compiled with -O3.
I have attached a small reproducer. Tested on a 32-bit Pentium M, as
well as on a 64-bit Core i5. Run it without parameters, so that
argc == 1.
Observed results:
CFLAGS -O2 : works
CFLAGS -O2 -msse2 : works
CFLAGS -O3 -msse2 : CRASH
CFLAGS -O2 -msse2 -ftree-vectorize : CRASH
I am not sure whether this issue has already been fixed in newer
upstream versions of gcc or not.
-- System Information:
Debian Release: 7.9
APT prefers oldstable
APT policy: (990, 'oldstable'), (500, 'oldstable-updates'), (500, 'oldstable-proposed-updates')
Architecture: i386 (i686)
Kernel: Linux 3.10.89-t43+ (PREEMPT)
Locale: LANG=pt_BR.UTF-8, LC_CTYPE=pt_BR.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages gcc-4.7 depends on:
ii binutils 2.22-8+deb7u2
ii cpp-4.7 4.7.2-5
ii gcc-4.7-base 4.7.2-5
ii libc6 2.13-38+deb7u8
ii libgcc1 1:4.7.2-5
ii libgmp10 2:5.0.5+dfsg-2
ii libgomp1 4.7.2-5
ii libitm1 4.7.2-5
ii libmpc2 0.9-4
ii libmpfr4 3.1.0-5
ii libquadmath0 4.7.2-5
ii zlib1g 1:1.2.7.dfsg-13
Versions of packages gcc-4.7 recommends:
ii libc6-dev 2.13-38+deb7u8
Versions of packages gcc-4.7 suggests:
pn binutils-gold <none>
ii gcc-4.7-doc 4.7.2-2
pn gcc-4.7-locales <none>
pn gcc-4.7-multilib <none>
pn libcloog-ppl0 <none>
pn libgcc1-dbg <none>
pn libgomp1-dbg <none>
pn libitm1-dbg <none>
pn libmudflap0-4.7-dev <none>
pn libmudflap0-dbg <none>
pn libppl-c2 <none>
pn libppl7 <none>
pn libquadmath0-dbg <none>
-- no debconf information
--
"One disk to rule them all, One disk to find them. One disk to bring
them all and in the darkness grind them. In the Land of Redmond
where the shadows lie." -- The Silicon Valley Tarot
Henrique Holschuh
int checksum(const char *data, unsigned int count)
{
    /* data may be misaligned; it is read as 32-bit words on purpose */
    const unsigned int *p = (const unsigned int *)data;
    unsigned int s = 0;

    while (count--) { s += *(p++); }
    return s;
}

int main(int argc, char **argv)
{
    unsigned int d[257] = {};

    /* with argc == 1 the read starts 2 bytes into d, not dword-aligned */
    return (checksum((char *)(&d[0]) + argc + 1, 256)) ? argc & 2 : argc & 4;
}
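
For comparison, here is a sketch of a variant that makes no alignment
assumption at all, by reading each word through memcpy. The name
checksum_unaligned is hypothetical and this is not part of the
original attachment. I have not measured what code gcc 4.7 generates
for it; it only shows a form of the loop that never dereferences a
misaligned unsigned int pointer.

```c
#include <string.h>

/* Hypothetical variant of checksum(): sums count words of
 * sizeof(unsigned int) bytes each, starting at data, which may have
 * any alignment. */
static unsigned int checksum_unaligned(const char *data, unsigned int count)
{
    unsigned int s = 0;

    while (count--) {
        unsigned int v;

        /* copy one word out of the buffer; no alignment is assumed */
        memcpy(&v, data, sizeof v);
        data += sizeof v;
        s += v;
    }
    return s;
}
```

It takes the same arguments as checksum() in the reproducer, so it can
be called with the same (char*)(&d[0])+argc+1 pointer.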