
Re: Pgcc in Deb

Drake Diedrich wrote:
> On Mon, Apr 03, 2000 at 11:35:08AM +0300, Eray Ozkural wrote:
> >
> > I disagree. From experience, I know that up to 50% speedup can be gained
> > in number crunching stuff. I'd suspect 20% could be pretty normal for
> > most CPU hungry apps, and the overall speedup would be significant.
> >
>    Please post full information when you make claims like this.  Code,
> compiler versions and options, /proc/cpuinfo, ...  We hear these claims a
> lot - it's apparent that you haven't scanned the archives for the last time
> this topic came up.

I would post, if I hadn't seen that I had remembered incorrectly.
Okay, I blew up there. I am sorry. I am going to get better sleep next time.
And it's true, I didn't scan the archives for this topic.

>    The numbers I ran on povray (a compute intensive task) don't show any
> significant (-ly larger than the deviation between independent
> measurements) speedups using a variety of -mcpu and -march options.
> For instance, the best time for a K6/300 when compiled for i486 was 3
> seconds (1.5%) faster than when compiled or scheduled for k6.  Actually
> scheduling for "pentium" had the most noticeable effect, and
> that was a slowdown.

Well, unfortunately my numbers aren't any better. Forgive me :P
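
To make the "larger than the deviation between independent measurements"
test concrete, here is a minimal sketch in C. It is only an illustration,
not the procedure Drake used: the function names are made up, and the
criterion (difference of the mean times must exceed the larger min-to-max
spread of the two sets of runs) is a deliberately crude assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Mean of n timings in seconds; n must be at least 1. */
double mean_of(const double *t, size_t n)
{
    assert(n > 0);
    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += t[i];
    return sum / (double)n;
}

/* Spread (max - min) of n timings, used here as a crude noise estimate. */
double spread_of(const double *t, size_t n)
{
    assert(n > 0);
    double lo = t[0], hi = t[0];
    for (size_t i = 1; i < n; i++) {
        if (t[i] < lo) lo = t[i];
        if (t[i] > hi) hi = t[i];
    }
    return hi - lo;
}

/* Returns 1 if the mean times of the two builds differ by more than the
   larger run-to-run spread, 0 otherwise. Hypothetical criterion. */
int differs_beyond_noise(const double *a, size_t na,
                         const double *b, size_t nb)
{
    double diff = mean_of(a, na) - mean_of(b, nb);
    if (diff < 0.0)
        diff = -diff;
    double sa = spread_of(a, na);
    double sb = spread_of(b, nb);
    double noise = sa > sb ? sa : sb;
    return diff > noise;
}
```

With this rule, a difference of a few seconds between two builds whose
individual runs already vary by a few seconds counts as noise, which is the
point made in the quoted paragraph.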

>    Real Pentium (R)'s seem to have weird scheduling requirements compared to
> other ix86 platforms (both newer and older), but scheduling for i486 (the
> default) does as well as anything else. No other platform
> (PII/Celeron/K6/486) showed significant variation with cpu/arch flags.

Okay, here are my ideas: these chips use out-of-order execution and deep
pipelines, and translate x86 instructions for their RISCy cores. That means the
scheduling algorithms which made the difference for a 68020 won't make it here.
The only good speedup could come from exploiting on-chip parallelism via the
SIMD instructions on these chips. Intel, AMD, and Motorola employ similar
strategies to get vector operations onto the chip. The real benefit in this case
could come from parallelizing compilers which generate awfully good vector code.
Then the chip has more explicit information on how to conduct concurrent
execution on its functional units. Such an effort for GNU/Linux systems was
recently described in a Slashdot article on AltiVec. I only skimmed that
article, but it might be worth investigating, although parallelizing compilers
are known to have trouble with C, since its pointer semantics are extremely free.
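
To illustrate the pointer problem, here is a small sketch in C. The function
names are hypothetical, and the second version assumes a compiler that
supports the C99 "restrict" qualifier, which is my assumption and not
something discussed in this thread. In the first function the compiler has to
allow for dst overlapping a or b, so it cannot freely turn the loop into
vector operations; in the second the caller promises there is no overlap.

```c
#include <assert.h>
#include <stddef.h>

/* dst may overlap a or b, so a store to dst[i] may change a later a[j] or
   b[j]. The compiler must preserve the element-by-element order. */
void add_may_alias(float *dst, const float *a, const float *b, size_t n)
{
    assert(dst && a && b);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* The restrict qualifiers are a promise by the caller that dst does not
   overlap a or b, so iterations are independent and may be done several
   at a time. Calling this with overlapping arrays is undefined behavior. */
void add_no_alias(float *restrict dst, const float *restrict a,
                  const float *restrict b, size_t n)
{
    assert(dst && a && b);
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

For example, with x = {1, 0, 0, 0}, the call add_may_alias(x + 1, x, x, 3)
is legal and leaves x = {1, 2, 4, 8}, because each iteration reads the value
the previous one wrote. A compiler that blindly added four elements at once
would get a different answer, so without extra information it has to be
conservative.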

>    It would be interesting to add K7, the newer PIII's, and pgcc to these
> benchmarks, but they'd all have to be rerun so that the exact same binaries
> were used on all platforms (a big hassle).  Until gcc 2.96 is released I see
> little point in retesting for architecture differences.  IIRC from my brief
> tests of pgcc a while back it didn't perform as well as the then current
> egcc anyway.  The egcc speedup over gcc272 was around 10%.
>    http://master.debian.org/~dld/ix86-povray-benchmark

Still, I am sure there's a lot of room for improvement. For instance, there
are some results which I don't usually disclose. A state-of-the-art
graph partitioning tool developed at my dept. performs much better on i686
when compiled with Microsoft's VC than with gcc. Don't worry, I turned on all
the optimization flags I could find, and I ran it on Debian in single user mode.
I was pretty frustrated. Because of this disappointment, I think I've been a
bit obsessed with gcc performance. My guess is that the -mcpu=i686 optimizations
are largely immaterial currently. I think they could get even better than
10% with gcc-2.96. This time I speak from more solid experience :)
There could be a 20% improvement, I guess; if I remember correctly, the
benchmarks I talked about showed VC to be at least that much better. Please, no
flames... The developer of that package is not here, and I'm not sure where I
put those benchmarks, but still I was bashed as a proud programmer who only
used gcc. Yet VC is a terrible compiler, especially for C++... But it surely
has a better x86 backend.

 ++++-+++-+++-++-++-++--+---+----+----- ---  --  -  - 
 +  Eray "eXa" Ozkural                   .      .   .  . . .
 +  CS, Bilkent University, Ankara             ^  .  o   .      .
 |  mail: erayo@cs.bilkent.edu.tr                .  ^  .   .
