Re: hwcap supporting architectures?
On Mon, Jan 17, 2005 at 05:52:04PM +0900, GOTO Masanori wrote:
> > > > Yes, and if ev67 is instruction upper compatible with ev56 (I
> > > > guess so), I think it's acceptable to add a symlink "ln -sf
> > > > lib/ev67/libfoo.so lib/ev56/libfoo.so".
> > >
> > > Ugh... that pushes the burden of maintaining support for new
> > > architectures to the package.
> Yeah - I think it's a trade-off: either we support optimized library
> packages, or we give up a bit of performance improvement.
So, you are trading maintenance cost for a rather subjective speed
improvement? Or should I say, preventing some performance degradation?
> > > Please bear with me, but I'm trying to understand the issue: is
> > > the cost of calling access(2) or stat(2) really so high?
> > I'd consider it quite acceptable in this case. However, as I tried
> > to express, it's not possible with glibc's current "design", and I
> > didn't feel like changing that.
> Note that we should keep in mind: imagine most binaries on all Debian
> systems around the world starting to pay the access(2)/stat(2) system
> call cost on every execution - "Many a little makes a mickle".
Ok, I stopped buying this kind of argument long ago. There's a
SIGGRAPH paper (2001 IIRC) which justifies a certain kind of rather
complex optimization because a (graphics) context switch is "too
expensive", without clearly defining the situation that triggers the
context switch. In my own testing, context switches of the kind
described in that paper are at least a factor of 100 _faster_ than
what the authors claim.
Attached is a program that measures the time a single stat(2) call
takes. I get circa 5 microseconds per stat(2) call on my computer (AMD
Athlon 1600+, can't recall what kind of memory it has right now). Note
that the code not directly related to the stat(2) call has a rather
low overhead (circa 1 ns on my system).
What that means is that you need to make about 2000 stat(2) calls
(roughly 10 ms) to get _anywhere_ near what's measurable by a human
and about 20000 (roughly 100 ms) to start getting said human annoyed.
If a biggish GNOME program (Epiphany Browser) links to 60 libraries,
you need to perform a lookup in ~ 30 paths for the startup delay to be
measurable and ~ 300 for it to be annoying. ls(1) links to 6
libraries. That's one order of magnitude less, IOW, you need a path
with ~ 3000 components for it to start being annoying.
So, what exactly are you talking about?
> > > I see for example that on start up the file /etc/ld.so.nohwcap is
> > > accessed multiple times (and it's not present, isn't that a race?
> > > what happens if the file suddenly appears in the middle of
> > > program start up? what's that file anyway, I can't find it
> > > mentioned in the documentation).
> > It's supposed to disable the use of hwcaps. Stating it multiple
> > times seems like a bug.
The contents do not matter?
> Debian's glibc carries a special patch that checks
> /etc/ld.so.nohwcap before loading libraries each time. You can see
> it in the debian-glibc package: ldso-disable-hwcap.dpatch, written by
> Ben and Daniel. It enables us to upgrade smoothly even if we use
> optimized libraries - this effort is one of Debian's nice features.
> But the drawback is that it has to pay the access(2) lookup cost, as
> you pointed out.
> Checking /etc/ld.so.nohwcap each time (some binaries call it multiple
> times) is the current patch design.
Why? I just can't see a valid reason for "wanting" the file to
suddenly pop up while the program is running.
> I think this is safer than checking /etc/ld.so.nohwcap once in
> program startup time.
Safer in what way?
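
To make the alternative concrete, here is a minimal sketch of what
"check once at startup" could look like. This is NOT the code from
ldso-disable-hwcap.dpatch; hwcap_disabled_once and nohwcap_checks are
hypothetical names, and the sketch ignores thread safety.

```c
#include <stdbool.h>
#include <unistd.h>

/* Counts how often the file system was actually consulted (illustration only). */
int nohwcap_checks = 0;

/* Hypothetical helper: returns true if /etc/ld.so.nohwcap exists.
   Only the first call issues access(2); later calls reuse the cached
   answer. Not thread-safe. */
bool hwcap_disabled_once(void)
{
    static int cached = -1;               /* -1 means "not checked yet" */
    if (cached < 0) {
        cached = (access("/etc/ld.so.nohwcap", F_OK) == 0);
        ++nohwcap_checks;
    }
    return cached == 1;
}
```

With something like this the answer cannot change in the middle of
program startup, no matter how many libraries get loaded.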
Again, I just don't buy that "system calls are too expensive" argument.
Anyone writing shell scripts cares about a whole lot of things *but*
performance. And I'm not talking about increasing running time by a
factor of anything, I'm talking about adding a bunch of microseconds,
which get lost in the middle of filesystem stalls, page faults and
other rather common events.
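#include <math.h>       /* powf(); link with -lm */
#include <stdio.h>
#include <sys/stat.h>
#include <sys/time.h>
int main(void) {
    const int N = 6;    /* 10^N distinct names are tried */
    char name[N+1];
    for(int i=0; i < N; ++i)
        name[i] = '0';
    name[N] = 0;
    struct timeval t0, t1;
    gettimeofday(&t0, NULL);
    for(int i=0; i < N;) {
        struct stat buf;
        stat(name, &buf);   /* result ignored; only the call is timed */
        for(i=0; i != N && ++name[i] == '9'+1; ++i)
            name[i] = '0';  /* digit wrapped: reset it and carry */
    }
    gettimeofday(&t1, NULL);
    float dt = (t1.tv_sec - t0.tv_sec) + (t1.tv_usec - t0.tv_usec)*1E-6;
    printf("%g\n", dt/powf(10, N));
}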