Re: Atlas proposal
Ben Hutchings <email@example.com> writes:
> On Tue, 2010-08-17 at 23:56 +0200, Sylvestre Ledru wrote:
>> Le mardi 17 août 2010 à 22:45 +0100, Roger Leigh a écrit :
>> > Disabling threading is also suspect: how can the optimal number of
>> > threads possibly be determined at build time? This should also be
>> > configurable or at least auto-detectable at runtime.
>> C/C from the FAQ:
>> "Can I vary the number of threads ATLAS uses dynamically?
>> No. The maximum number of threads to use is determined at compile time.
>> ATLAS will never use more than this, but may use less if the problem
>> sizes are too small to get speedup from the additional parallelism."
> Can we set a large maximum at build time and then reduce it at run-time
> based on the number of hardware threads found at run-time? Or does it
> do that already? Given the phrase 'may use less', it is clear that the
> code does support varying the number of threads used at run-time.
The modifier "if the problem sizes are too small" probably means that it
does use a divide & conquer algorithm. On each step it forks half the
work into a new thread and does the other half locally, up to the
maximum number of threads. If the problem is too small it simply doesn't
divide enough to use all threads. If it is larger some queueing of jobs
It might not be hard to lower the limit at runtime but it doesn't mean
the possibility already exists.
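
To make that guess concrete, here is a minimal sketch of the scheme I am
describing. It is not ATLAS code and I have not checked how ATLAS really
splits work; BUILD_TIME_MAX_THREADS, MIN_CHUNK, effective_limit() and
parallel_sum() are all names and values made up for illustration. Python
threads will not give a real speedup here, the point is only the
structure: halve the work, fork one half, stop at the thread limit or
when the chunks get too small.

```python
import os
import threading

BUILD_TIME_MAX_THREADS = 8  # hypothetical limit fixed at compile time
MIN_CHUNK = 1000            # hypothetical size below which splitting does not pay


def effective_limit(build_max=BUILD_TIME_MAX_THREADS, hw_threads=None):
    """Lower the build-time limit to the hardware threads found at run time."""
    if hw_threads is None:
        hw_threads = os.cpu_count() or 1
    return max(1, min(build_max, hw_threads))


def parallel_sum(data, max_threads, min_chunk=MIN_CHUNK):
    """Sum data by recursive halving; returns (total, threads_used)."""

    def work(lo, hi, budget):
        # Stop dividing when the thread budget is used up or the chunk is small.
        if budget <= 1 or hi - lo <= min_chunk:
            return sum(data[lo:hi]), 1
        mid = (lo + hi) // 2
        forked_budget = budget // 2
        result = {}

        def forked():
            result["right"] = work(mid, hi, forked_budget)

        # Fork half the work into a new thread, do the other half locally.
        thread = threading.Thread(target=forked)
        thread.start()
        left_total, left_used = work(lo, mid, budget - forked_budget)
        thread.join()
        right_total, right_used = result["right"]
        return left_total + right_total, left_used + right_used

    return work(0, len(data), max_threads)
```

With a large input, parallel_sum(data, effective_limit()) uses the whole
budget; with a small one it stops dividing early and uses fewer threads,
which would match the "may use less" wording in the FAQ. Lowering the
limit at runtime is then just a matter of passing a smaller budget, but
again, that is my sketch and not something ATLAS is known to offer.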
>> > In short, Atlas' approach to optimisation by detecting everything at
>> > build time is wrong. Rather than working around this limitation by
>> > totally crippling the library to work on a least-common-denominator
>> > system by removing all optimisations and threading, it should be
>> > actually fixed, probably best if done in collaboration with upstream.
>> OK, I forgot to add something.
>> I know upstream is doing it "wrong" from our distro point of view.
>> However, I am not upstream, I don't plan to patch atlas to manage this
>> and I don't think upstream is interested in it. It is not the approach
>> of upstream and I don't think the current build system will allow the
>> introduction of such features easily.
> The dynamic linker does the run-time selection for you. All you need to
> do is to install the optimised libraries in subdirectories that specify
> the hardware they require. Currently the following platform and
> capability flag names are recognised for i386:
> "i386", "i486", "i586", "i686",
> "fpu", "vme", "de", "pse", "tsc", "msr", "pae", "mce",
> "cx8", "apic", "10", "sep", "mtrr", "pge", "mca", "cmov",
> "pat", "pse36", "pn", "clflush", "20", "dts", "acpi", "mmx",
> "fxsr", "sse", "sse2", "ss", "ht", "tm", "ia64", "pbe"
> Use nested subdirectories to specify multiple flags. The library in the
> most specific directory (i.e. the one which selects the most flags, all
> satisfied by the current hardware) will be used.
Multiply all of those by the number of cores (say 1, 2, 3, 4, 6, 8) and
by several L1/L2/L3 cache sizes. Do you really want that many atlas
packages in the archive?
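
A rough sketch of both points, assuming the selection rule works as
described in the quoted text (the most specific directory whose flags
are all satisfied wins). This is a simplification and not the dynamic
linker's actual code; pick_variant(), count_variants(), the directory
names and the cache labels are hypothetical.

```python
from itertools import product


def pick_variant(variants, hw_flags):
    """Return the directory requiring the most flags, all present in hw_flags.

    variants maps a directory name to the set of flags it requires.
    Returns None if no directory is usable; ties go to the first entry.
    """
    usable = {d: req for d, req in variants.items() if req <= hw_flags}
    if not usable:
        return None
    return max(usable, key=lambda d: len(usable[d]))


def count_variants(flag_sets, core_counts, cache_configs):
    """Number of library builds if every combination gets its own build."""
    return len(list(product(flag_sets, core_counts, cache_configs)))


# Hypothetical layout: a baseline build plus two optimised ones.
variants = {
    "": set(),
    "i686": {"i686"},
    "i686/sse2": {"i686", "sse2"},
}
chosen = pick_variant(variants, {"i686", "cmov", "sse", "sse2"})

# Three flag sets, the six core counts above, three made-up cache configs.
builds = count_variants(list(variants), [1, 2, 3, 4, 6, 8], ["small", "medium", "large"])
```

Even with only three flag sets and three cache configurations that is
3 * 6 * 3 = 54 builds per architecture.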
I think some middle ground would be good, which I think is kind of what
the maintainer is suggesting:
1) Build optimized versions for some commonly available hardware, with
fixed values instead of probing the build system. But only a limited
few. E.g. for amd64, build a 1-core and a 4-core version with sse2.
2) Make it easy to locally build an optimized version.