
Re: How to write optimized code for an instruction set not supported by my computer?

On 13/11/15 at 18:50, Joel Rees wrote:
On Mon, Nov 9, 2015 at 7:54 AM, Mario Castelán Castro
<marioxcc.MT@yandex.com> wrote:
joel.rees@gmail.com writes:

The question being begged is this -- Do you really want to use AVX
enough to sign the Intel agreement that you can't read unless you
agree to it before you read it so you can download it and read it?

It *is* a valid question, and I can't tell you the answer to that.
Maybe you can't answer that question yourself yet.

No, I am not willing to use Intel's proprietary tools. In an earlier message
in the same thread I said that I avoid proprietary software. That is one of
the reasons why I use Debian: it is easy to avoid the proprietary

Might I suggest you question (in addition to the questions about
actual performance benefits) whether dealing with Intel's hardware
licensing is significantly less onerous than dealing with their
software licensing. Also whether you can actually escape the software
licensing if you start developing to their hardware.

AVX is a minefield.

A minefield in what regard? What do you mean by "Intel's hardware licensing"?

If you mean licensing the patents that are required to *implement* the instruction set, bear in mind that I am not doing that: I am neither designing nor implementing a CPU. AFAIK, no license is required to _write a program_ that uses a given instruction set. I have never heard of the authors of a compiler, an assembler, or a program written partially in assembly having to acquire a license.

Thanks to everybody who replied. I will probably use Bochs when the time comes
to port the algorithm to AVX. Right now I am writing and perfecting it in
portable C, to have a base result against which to compare performance and


This is definitely the recommended course. Also compare the time spent
in conversion against the time spent in data extraction, storing, etc.

Sure. I have already made a trivial variation of my Base64 encoding code that does no conversion and only copies the data. I use it for comparison. I also compare the speed against memcpy to get an idea of how much overhead the encoder adds over plain copying. Because the data expands during encoding, in the memcpy test I duplicate some of the input into the output so that the same number of bytes is written and the comparison is fairer.
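
To make the comparison described above concrete, here is a minimal sketch in portable C. It is not the actual code under discussion: the function names (b64_encode, b64_copy_only, memcpy_expanded, seconds_per_pass, run_comparison), the buffer sizes, and the use of clock() for timing are all assumptions made for illustration. The encoder is a plain scalar one using the standard Base64 alphabet with padding; the copy-only variant keeps the same 3-bytes-in, 4-bytes-out loop shape but skips the table lookup; the memcpy variant duplicates part of the input so that it writes as many bytes as the encoder does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical baseline: scalar Base64 encoder (standard alphabet, '=' padding).
   Returns the number of bytes written; out must hold (n + 2) / 3 * 4 bytes. */
static size_t b64_encode(const uint8_t *in, size_t n, char *out)
{
    static const char tbl[] =
        "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";
    size_t i = 0, o = 0;
    for (; i + 3 <= n; i += 3) {
        uint32_t v = (uint32_t)in[i] << 16 | (uint32_t)in[i + 1] << 8 | in[i + 2];
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = tbl[(v >> 6) & 63];
        out[o++] = tbl[v & 63];
    }
    if (i < n) {                       /* 1 or 2 trailing bytes */
        uint32_t v = (uint32_t)in[i] << 16;
        if (i + 1 < n)
            v |= (uint32_t)in[i + 1] << 8;
        out[o++] = tbl[v >> 18];
        out[o++] = tbl[(v >> 12) & 63];
        out[o++] = (i + 1 < n) ? tbl[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    return o;
}

/* Same loop shape and output size as b64_encode, but no conversion:
   each group of 3 input bytes is copied, and the first one is repeated. */
static size_t b64_copy_only(const uint8_t *in, size_t n, char *out)
{
    size_t i = 0, o = 0;
    for (; i + 3 <= n; i += 3) {
        out[o++] = (char)in[i];
        out[o++] = (char)in[i + 1];
        out[o++] = (char)in[i + 2];
        out[o++] = (char)in[i];
    }
    if (i < n) {
        out[o++] = (char)in[i];
        out[o++] = (i + 1 < n) ? (char)in[i + 1] : '=';
        out[o++] = '=';
        out[o++] = '=';
    }
    return o;
}

/* memcpy reference: copy the input, then copy its beginning again so that
   the number of bytes written matches the encoded size (capped at 2 * n). */
static size_t memcpy_expanded(const uint8_t *in, size_t n, char *out)
{
    size_t extra = (n + 2) / 3 * 4 - n;
    if (extra > n)
        extra = n;
    memcpy(out, in, n);
    memcpy(out + n, in, extra);
    return n + extra;
}

typedef size_t (*kernel_fn)(const uint8_t *, size_t, char *);

/* Average CPU seconds per call, measured with clock(). The output is read
   back into a volatile sink so the stores are less likely to be optimized out. */
static double seconds_per_pass(kernel_fn fn, const uint8_t *in, size_t n,
                               char *out, int passes)
{
    volatile size_t sink = 0;
    clock_t t0 = clock();
    for (int p = 0; p < passes; p++) {
        size_t w = fn(in, n, out);
        if (w)
            sink += (unsigned char)out[w - 1];
    }
    clock_t t1 = clock();
    (void)sink;
    return (double)(t1 - t0) / CLOCKS_PER_SEC / passes;
}

/* Times the three kernels on n bytes of synthetic input and prints the results. */
static void run_comparison(size_t n, int passes)
{
    size_t out_len = (n + 2) / 3 * 4;
    uint8_t *in = malloc(n ? n : 1);
    char *out = malloc(out_len + 1);
    if (!in || !out) {
        free(in);
        free(out);
        return;
    }
    for (size_t i = 0; i < n; i++)
        in[i] = (uint8_t)(i * 31u + 7u);
    assert(b64_encode(in, n, out) == out_len);
    printf("encode:    %g s\n", seconds_per_pass(b64_encode, in, n, out, passes));
    printf("copy-only: %g s\n", seconds_per_pass(b64_copy_only, in, n, out, passes));
    printf("memcpy:    %g s\n", seconds_per_pass(memcpy_expanded, in, n, out, passes));
    free(in);
    free(out);
}
```

Calling something like run_comparison(3 << 20, 100) from main would print the three timings. The gap between "encode" and "copy-only" roughly isolates the cost of the conversion itself, while the gap between "copy-only" and "memcpy" hints at the cost of the byte-at-a-time loop. clock() is coarse, so the input size and pass count probably need to be large enough for the totals to be well above its resolution.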

