Done and pushed. It took me quite a while to figure this out. Two questions, though:
1) Fundamentally, we are doing this because we want to build this for all possible compilation flags, right? And in the dispatch script we try to figure out the most efficient implementation and run the binary built for it?
2) Could you please review the changes once? They work fine everywhere, IMO.
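To check my understanding of question 1, here is a rough Python sketch of the selection logic as I imagine it. This is not the actual dispatch script. The names `parse_cpu_flags`, `pick_variant`, `PREFERENCE` and the "generic" fallback are made up for illustration, and I am assuming a Linux-style /proc/cpuinfo "flags" line.

```python
# Hypothetical preference order, most capable first. These are flag names as
# Linux reports them in /proc/cpuinfo; which variants actually get built is
# an assumption made for this sketch.
PREFERENCE = ["avx2", "sse4_1", "ssse3", "sse2"]


def parse_cpu_flags(cpuinfo_text):
    """Return the set of CPU flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()  # no 'flags' line found (e.g. a non-x86 machine)


def pick_variant(cpu_flags, preference=PREFERENCE, fallback="generic"):
    """Pick the first (most capable) variant whose flag the CPU supports."""
    for flag in preference:
        if flag in cpu_flags:
            return flag
    return fallback


# Example with a made-up cpuinfo snippet:
sample = "processor\t: 0\nflags\t\t: fpu sse sse2 ssse3 sse4_1\n"
print(pick_variant(parse_cpu_flags(sample)))
```

If that is the right picture, the wrapper would then simply exec the binary that was built with the matching flags.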
BTW, I noticed that the rules file, which was initially only around 10 lines, grew by roughly an order of magnitude because I had to almost handcraft the makefile to do various things. I observed the same for kalign and mmseqs2, where you made similar changes.
My point is: I (vaguely) wonder whether we could make a tool/helper that _automates_ the entire process when using simde, to reduce the handcrafted parts, because we seem to be making fundamentally similar changes everywhere.
And of course, apologies if this doesn't sound like a good idea.
I'll definitely update this in the next couple of days.
Kind Regards,
Nilesh