That's a neat project. The README.md says:

> if you don't have git:
>
> * seriously, no git?

The question is not whether one does not have git, but whether one does not have CUDA, unfortunately.

> The Design and Verification of mumax3:
>
> http://scitation.aip.org/content/aip/journal/adva/4/10/10.1063/1.4899186

The hyperlink seems to be paywalled or broken.

You write:

> A speed-up of the order of 100x compared to CPU-based simulations can
> easily be reached....

Since I am unable to view the paper, would you briefly, approximately tell me how you achieved the speed-up? Alternately, would you link me to relevant presentation slides, a presentation video, or the like? Again alternately, would you advise me in which source file one should look for the core of the main loop, where the 100x speed-up is implemented?

I ask because I have a simulation that improperly relies on g++'s optimizer to vectorize the simulation's main loop, the elements being 64+64 = 128-bit complex doubles. Even if my loop technique were not clumsy and 15 years outdated, the optimizer goes only to SSE hardware, and not (as far as I can tell by reviewing the disassembly) to the GPU at all. One could try OpenCL, of course; but without a good example to follow, I'd probably flounder around six months trying to figure out how to apply OpenCL intelligently....

Anyway, if you believe that your code is a good example, then I'd be interested to see how you have achieved the 100x.