That's a neat project.
The README.md says:
> if you don't have git:
> * seriously, no git?
Unfortunately, the question is not whether one has git, but whether one
has CUDA.
You write:
> A speed-up of the order of 100x compared to CPU-based simulations can
> easily be reached....
Since I am unable to view the paper, would you briefly and approximately
tell me how you achieved the speed-up? Alternatively, would you link me
to relevant presentation slides, a presentation video, or the like?
Failing that, would you advise me which source file contains the core
of the main loop, where the 100x speed-up is implemented?