
cuda implementation of magnetic reflectivity #135

Draft: wants to merge 2 commits into master
Conversation

pkienzle
Member

Here's a version of magnetic reflectivity using numba.cuda; it currently runs about 4x slower.

Run using:

python run.py doc/examples/magrough/model.py --profile --steps=50 | less -S

Feel free to play and try to make it run faster.

@pkienzle
Member Author

Note that this implementation is single precision with no stability correction. If it were fast enough then it might be worth trying to improve it. Basically, divide each matrix entry by the maximum on the diagonal and multiply the final result by that product. At least, that's how I was able to compute reflectivity from 10 km thick samples for the non-polarized case.
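The rescaling idea above can be sketched with plain NumPy (a minimal sketch on a chain of generic 2x2 matrices, not the actual transfer matrices used here; the 1e3-scaled random chain is just a stand-in whose unscaled product would overflow):

```python
import numpy as np

def stable_matrix_product(matrices):
    """Multiply a chain of matrices, dividing each factor by the maximum
    magnitude on its diagonal so partial products stay O(1).  Returns
    (product, log_scale), where product * exp(log_scale) recovers the
    unscaled result."""
    result = np.eye(matrices[0].shape[0], dtype=matrices[0].dtype)
    log_scale = 0.0
    for m in matrices:
        scale = np.abs(np.diag(m)).max()
        result = result @ (m / scale)
        log_scale += np.log(scale)
    return result, log_scale

# Stand-in chain: 500 matrices with diagonals near 1e3, whose raw
# product (~1e3**500) overflows float64 without rescaling.
rng = np.random.default_rng(0)
chain = [np.eye(2) * 1e3 + rng.normal(size=(2, 2)) for _ in range(500)]
product, log_scale = stable_matrix_product(chain)
```

Since the reflectivity amplitude is a ratio of elements of the final matrix, the accumulated scale cancels there and only the normalized product is needed.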

Modern gaming cards are 30x slower for double precision than for single; maybe I'm accidentally promoting floats to double and killing performance that way.
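The promotion trap is easy to demonstrate on the host side with NumPy (array names and values here are illustrative, not the actual kernel inputs): one float64 operand silently promotes the whole expression, whereas keeping every operand float32 preserves the narrow type.

```python
import numpy as np

# A float64 operand silently promotes a float32 computation.
kz = np.linspace(0.0, 0.5, 400, dtype=np.float32)
rho = np.full(400, 2.07e-6)                  # defaults to float64
promoted = kz**2 - 4.0e-6 * np.pi * rho      # whole expression is now double

# Keeping every operand in float32 avoids the promotion.
rho32 = rho.astype(np.float32)
kept = kz**2 - np.float32(4.0e-6 * np.pi) * rho32
```

Checking `.dtype` on intermediate arrays before they are copied to the device is a quick way to find where doubles sneak in.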

@pkienzle
Member Author

Part of the problem appears to be communication overhead with the card. Changing the q length from 400 to 4000 changes execution time very little (the RTX 2080 card has 4300 processors, so up to that many q values can run concurrently). For 40000 q values execution time increases 10x, which makes sense given the number of processors.

Changing the layer data to fill a single matrix might help with the communication overhead, but that's a more involved code modification. The new memory layout may help performance on the card, making it easier to place it into shared memory so that access patterns don't matter so much.
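Packing the per-layer parameters into a single contiguous matrix might look like the following (a sketch; the `pack_layers` helper and the column layout are assumptions, not the current code):

```python
import numpy as np

def pack_layers(d, rho, irho, sigma):
    """Pack per-layer parameters into one contiguous float32 matrix so the
    whole profile moves to the GPU in a single host-to-device transfer.
    Assumed column layout: thickness, SLD, absorption, interface width."""
    layers = np.column_stack([d, rho, irho, sigma]).astype(np.float32)
    return np.ascontiguousarray(layers)

# Hypothetical 4-layer profile (substrate and vacuum have zero thickness).
d = [0.0, 100.0, 200.0, 0.0]
rho = [0.0, 4.0, 2.0, 6.8]
irho = [0.0, 0.1, 0.0, 0.0]
sigma = [5.0, 5.0, 5.0, 0.0]
layers = pack_layers(d, rho, irho, sigma)
```

A single (nlayers, 4) array replaces four separate transfers, and rows land adjacent in memory, which also makes it straightforward to stage the profile in shared memory on the device.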

Making the reflectivity calculation "asynchronous" so that you can compose the next layer matrix while the current calculation is running on the card would also help.

In any case, convolution dominates as the number of q values increases, so that, too, needs to be moved onto the card.
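For reference, the resolution convolution is embarrassingly parallel over q, so it should port well. A CPU sketch of a per-point Gaussian smearing (an illustration of the structure, not the project's actual resolution routine):

```python
import numpy as np

def gaussian_convolve(q, dq, qi, ri):
    """Resolution-smear reflectivity: for each measurement point q[k],
    average the theory curve (qi, ri) with a Gaussian of width dq[k].
    Each output point is independent, so this maps naturally onto one
    CUDA thread per q value."""
    out = np.empty(len(q), dtype=np.float64)
    for k in range(len(q)):
        w = np.exp(-0.5 * ((qi - q[k]) / dq[k]) ** 2)
        out[k] = np.sum(w * ri) / np.sum(w)
    return out

# Stand-in theory curve and measurement grid.
qi = np.linspace(0.0, 0.5, 1000)
ri = np.exp(-10.0 * qi)
q = np.linspace(0.05, 0.45, 50)
dq = np.full_like(q, 0.005)
smeared = gaussian_convolve(q, dq, qi, ri)
```

The loop body is exactly what one thread would compute, so the kernel version is mostly a matter of replacing the outer loop with the thread index.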

@pkienzle pkienzle changed the base branch from magnetic_reflectivity_py to master May 3, 2022 20:46
@pkienzle pkienzle marked this pull request as draft May 3, 2022 20:46