Performance of the atomic neural networks in TorchANI #11
In reply to #5 (comment)
The implementation of the atomic NNs in TorchANI isn't optimal. For example, ANI-2x has 8 sets of atomic NNs, each set has 7 atomic NNs (one per element), and each atomic NN is a 3-layer fully-connected NN. The NNs are computed sequentially, so a matrix multiplication kernel is executed 168 times (= 8 * 7 * 3) in the forward pass alone. Using batched matrix multiplication, it should be possible to reduce this to 3 kernel executions. After finishing #5, I'll try to make a batched PyTorch implementation. Ultimately, TensorRT should be very good at that.
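The difference between the sequential and the batched evaluation can be sketched in PyTorch as follows. This is only an illustration, not the TorchANI code: the layer widths, the number of atoms per network, the CELU activation, and the function names `forward_sequential` and `forward_batched` are all assumptions made for the example. It also assumes every network has identical layer sizes and receives the same number of atoms, which a real implementation would have to handle (for instance by padding).

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical sizes, chosen only for illustration.
N_NETS = 8 * 7          # 8 sets x 7 elements, as counted in the comment above
N_ATOMS = 5             # atoms fed to each network (assumed equal for all)
SIZES = [16, 12, 8, 1]  # input features followed by 3 layer widths (assumed)

# One weight and one bias tensor per layer, stacked over all networks.
weights = [torch.randn(N_NETS, n_in, n_out)
           for n_in, n_out in zip(SIZES[:-1], SIZES[1:])]
biases = [torch.randn(N_NETS, 1, n_out) for n_out in SIZES[1:]]


def forward_sequential(x):
    """One matmul per network per layer: N_NETS * 3 = 168 matmul calls."""
    outputs = []
    for i in range(N_NETS):
        h = x[i]
        for layer, (w, b) in enumerate(zip(weights, biases)):
            h = h @ w[i] + b[i]
            if layer < len(weights) - 1:
                h = F.celu(h, alpha=0.1)  # assumed activation
        outputs.append(h)
    return torch.stack(outputs)


def forward_batched(x):
    """One batched matmul per layer: 3 calls in total."""
    h = x
    for layer, (w, b) in enumerate(zip(weights, biases)):
        # baddbmm computes b + h @ w for every network in a single call.
        h = torch.baddbmm(b, h, w)
        if layer < len(weights) - 1:
            h = F.celu(h, alpha=0.1)  # assumed activation
    return h


x = torch.randn(N_NETS, N_ATOMS, SIZES[0])
out_sequential = forward_sequential(x)
out_batched = forward_batched(x)
```

Both functions should produce the same values up to floating-point rounding, so the sequential version can serve as a reference when checking a batched implementation.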
@peastman, just letting you know before you start writing the NN part in CUDA directly: I almost have a working implementation of the NN part using batched matrix multiplications. I still have to fix a bug or two, but I can already see a significant performance gain. I'll share the benchmarks soon.
Thanks! Looking forward to seeing it.
Dear @raimis, these results look amazing! Just a note that the 1x/2x hyperparameter optimization was done only with respect to accuracy. We would very much like to take performance considerations and other constraints into account for the next iteration. Even the current models could be re-trained and re-fitted if necessary.
@isayev In the case of ANI-2x, for small molecules (~100 atoms), the bottleneck is the matrix multiplications in the dense layers, so a single-model NNP (rather than the ensemble) would improve speed. For bigger molecules, the bottleneck becomes the neighbour search for the symmetry functions.
End-to-end performance benchmarks of ANI-2x
Molecule: 46 atoms (pytorch/molecules/2iuz_ligand.mol2)
GPU: GTX 1080 Ti
Forward & backward passes with complete ANI-2x:
Just forward pass with complete ANI-2x:
Forward & backward passes with ANI-2x using just one set of the atomic NNs, not 8:
Just forward pass with ANI-2x using just one set of the atomic NNs, not 8:
Originally posted by @raimis in #5 (comment)