Slow Inference Speeds on Cluster GPUs #833
Hello everyone! In our lab's use of SLEAP on the SNL cluster, inference speeds are around 30 FPS or so on a GPU. The methods paper (which I think had things trained on A100 GPUs?) reports speeds of several hundred FPS! Crazy cool speed! While we aren't using A100s (yet), I would think that 30 FPS is somewhat slow. Here are the specs of an example machine people are running on:
You can even see that there are some people running SLEAP on it! Here's some info about the machine itself:
This computer only has 20 cores (40 with hyperthreading), so maybe that could be a bottleneck? It also looks like the inference batch size could be almost doubled, since lots of memory is available. Perhaps the most important thing to notice is that the ... Here's an attempt to query things and its output:
Edit: After staring at nvidia-smi polling for a few minutes, the GPU-Util sometimes gets to 15% for a second, then drops back down to zero.
Edit2: It could be that they were running an older version of SLEAP that had latency between the CPU and GPU from a bug that Arlo mentioned he fixed. On the most recent versions of SLEAP the GPU does get utilized, but only up to something like 50%...
I could be misinterpreting things, but it looks to me like things are being loaded onto the GPU but the card isn't actually doing anything for some reason!
I was asking @sheridana about this briefly today and he suggested a couple of things to check:
Any advice?
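For reference, the nvidia-smi polling described in the edits above can be scripted rather than watched by hand. This is only a minimal sketch, assuming `nvidia-smi` is on the PATH and that its CSV query output has the usual `utilization, memory` form; the function names (`parse_gpu_stats`, `poll_gpu`, `mean_utilization`) are made up for illustration and are not part of SLEAP.

```python
import subprocess
import time

# Standard nvidia-smi query flags; "nounits" drops the "%" and "MiB" suffixes.
QUERY = [
    "nvidia-smi",
    "--query-gpu=utilization.gpu,memory.used",
    "--format=csv,noheader,nounits",
]


def parse_gpu_stats(text):
    """Parse nvidia-smi CSV text into one (utilization %, memory MiB) tuple per GPU."""
    stats = []
    for line in text.strip().splitlines():
        util, mem = (field.strip() for field in line.split(","))
        stats.append((int(util), int(mem)))
    return stats


def poll_gpu(samples=10, interval_s=1.0):
    """Sample all GPUs `samples` times, sleeping `interval_s` seconds between samples."""
    readings = []
    for _ in range(samples):
        out = subprocess.run(QUERY, capture_output=True, text=True, check=True).stdout
        readings.append(parse_gpu_stats(out))
        time.sleep(interval_s)
    return readings


def mean_utilization(readings, gpu_index=0):
    """Average utilization (%) of one GPU across all samples."""
    values = [sample[gpu_index][0] for sample in readings]
    return sum(values) / len(values)
```

Running `mean_utilization(poll_gpu(samples=60))` while inference is in progress gives a one-minute average, which is easier to compare across SLEAP versions than eyeballing brief spikes.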
Replies: 2 comments 24 replies
The most likely bottleneck is reading the data, especially if it's stored over the network. A CPU bottleneck is very unlikely -- most of the heavy lifting in SLEAP is done on the GPU, so 20 cores is way overkill. Other things to try:
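One way to check whether data reading really is the bottleneck is to time raw reads of the video file from where it currently lives and from a copy on local disk. The following is a minimal sketch, assuming plain sequential reads are a fair proxy for video loading (it ignores decoding cost entirely); the function name `read_throughput` and any paths passed to it are hypothetical.

```python
import time


def read_throughput(path, chunk_bytes=8 * 1024 * 1024, max_bytes=None):
    """Read a file sequentially and return (bytes_read, MiB per second).

    max_bytes optionally caps how much is read, which helps with very large videos.
    """
    total = 0
    start = time.perf_counter()
    with open(path, "rb") as f:
        while max_bytes is None or total < max_bytes:
            chunk = f.read(chunk_bytes)
            if not chunk:  # end of file
                break
            total += len(chunk)
    elapsed = time.perf_counter() - start
    # Guard against a zero elapsed time on tiny files.
    mib_per_s = total / (1024 * 1024) / max(elapsed, 1e-9)
    return total, mib_per_s
```

Calling this once on the network path and once on a local scratch copy of the same video shows how much the storage location matters. Repeated reads of the same file can be served from the operating system's page cache, so the first read of each copy is the more honest number.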
To simplify the info in the discussion here, two things really helped us achieve speedups from 5-10 FPS to over 400 FPS: