How can I run kmcuda synchronously after a TensorRT model performs inference on the same GPU (in a loop)?
For instance, I already allocate page-locked buffers for my TensorRT model, but I don't explicitly allocate anything up front for kmeans_cuda to run on. Doesn't that mean there could be a conflict if both are accessing the GPU and don't fully "clean up" after themselves?
The error I get the next time TensorRT runs (only after kmcuda runs):
So I guess my question in general is: how should/can I clean up after kmcuda runs? The reason I think preallocating buffers might help is that a very similar SO issue reported that as the solution (for TensorFlow and TensorRT on the same GPU).
What I do know is that this problem can be solved by isolating TensorRT from kmeans_cuda.
Here's how I've hackily fixed it:
I simply run the TensorRT inference (with all its page-locked allocation, engine, stream, context, etc.) in one thread and run kmeans_cuda in a separate thread. A thread-safe queue passes the inference results through to the thread that runs kmeans. There - isolation! No more errors.
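Roughly, the pattern looks like this (a minimal sketch: `run_trt_inference()` is a stand-in for my actual TensorRT engine/context/stream setup and inference loop, and the cluster count is arbitrary):

```python
import queue
import threading

import numpy as np
from libKMCUDA import kmeans_cuda  # kmcuda's Python binding

results_q = queue.Queue(maxsize=8)
STOP = object()  # sentinel to signal the consumer to exit

def producer():
    # All TensorRT state (engine, execution context, CUDA stream,
    # page-locked buffers) is created and used only in this thread.
    for features in run_trt_inference():  # hypothetical generator of np.float32 arrays
        results_q.put(features)
    results_q.put(STOP)

def consumer():
    # kmeans_cuda is only ever called from this thread, so its GPU work
    # never interleaves with TensorRT's in the same thread.
    while True:
        features = results_q.get()
        if features is STOP:
            break
        centroids, assignments = kmeans_cuda(
            np.ascontiguousarray(features, dtype=np.float32),
            clusters=8, seed=3, verbosity=0)
        # ... use centroids / assignments ...

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```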
But I have no idea why this works, and it feels extremely hacky. Are the devs willing to comment on best practices and caveats for running kmeans_cuda synchronously with other calls to the GPU (from TensorRT or otherwise)?
I also encountered the same problem, but in my case I loaded two TRT models at the same time.
My method is:
First, map the torch2trt include and lib paths to the include and lib paths of the corresponding TensorRT version (e.g., TensorRT 8.2.3 or TensorRT 7.1);
then, separate the initialization of the two TRT models into two classes, respectively.
Finally, I use one class that holds and calls the two model classes, respectively (see the sketch at the end of this comment).
Note: every time you execute a forward or call, you need to add torch.cuda.set_device('cuda:0') first.
This solved my problem, and the stress test also passed.
My method works in these environments (TensorRT 7.1.2 with torch2trt 0.3.0, and TensorRT 8.2.3 with torch2trt 0.4.0).
I think this is a resource and GPU contention issue.
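Roughly, the class structure looks like this (a sketch only: class names and engine paths are placeholders, and the TRTModule loading follows the usual torch2trt pattern):

```python
import torch
from torch2trt import TRTModule  # torch2trt's runtime module wrapper

class DetectorTRT:
    """Wraps the first TRT engine (hypothetical weights path)."""
    def __init__(self, weights_path="detector_trt.pth"):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')  # pin every call to the intended device
        return self.model(x)

class ClassifierTRT:
    """Wraps the second TRT engine (hypothetical weights path)."""
    def __init__(self, weights_path="classifier_trt.pth"):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')
        return self.model(x)

class Pipeline:
    """One class that owns the two model classes and calls them in turn."""
    def __init__(self):
        self.detector = DetectorTRT()
        self.classifier = ClassifierTRT()

    def __call__(self, image):
        boxes = self.detector(image)
        labels = self.classifier(boxes)
        return boxes, labels
```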