How can I run kmcuda synchronously after a TensorRT model performs inference on the same GPU (in a loop)?
For instance, I already allocate page-locked buffers for my TensorRT model, but I don't explicitly allocate anything up front for kmeans_cuda to run on. Doesn't that mean there could be a conflict if both are accessing the GPU and don't fully "clean up" after themselves?
The error I get the next time TensorRT runs (only after kmcuda runs):
So I guess my question in general is: how should/can I clean up after kmcuda runs? The reason I think preallocating buffers might help is that a very similar SO issue reported that as the solution (for TensorFlow and TensorRT on the same GPU).
What I do know is that this problem can be solved by isolating TensorRT from kmeans_cuda.
Here's how I've hackily fixed it:
I simply run the TensorRT inference (with all its page-locked allocation, engine, stream, context, etc.) in one thread and run kmeans_cuda in a separate thread. A thread-safe queue passes the inference results through to the thread that runs kmeans. There - isolation! No more errors.
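Roughly, the pattern looks like this (a minimal sketch: `run_trt_inference()` is a stand-in for my actual TensorRT engine/context/stream setup and inference loop, and the cluster count is arbitrary):

```python
import queue
import threading

import numpy as np
from libKMCUDA import kmeans_cuda  # kmcuda's Python binding

results_q = queue.Queue(maxsize=8)
STOP = object()  # sentinel to signal the consumer to exit

def producer():
    # All TensorRT state (engine, execution context, CUDA stream,
    # page-locked buffers) is created and used only in this thread.
    for features in run_trt_inference():  # hypothetical generator of np.float32 arrays
        results_q.put(features)
    results_q.put(STOP)

def consumer():
    # kmeans_cuda is only ever called from this thread, so its GPU work
    # never interleaves with TensorRT's in the same thread.
    while True:
        features = results_q.get()
        if features is STOP:
            break
        centroids, assignments = kmeans_cuda(
            np.ascontiguousarray(features, dtype=np.float32),
            clusters=8, seed=3, verbosity=0)
        # ... use centroids / assignments ...

threads = [threading.Thread(target=producer), threading.Thread(target=consumer)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```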
But I have no idea why this works, and it feels extremely hacky. Are the devs willing to comment on best practices and caveats for running kmeans_cuda synchronously with other calls to the GPU (from TensorRT or otherwise)?
I also encountered the same problem, but in my case I loaded two TRT models at the same time.
My method is:
First, map the torch2trt include and lib paths to the include and lib paths of the corresponding TensorRT version (e.g., TensorRT 8.2.3 or TensorRT 7.1);
then, separate the initialization of the two TRT models into two classes, respectively.
Finally, I use one class that holds and calls the two model classes, respectively (see the sketch at the end of this comment).
Note: every time you execute a forward or call, you need to add torch.cuda.set_device('cuda:0') first.
This solved my problem, and the stress test also passed.
My method works in these environments (TensorRT 7.1.2 with torch2trt 0.3.0, and TensorRT 8.2.3 with torch2trt 0.4.0).
I think this is a resource and GPU contention issue.
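Roughly, the class structure looks like this (a sketch only: class names and engine paths are placeholders, and the TRTModule loading follows the usual torch2trt pattern):

```python
import torch
from torch2trt import TRTModule  # torch2trt's runtime module wrapper

class DetectorTRT:
    """Wraps the first TRT engine (hypothetical weights path)."""
    def __init__(self, weights_path="detector_trt.pth"):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')  # pin every call to the intended device
        return self.model(x)

class ClassifierTRT:
    """Wraps the second TRT engine (hypothetical weights path)."""
    def __init__(self, weights_path="classifier_trt.pth"):
        self.model = TRTModule()
        self.model.load_state_dict(torch.load(weights_path))

    def __call__(self, x):
        torch.cuda.set_device('cuda:0')
        return self.model(x)

class Pipeline:
    """One class that owns the two model classes and calls them in turn."""
    def __init__(self):
        self.detector = DetectorTRT()
        self.classifier = ClassifierTRT()

    def __call__(self, image):
        boxes = self.detector(image)
        labels = self.classifier(boxes)
        return boxes, labels
```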