Matmul test failure #78

Open
shiwenloong opened this issue May 16, 2024 · 2 comments

shiwenloong commented May 16, 2024

I encountered a test failure after building and running the tests. Here are the details:

  • GPU: RTX 4090
  • Repo branch: v1.4.0
  • Operating System: Ubuntu 22.04.3
  • CUDA version: 12.2
  • cuDNN version: 8.9.7
  • g++ version: 11.4.0

I followed the build instructions as provided in the README:

mkdir build
cd build
cmake ..
make -j8

Output is:

-- The C compiler identification is GNU 11.4.0
-- The CXX compiler identification is GNU 11.4.0
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.2/targets/x86_64-linux/include (found version "12.2.140")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- Performing Test HAVE_FLAG__ffile_prefix_map__nvme2_medsam_cuda_mode_cudnn_frontend_build__deps_catch2_src__
-- Performing Test HAVE_FLAG__ffile_prefix_map__nvme2_medsam_cuda_mode_cudnn_frontend_build__deps_catch2_src__ - Success
-- cudnn found at /usr/local/cuda-12.2/lib64/libcudnn.so.
-- Found LIBRARY: /usr/local/cuda-12.2/include
-- cuDNN: /usr/local/cuda-12.2/lib64/libcudnn.so
-- cuDNN: /usr/local/cuda-12.2/include
-- cudnn_adv_infer found at /usr/local/cuda-12.2/lib64/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/local/cuda-12.2/lib64/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/local/cuda-12.2/lib64/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/local/cuda-12.2/lib64/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/local/cuda-12.2/lib64/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/local/cuda-12.2/lib64/libcudnn_ops_train.so.
-- cudnn found at /usr/local/cuda-12.2/lib64/libcudnn.so.
-- cuDNN: /usr/local/cuda-12.2/lib64/libcudnn.so
-- cuDNN: /usr/local/cuda-12.2/include
-- cudnn_adv_infer found at /usr/local/cuda-12.2/lib64/libcudnn_adv_infer.so.
-- cudnn_adv_train found at /usr/local/cuda-12.2/lib64/libcudnn_adv_train.so.
-- cudnn_cnn_infer found at /usr/local/cuda-12.2/lib64/libcudnn_cnn_infer.so.
-- cudnn_cnn_train found at /usr/local/cuda-12.2/lib64/libcudnn_cnn_train.so.
-- cudnn_ops_infer found at /usr/local/cuda-12.2/lib64/libcudnn_ops_infer.so.
-- cudnn_ops_train found at /usr/local/cuda-12.2/lib64/libcudnn_ops_train.so.
-- Configuring done (6.0s)
-- Generating done (0.0s)
-- Build files have been written to: /nvme2/medsam/cuda-mode/cudnn-frontend/build
[100%] Linking CXX executable ../bin/samples
Warning: Unused direct dependencies:
        /usr/local/cuda-12.2/lib64/libnvrtc.so.12
        /usr/local/cuda-12.2/lib64/libnvrtc-builtins.so.12.2
        /lib/x86_64-linux-gnu/libcuda.so.1
        /usr/local/cuda-12.2/lib64/libnvJitLink.so.12
        /usr/local/cuda-12.2/lib64/libcudnn_adv_train.so.8
        /usr/local/cuda-12.2/lib64/libcudnn_ops_train.so.8
        /usr/local/cuda-12.2/lib64/libcudnn_cnn_train.so.8
        /usr/local/cuda-12.2/lib64/libcudnn_adv_infer.so.8
        /usr/local/cuda-12.2/lib64/libcudnn_cnn_infer.so.8
        /usr/local/cuda-12.2/lib64/libcudnn_ops_infer.so.8
[100%] Built target samples

Then I ran the matmul test:

CUDNN_FRONTEND_LOG_FILE=stdout CUDNN_FRONTEND_LOG_INFO=1 ./build/bin/samples MatMul

Output is:

Filters: "MatMul"
Randomness seeded to: 1045110732
[cudnn_frontend] INFO: Validating matmul node GEMM...
[cudnn_frontend] INFO: Inferrencing properties for matmul node GEMM...
[cudnn_frontend] INFO: Creating cudnn tensors for node named 'GEMM':
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["BFLOAT16"] Id: 2 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,32,128 ] Str [ 4096,128,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["BFLOAT16"] Id: 3 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,128,64 ] Str [ 8192,64,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["FLOAT"] Id: 4 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,32,64 ] Str [ 2048,64,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: Building MatmulNode operations GEMM...
[cudnn_frontend] CUDNN_BACKEND_MATMUL_DESCRIPTOR : Math precision ["FLOAT"]
[cudnn_frontend] CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR has 1operations.
Tag: Matmul_

[cudnn_frontend] INFO:  Getting plan from heuristics for Matmul_ ...
[cudnn_frontend] CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR :
Heuristic Mode 3 has 6 configurations 
[cudnn_frontend] INFO: get_heuristics_list statuses: CUDNN_STATUS_SUCCESS 
[cudnn_frontend] INFO: config list has 6 configurations.
[cudnn_frontend] INFO: config list has 6 good configurations.
[cudnn_frontend] INFO: Extracting engine configs.
[cudnn_frontend] INFO: Querying engine config properties
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudnn_frontend] INFO: Building plan at index 0 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudnn_frontend] INFO: Building plan at index 1 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudnn_frontend] INFO: Building plan at index 2 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudnn_frontend] INFO: Building plan at index 3 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudnn_frontend] INFO: Building plan at index 4 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudnn_frontend] INFO: Building plan at index 5 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: plans.check_support(h) at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/graph_interface.h:260

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples is a Catch2 v3.3.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
Matmul
-------------------------------------------------------------------------------
/nvme2/medsam/cuda-mode/cudnn-frontend/samples/cpp/matmuls.cpp:31
...............................................................................

/nvme2/medsam/cuda-mode/cudnn-frontend/samples/cpp/matmuls.cpp:80: FAILED:
  REQUIRE( graph.check_support(handle).is_good() )
with expansion:
  false

===============================================================================
test cases:  1 |  0 passed | 1 failed
assertions: 11 | 10 passed | 1 failed
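
As a quick sanity check on the three CUDNN_BACKEND_TENSOR_DESCRIPTOR lines in the log above, the logged strides can be compared against fully packed row-major strides, and the shapes against the batched matmul rule [b,m,k] x [b,k,n] -> [b,m,n]. The sketch below is only an illustration: `packed_strides` is a hypothetical helper, not part of cudnn-frontend, and the keys simply reuse the tensor Ids from the log. If both checks hold, the descriptors themselves look well formed, which suggests (but does not prove) that the failure lies elsewhere.

```python
def packed_strides(dims):
    """Return fully packed row-major strides for dims (hypothetical helper)."""
    strides = [1] * len(dims)
    for i in range(len(dims) - 2, -1, -1):
        strides[i] = strides[i + 1] * dims[i + 1]
    return strides


# (Dim, Str) pairs copied from the frontend log, keyed by the logged tensor Id.
logged = {
    2: ([16, 32, 128], [4096, 128, 1]),  # BFLOAT16 input
    3: ([16, 128, 64], [8192, 64, 1]),   # BFLOAT16 input
    4: ([16, 32, 64], [2048, 64, 1]),    # FLOAT output
}

# Every logged stride vector should equal the packed strides of its dims.
strides_ok = all(packed_strides(d) == s for d, s in logged.values())

# Batched matmul shape rule: [b, m, k] x [b, k, n] -> [b, m, n].
(b, m, k), (b2, k2, n), out = logged[2][0], logged[3][0], logged[4][0]
shapes_ok = (b == b2) and (k == k2) and (out == [b, m, n])

print("strides packed:", strides_ok)
print("shapes consistent:", shapes_ok)
```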

shiwenloong changed the title from "Matmul test Failure" to "Matmul test failure" on May 16, 2024

@Anerudhan (Collaborator)

Hi @shiwenloong,

Thanks for reporting this issue.

I have added an experimental branch, issues/75_and_78, that prints the result of cudaGetLastError().

Please run:

CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=backend_api.log CUDNN_FRONTEND_LOG_FILE=stdout CUDNN_FRONTEND_LOG_INFO=1 ./build/bin/samples "MatMul"

Then attach both backend_api.log and the frontend log so that we can help debug.

Thanks

shiwenloong commented May 17, 2024

I ran:

CUDNN_LOGLEVEL_DBG=3 CUDNN_LOGDEST_DBG=backend_api.log CUDNN_FRONTEND_LOG_FILE=stdout CUDNN_FRONTEND_LOG_INFO=1 ./build/bin/samples "MatMul"

But I can't find the backend_api.log file. This is the frontend log:

Filters: "MatMul"
Randomness seeded to: 3739293787
[cudnn_frontend] INFO: Validating matmul node GEMM...
[cudnn_frontend] INFO: Inferrencing properties for matmul node GEMM...
[cudnn_frontend] INFO: Creating cudnn tensors for node named 'GEMM':
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["BFLOAT16"] Id: 2 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,32,128 ] Str [ 4096,128,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["BFLOAT16"] Id: 3 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,128,64 ] Str [ 8192,64,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: CUDNN_BACKEND_TENSOR_DESCRIPTOR : Datatype: ["FLOAT"] Id: 4 nDims 3 VectorCount: 1 vectorDimension -1 Dim [ 16,32,64 ] Str [ 2048,64,1 ] isVirtual: 0 isByValue: 0 Alignment: 16 reorder_type: ["NONE"]
[cudnn_frontend] INFO: Building MatmulNode operations GEMM...
[cudnn_frontend] CUDNN_BACKEND_MATMUL_DESCRIPTOR : Math precision ["FLOAT"]
[cudnn_frontend] CUDNN_BACKEND_OPERATIONGRAPH_DESCRIPTOR has 1operations.
Tag: Matmul_

[cudnn_frontend] INFO:  Getting plan from heuristics for Matmul_ ...
[cudnn_frontend] CUDNN_BACKEND_ENGINEHEUR_DESCRIPTOR :
Heuristic Mode 3 has 6 configurations 
[cudnn_frontend] INFO: get_heuristics_list statuses: CUDNN_STATUS_SUCCESS 
[cudnn_frontend] INFO: config list has 6 configurations.
[cudnn_frontend] INFO: config list has 6 good configurations.
[cudnn_frontend] INFO: Extracting engine configs.
[cudnn_frontend] INFO: Querying engine config properties
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 0 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 1 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 2 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 3 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 4 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED. ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] because plan building failed at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/plans.h:179
[cudaGetLastError] ERROR: 0
[cudnn_frontend] INFO: Building plan at index 5 gave ["GRAPH_EXECUTION_PLAN_CREATION_FAILED"] with message: CUDNN_BACKEND_EXECUTION_PLAN_DESCRIPTOR: cudnnFinalize Descriptor Failed cudnn_status: CUDNN_STATUS_EXECUTION_FAILED
[cudnn_frontend] ERROR: plans.check_support(h) at /nvme2/medsam/cuda-mode/cudnn-frontend/include/cudnn_frontend/graph_interface.h:260

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
samples is a Catch2 v3.3.2 host application.
Run with -? for options

-------------------------------------------------------------------------------
Matmul
-------------------------------------------------------------------------------
/nvme2/medsam/cuda-mode/cudnn-frontend/samples/cpp/matmuls.cpp:31
...............................................................................

/nvme2/medsam/cuda-mode/cudnn-frontend/samples/cpp/matmuls.cpp:80: FAILED:
  REQUIRE( graph.check_support(handle).is_good() )
with expansion:
  false

===============================================================================
test cases:  1 |  0 passed | 1 failed
assertions: 11 | 10 passed | 1 failed
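
One possible reason backend_api.log was not created (an assumption, not something confirmed in this thread): the cuDNN 8.x documentation describes backend logging through CUDNN_LOGINFO_DBG, CUDNN_LOGWARN_DBG and CUDNN_LOGERR_DBG together with CUDNN_LOGDEST_DBG, while CUDNN_LOGLEVEL_DBG belongs to the cuDNN 9 logging scheme. With cuDNN 8.9.7 it may therefore be worth retrying with the older variables. A minimal sketch of that retry follows; `cudnn8_log_env` is a hypothetical helper, and the binary path and test filter are the ones used above.

```python
import os
import subprocess


def cudnn8_log_env(dest="backend_api.log"):
    """Build an environment using the cuDNN 8.x style logging variables.

    Hypothetical helper; assumes cuDNN 8.x reads CUDNN_LOG{INFO,WARN,ERR}_DBG.
    """
    env = dict(os.environ)
    env.update({
        "CUDNN_LOGINFO_DBG": "1",
        "CUDNN_LOGWARN_DBG": "1",
        "CUDNN_LOGERR_DBG": "1",
        "CUDNN_LOGDEST_DBG": dest,
        # Frontend logging variables, unchanged from the command above.
        "CUDNN_FRONTEND_LOG_FILE": "stdout",
        "CUDNN_FRONTEND_LOG_INFO": "1",
    })
    return env


if __name__ == "__main__":
    binary = "./build/bin/samples"
    if os.path.exists(binary):
        # check=False: the test is expected to fail; only the logs matter here.
        subprocess.run([binary, "MatMul"], env=cudnn8_log_env(), check=False)
        print("backend log written:", os.path.exists("backend_api.log"))
    else:
        print("samples binary not found at", binary)
```

If the file still does not appear, checking that the process can write to the current directory, or setting CUDNN_LOGDEST_DBG to stdout, would be a reasonable next step.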
