[QST] Does TMA overlap memory copies to/from a global memory address on another GPU obtained via cudaIpcGetMemHandle? #1943

umiswing · 2024-11-15T00:13:27Z

What is your question?
Hi! Suppose a global memory address on another GPU on the same node (both GPUs support P2P and NVLink), obtained via cudaIpcGetMemHandle, is passed to a CUTLASS kernel that uses TMA. Does TMA overlap that inter-GPU memory copy with work on the CUDA Cores or Tensor Cores? And how do TMA and NVLink carry out such a copy between different GPUs?

thakkarV · 2024-11-15T00:30:46Z

TMA knows nothing about whether the memory is on a remote or a local GPU, if you have set up a coherent NVLink address space across the system.
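A minimal sketch of the setup the answer refers to, assuming a Linux host with two P2P-capable GPUs (for example, NVLink-connected) on one node. The parent process exports a device-0 allocation with cudaIpcGetMemHandle. The child opens it on device 1 with cudaIpcOpenMemHandle, passing cudaIpcMemLazyEnablePeerAccess, and then writes to it from an ordinary kernel. The pipe transport, the `fill` kernel and the `write_all`/`read_all` helpers are hypothetical illustrations, not CUTLASS code. The sketch does not use TMA itself. It only shows that, once the mapping exists, the kernel sees a plain global-memory pointer with nothing marking it as remote.

```cuda
// Hypothetical two-process demo: parent exports a device-0 buffer, child imports it
// on device 1 and writes to it. Assumes Linux and two P2P-capable GPUs on one node.
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <sys/wait.h>
#include <unistd.h>
#include <cuda_runtime.h>

#define CHECK(call)                                                      \
  do {                                                                   \
    cudaError_t err_ = (call);                                           \
    if (err_ != cudaSuccess) {                                           \
      std::fprintf(stderr, "%s: %s\n", #call, cudaGetErrorString(err_)); \
      std::exit(1);                                                      \
    }                                                                    \
  } while (0)

// Move exactly n bytes over a pipe, retrying on short reads/writes.
static bool write_all(int fd, const void* buf, size_t n) {
  const char* p = static_cast<const char*>(buf);
  while (n > 0) {
    ssize_t k = write(fd, p, n);
    if (k <= 0) return false;
    p += k;
    n -= static_cast<size_t>(k);
  }
  return true;
}

static bool read_all(int fd, void* buf, size_t n) {
  char* p = static_cast<char*>(buf);
  while (n > 0) {
    ssize_t k = read(fd, p, n);
    if (k <= 0) return false;
    p += k;
    n -= static_cast<size_t>(k);
  }
  return true;
}

// Plain global stores: nothing here tells the kernel that dst lives on another GPU.
__global__ void fill(float* dst, int n, float v) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n) dst[i] = v;
}

int main() {
  constexpr int N = 1 << 20;
  int fds[2];
  if (pipe(fds) != 0) return 1;
  // Fork before any CUDA call so each process creates its own context.
  pid_t pid = fork();
  if (pid < 0) return 1;

  if (pid == 0) {  // Child: importer, runs on device 1.
    close(fds[1]);
    cudaIpcMemHandle_t handle;
    std::memset(&handle, 0, sizeof handle);
    if (!read_all(fds[0], &handle, sizeof handle)) return 1;
    CHECK(cudaSetDevice(1));
    void* peer = nullptr;
    // Map device 0's allocation into this process and enable P2P access to it.
    CHECK(cudaIpcOpenMemHandle(&peer, handle, cudaIpcMemLazyEnablePeerAccess));
    // `peer` is now used like any device pointer; in a TMA-based kernel it would
    // be the global address the TMA descriptor is built from.
    fill<<<(N + 255) / 256, 256>>>(static_cast<float*>(peer), N, 42.0f);
    CHECK(cudaGetLastError());
    CHECK(cudaDeviceSynchronize());
    CHECK(cudaIpcCloseMemHandle(peer));
    return 0;
  }

  // Parent: exporter, owns the allocation on device 0.
  close(fds[0]);
  CHECK(cudaSetDevice(0));
  float* buf = nullptr;
  CHECK(cudaMalloc(&buf, N * sizeof(float)));
  CHECK(cudaMemset(buf, 0, N * sizeof(float)));
  CHECK(cudaDeviceSynchronize());
  cudaIpcMemHandle_t handle;
  CHECK(cudaIpcGetMemHandle(&handle, buf));
  if (!write_all(fds[1], &handle, sizeof handle)) return 1;
  close(fds[1]);

  int status = 0;
  waitpid(pid, &status, 0);  // Keep buf alive until the child is done with it.
  if (!WIFEXITED(status) || WEXITSTATUS(status) != 0) return 1;

  float last = 0.0f;
  CHECK(cudaMemcpy(&last, buf + N - 1, sizeof(float), cudaMemcpyDeviceToHost));
  std::printf("buf[N-1] written from device 1: %g\n", last);
  CHECK(cudaFree(buf));
  return last == 42.0f ? 0 : 1;
}
```

Compile with nvcc. The fork happens before any CUDA call so that each process gets its own context. The parent waits for the child before freeing the buffer because the importer's mapping refers to the exporter's allocation.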

umiswing added the "? - Needs Triage" and "question" labels on Nov 15, 2024.
