What is your question?
Hi! Suppose a global memory address that lives on another GPU in the same node (both GPUs support P2P and NVLink), obtained via cudaIpcGetMemHandle, is passed to a CUTLASS kernel that uses TMA. Does TMA overlap that inter-GPU memory copy with CUDA Core or Tensor Core computation? And how do TMA and NVLink carry out such a memory copy between different GPUs?