
GPU utilization monitoring for workloads running in pods and containers is not being collected; only whole-card utilization is available #2

Open
hiahia121 opened this issue Jan 3, 2024 · 0 comments

Comments

@hiahia121

[root@host-135 tengxun_gpushare]# curl localhost:8080/metrics |grep -E "MEM_COPY_UTIL|GPU_UTIL"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
Handling connection for 8080
100 11121 0 11121 0 0 2172k 0 --:--:-- --:--:-- --:--:-- 2715k
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
# HELP DCGM_FI_DEV_MEM_COPY_UTIL Memory utilization (in %).
# TYPE DCGM_FI_DEV_MEM_COPY_UTIL gauge

DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-ffa237f3-600a-995c-ad66-30338c513cf6",device="nvidia0",container="",namespace="",pod=""} 34
DCGM_FI_DEV_MEM_COPY_UTIL{gpu="0",UUID="GPU-ffa237f3-600a-995c-ad66-30338c513cf6",device="nvidia0",container="",namespace="",pod=""} 28
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-4ee42e63-b278-9153-3af9-342af0b17830",device="nvidia1",container="",namespace="",pod=""} 0
DCGM_FI_DEV_MEM_COPY_UTIL{gpu="1",UUID="GPU-4ee42e63-b278-9153-3af9-342af0b17830",device="nvidia1",container="",namespace="",pod=""} 0
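
For anyone trying to reproduce this, a minimal Python sketch along the following lines may help confirm the symptom programmatically: it parses the sample lines pasted above and lists every sample whose `pod` label is empty. The function names `parse_samples` and `unattributed` are hypothetical, and the regex-based parser is a simplification that assumes label values contain no escaped quotes; it is not a full Prometheus exposition-format parser.

```python
import re

# Sample lines copied from the output above.
SAMPLE = '''\
# HELP DCGM_FI_DEV_GPU_UTIL GPU utilization (in %).
# TYPE DCGM_FI_DEV_GPU_UTIL gauge
DCGM_FI_DEV_GPU_UTIL{gpu="0",UUID="GPU-ffa237f3-600a-995c-ad66-30338c513cf6",device="nvidia0",container="",namespace="",pod=""} 34
DCGM_FI_DEV_MEM_COPY_UTIL{gpu="0",UUID="GPU-ffa237f3-600a-995c-ad66-30338c513cf6",device="nvidia0",container="",namespace="",pod=""} 28
DCGM_FI_DEV_GPU_UTIL{gpu="1",UUID="GPU-4ee42e63-b278-9153-3af9-342af0b17830",device="nvidia1",container="",namespace="",pod=""} 0
DCGM_FI_DEV_MEM_COPY_UTIL{gpu="1",UUID="GPU-4ee42e63-b278-9153-3af9-342af0b17830",device="nvidia1",container="",namespace="",pod=""} 0
'''

# Simplified patterns: assume no escaped quotes inside label values.
SAMPLE_RE = re.compile(r'^(\w+)\{(.*)\}\s+(\S+)$')
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_samples(text):
    """Return (metric_name, labels_dict, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # skip anything that is not a labelled sample
        name, raw_labels, value = match.groups()
        labels = dict(LABEL_RE.findall(raw_labels))
        samples.append((name, labels, float(value)))
    return samples


def unattributed(samples):
    """Return the samples whose pod label is missing or empty."""
    return [s for s in samples if not s[1].get("pod")]


if __name__ == "__main__":
    for name, labels, value in unattributed(parse_samples(SAMPLE)):
        print(f"{name} gpu={labels.get('gpu')} value={value} has no pod label")
```

Run against the output above, every sample is reported, which matches the problem described: the exporter is emitting per-GPU values only, with `container`, `namespace`, and `pod` all empty.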
