Unable to run container with hooks using userns=auto|nomap #1673

Open
karuboniru opened this issue Feb 15, 2025 · 2 comments

@karuboniru
Contributor

karuboniru commented Feb 15, 2025

$ podman run --device nvidia.com/gpu=all --rm --userns keep-id fedora:latest nvidia-smi 
Sat Feb 15 14:48:59 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 565.77                 Driver Version: 565.77         CUDA Version: 12.7     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 2060 ...    Off |   00000000:81:00.0 Off |                  N/A |
| 30%   34C    P8             15W /  175W |       2MiB /   8192MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

$ podman run --device nvidia.com/gpu=all --rm --userns nomap fedora:latest nvidia-smi   
Error: OCI runtime error: crun: {"msg":"error executing hook `/usr/bin/nvidia-cdi-hook` (exit code: 1)","level":"error","time":"2025-02-15T14:49:31.621080Z"}      

After running strace, it appears that the failing process produces the following log:

[pid 65025<crun>] set_robust_list(0x7f120e6daae0, 24 <unfinished ...>
[pid 65025<crun>] <... set_robust_list resumed>) = 0
[pid 65025<crun>] close_range(3, 4294967295, CLOSE_RANGE_CLOEXEC <unfinished ...>
[pid 65025<crun>] <... close_range resumed>) = 0
[pid 65025<crun>] openat(AT_FDCWD, "/dev/null", O_WRONLY|O_CLOEXEC <unfinished ...>
[pid 65025<crun>] <... openat resumed>) = 10
[pid 65025<crun>] close(9 <unfinished ...>
[pid 65025<crun>] <... close resumed>)  = 0
[pid 65025<crun>] dup2(6, 0 <unfinished ...>
[pid 65025<crun>] <... dup2 resumed>)   = 0
[pid 65025<crun>] close(6)              = 0
[pid 65025<crun>] dup2(10, 1)           = 1
[pid 65025<crun>] dup2(10, 2)           = 2
[pid 65025<crun>] close(10)             = 0
[pid 65025<crun>] chdir("/var/home/yan/.local/share/containers/storage/btrfs-containers/43b9d4c3441ea1d0556642d2104a0b4f7e68d1583d29a965895cf169bbc67449/userdata") = -1 EACCES (Permission denied)
[pid 65025<crun>] exit_group(1)         = ?
[pid 65025<crun>] +++ exited with 1 +++

This indicates that the child process failed before it ever exec'ed the hook. The relevant code appears to be the hook loop:

for (i = 0; i < hooks_len; i++)
  {
    char **env = environ;
    if (hooks[i]->env)
      env = hooks[i]->env;
    ret = run_process_with_stdin_timeout_envp (hooks[i]->path, hooks[i]->args, cwd, hooks[i]->timeout, env,
                                               stdin, stdin_len, out_fd, err_fd, err);
    if (UNLIKELY (ret != 0))
      {
        if (keep_going)
          libcrun_warning ("error executing hook `%s` (exit code: %d)", hooks[i]->path, ret);
        else
          {
            libcrun_error (0, "error executing hook `%s` (exit code: %d)", hooks[i]->path, ret);
            break;
          }
      }
  }

crun/src/libcrun/utils.c

Lines 1630 to 1631 in 5ceb2a1

if (cwd && chdir (cwd) < 0)
  _exit (EXIT_FAILURE);

This is the chdir call that failed. The call chain is:

run_process_with_stdin_timeout_envp -> run_process_child -> chdir -> exit
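
To illustrate why the reported exit code cannot distinguish a failing hook from a failing chdir, here is a minimal C sketch. It is not crun's code: `spawn_in_dir` is a hypothetical helper that mirrors only the chdir-then-exec shape of the two quoted lines from utils.c.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of crun): fork a child that changes into
   `cwd` (when non-NULL) and then execs `path`.  Returns the child's exit
   status, or -1 if fork/waitpid fail or the child was killed by a signal. */
static int
spawn_in_dir (const char *path, char *const argv[], const char *cwd)
{
  int status;
  pid_t pid = fork ();

  if (pid < 0)
    return -1;

  if (pid == 0)
    {
      /* Same shape as the quoted lines: a failed chdir ends the child
         before exec is ever attempted, with no message of its own. */
      if (cwd && chdir (cwd) < 0)
        _exit (EXIT_FAILURE);
      execv (path, argv);
      /* Only reached when execv itself fails. */
      _exit (EXIT_FAILURE);
    }

  /* Retry waitpid if it is interrupted by a signal. */
  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;

  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

In this sketch, passing a `cwd` the child cannot enter returns 1, and so does running a program that itself exits with 1. The caller sees the same value in both cases, which matches the "exit code: 1" in the error above even though the hook never ran.
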
@giuseppe
Member

what is the expected outcome here?

If the hook runs in the user namespace, then it cannot access that directory. What should crun do?

@karuboniru
Contributor Author

So is it by design that hooks are executed inside the confined namespace and are therefore incompatible with user namespaces?

If the hook runs in the user namespace, then it cannot access that directory. What should crun do?

I opened this issue because the error occurs before any execvpe call, so the behavior is independent of the type of hook. That means any hook will fail in this case. Is this expected?
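
For illustration only, one common way to tell a pre-exec failure apart from a hook that ran and failed is to report the child's errno over a close-on-exec pipe. The sketch below is an assumption about how that could look, not crun's implementation or a proposed patch; `spawn_report_setup_error` is a hypothetical name.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of crun): run `path` in `cwd` and return
   the child's exit status.  *setup_errno is set to the errno of a failed
   chdir/execv in the child, or 0 if the program was actually exec'ed. */
static int
spawn_report_setup_error (const char *path, char *const argv[],
                          const char *cwd, int *setup_errno)
{
  int pipefd[2], status, child_errno = 0;
  ssize_t n;
  pid_t pid;

  *setup_errno = 0;
  if (pipe (pipefd) < 0)
    return -1;
  /* Close-on-exec: a successful exec closes the write end, so the
     parent reads EOF instead of an errno value. */
  if (fcntl (pipefd[1], F_SETFD, FD_CLOEXEC) < 0 || (pid = fork ()) < 0)
    {
      close (pipefd[0]);
      close (pipefd[1]);
      return -1;
    }

  if (pid == 0)
    {
      close (pipefd[0]);
      if (cwd == NULL || chdir (cwd) == 0)
        execv (path, argv);
      /* Reached only if chdir or execv failed: report why. */
      child_errno = errno;
      if (write (pipefd[1], &child_errno, sizeof child_errno) < 0)
        {
          /* Nothing more the child can do. */
        }
      _exit (EXIT_FAILURE);
    }

  close (pipefd[1]);
  do
    n = read (pipefd[0], &child_errno, sizeof child_errno);
  while (n < 0 && errno == EINTR);
  close (pipefd[0]);
  if (n == (ssize_t) sizeof child_errno)
    *setup_errno = child_errno;

  while (waitpid (pid, &status, 0) < 0)
    if (errno != EINTR)
      return -1;
  return WIFEXITED (status) ? WEXITSTATUS (status) : -1;
}
```

With a helper like this, a non-zero `*setup_errno` (for example EACCES from chdir) would mean the hook never started, while a zero value with a non-zero return would mean the hook itself failed.
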
