
Met error when trying this guide on FreeBSD 14.1-RELEASE #9

Open
quakelee opened this issue Jun 16, 2024 · 1 comment

Comments

@quakelee

I followed your guide on my FreeBSD 14.1-RELEASE box with a 4060 Ti card.
The nv-sglrun nvidia-smi output looks like this:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 4060 Ti    Off  |   00000000:01:00.0 Off |                  N/A |
|  0%   39C    P0              28W / 165W |        0MiB / 16380MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+

+-----------------------------------------------------------------------------------------+
| Processes:                                                                               |
|  GPU   GI   CI        PID   Type   Process name                              GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                              |
+-----------------------------------------------------------------------------------------+

But I hit the following error after installing PyTorch with CUDA:

(pytorch) [ ]$ LD_PRELOAD="${BASE_PATH}/dummy-uvm.so" python3 -c 'import torch; print(torch.cuda.get_device_name(0))'
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/devs/conda/envs/pytorch-sd/lib/python3.10/site-packages/torch/cuda/__init__.py", line 329, in get_device_name
    return get_device_properties(device).name
  File "/devs/conda/envs/pytorch-sd/lib/python3.10/site-packages/torch/cuda/__init__.py", line 359, in get_device_properties
    _lazy_init()  # will define _get_device_properties
  File "/devs/conda/envs/pytorch-sd/lib/python3.10/site-packages/torch/cuda/__init__.py", line 217, in _lazy_init
    torch._C._cuda_init()
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda functions before calling NumCudaDevices() that might have already set an error? Error 304: OS call failed or operation not supported on this OS

So I assume the uvm_ioctl_override.c code is out of date for FreeBSD 14? Any ideas?

Thanks,

Xin
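
For context on what that shim does: as the name suggests, dummy-uvm.so is an LD_PRELOAD interposer around ioctl(2) that presumably patches up nvidia-uvm requests the FreeBSD driver cannot service, so "out of date" would mean its table of intercepted requests no longer matches the installed driver/CUDA version. Below is a minimal sketch of the interposer technique only, with made-up file names and logging; it is not the actual uvm_ioctl_override.c.

/*
 * sketch-uvm-shim.c -- minimal LD_PRELOAD ioctl interposer
 * (illustrative sketch only, not the project's code).
 *
 * Build with a Linux toolchain, for use under the Linuxulator:
 *   cc -shared -fPIC -o sketch-uvm-shim.so sketch-uvm-shim.c -ldl
 * Use:
 *   LD_PRELOAD=./sketch-uvm-shim.so nvidia-smi
 */
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdarg.h>
#include <stdio.h>
#include <sys/ioctl.h>

typedef int (*ioctl_fn)(int, unsigned long, ...);

int ioctl(int fd, unsigned long request, ...)
{
    static ioctl_fn real_ioctl;
    if (!real_ioctl)
        real_ioctl = (ioctl_fn)dlsym(RTLD_NEXT, "ioctl");

    va_list ap;
    va_start(ap, request);
    void *argp = va_arg(ap, void *);
    va_end(ap);

    int ret = real_ioctl(fd, request, argp);

    /* A real shim recognizes specific nvidia-uvm requests here and rewrites
     * their results so the CUDA runtime never sees failures for operations
     * the FreeBSD driver cannot perform. If that request table lags behind
     * the driver/CUDA version, unsupported calls leak through and surface
     * as errors like "Error 304: OS call failed or operation not supported
     * on this OS". Logging the failures helps spot which requests those are. */
    if (ret != 0)
        fprintf(stderr, "ioctl(fd=%d, req=0x%lx) = %d\n", fd, request, ret);

    return ret;
}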

@drook

drook commented Oct 3, 2024

Works for me on 14.x:

(base) bash-4.2# conda activate pytorch
(pytorch) bash-4.2# LD_PRELOAD=/home/emz/sd/stable-diffusion-webui/dummy-uvm.so python3 -c 'import torch; print(torch.cuda.is_available())'
True

Though the main issue lies a bit further: something has changed in 14.x, and in some cases LD_PRELOAD makes the Linuxulator search for Linux-native libs in the host path. Notice that libdl.so.2 goes missing, even though it is present in the Linuxulator tree (a possible workaround is sketched at the end of this comment):

(pytorch) bash-4.2# LD_PRELOAD=/home/emz/sd/stable-diffusion-webui/dummy-uvm.so python3 launch.py
ld-elf.so.1: Shared object "libdl.so.2" not found, required by "dummy-uvm.so"
ld-elf.so.1: Shared object "libdl.so.2" not found, required by "dummy-uvm.so"
Python 3.10.14 (main, May  6 2024, 19:42:50) [GCC 11.2.0]
Version: 1.10.1
Commit hash: <none>
Couldn't determine assets's hash: 6f7db241d2f8ba7457bac5ca9753331f0c266917, attempting autofix...
Fetching all contents for assets
ld-elf.so.1: Shared object "libdl.so.2" not found, required by "dummy-uvm.so"
Traceback (most recent call last):
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/launch.py", line 49, in <module>
    main()
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/launch.py", line 40, in main
    prepare_environment()
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/modules/launch_utils.py", line 411, in prepare_environment
    git_clone(assets_repo, repo_dir('stable-diffusion-webui-assets'), "assets", assets_commit_hash)
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/modules/launch_utils.py", line 178, in git_clone
    current_hash = run_git(dir, name, 'rev-parse HEAD', None, f"Couldn't determine {name}'s hash: {commithash}", live=False).strip()
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/modules/launch_utils.py", line 166, in run_git
    git_fix_workspace(dir, name)
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/modules/launch_utils.py", line 153, in git_fix_workspace
    run(f'"{git}" -C "{dir}" fetch --refetch --no-auto-gc', f"Fetching all contents for {name}", f"Couldn't fetch {name}", live=True)
  File "/compat/linux/home/emz/sd/stable-diffusion-webui/modules/launch_utils.py", line 116, in run
    raise RuntimeError("\n".join(error_bits))
RuntimeError: Couldn't fetch assets.
Command: "git" -C "/compat/linux/home/emz/sd/stable-diffusion-webui/repositories/stable-diffusion-webui-assets" fetch --refetch --no-auto-gc
Error code: 1
(pytorch) bash-4.2#

How the hell the checking call still works I have absolutely no idea.
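
One possible way to sidestep the libdl.so.2 lookup problem, assuming the shim's dependency on libdl.so.2 comes from being linked with -ldl for dlsym(), is to rebuild it so it forwards ioctl through the raw Linux syscall instead; the resulting .so then needs no libdl at all. A rough sketch of that idea (hypothetical, not the project's actual code):

/*
 * sketch-uvm-shim-nodl.c -- same interposer idea, but forwarding via the
 * raw syscall instead of dlsym(RTLD_NEXT, "ioctl"), so the shared object
 * has no DT_NEEDED entry on libdl.so.2. Illustrative sketch only.
 *
 * Build:   cc -shared -fPIC -o sketch-uvm-shim-nodl.so sketch-uvm-shim-nodl.c
 * Verify:  readelf -d sketch-uvm-shim-nodl.so | grep NEEDED
 */
#define _GNU_SOURCE
#include <stdarg.h>
#include <sys/syscall.h>
#include <unistd.h>

int ioctl(int fd, unsigned long request, ...)
{
    va_list ap;
    va_start(ap, request);
    void *argp = va_arg(ap, void *);
    va_end(ap);

    /* Forward straight to the kernel; a real shim would inspect and patch
     * specific nvidia-uvm requests before or after this point. */
    return (int)syscall(SYS_ioctl, fd, request, argp);
}

Whether this helps depends on where the dependency actually comes from; newer glibc also moved the dl* functions into libc proper, so simply dropping -ldl at link time may be enough.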
