Impossible to install it on Windows with PIP and venv #2
Comments
After that, I tried to install:
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121
I think it has something to do with DeepSpeed, but I'm unsure. Thanks.
It's a DeepSpeed issue. I spent hours trying to get it to work. DeepSpeed simply won't build on Windows, and Microsoft is aware of this. They're working on getting a wheel built, but as of 1 week ago they're "still working on it." Meanwhile, this guy did get DeepSpeed to build and created a GUI to facilitate the install. I can confirm that I was able to get DeepSpeed installed using this and was able to run ds_report, so I've confirmed that DeepSpeed is running on my Windows 10 system.

Unfortunately, when you try to install Llasa-tts, it runs through its own requirements list, and that list includes xcodec2. Xcodec2 is a pip package, not a GitHub source that we can edit. Maybe someone smarter than me can shed light here, but from what I can tell, there's no way to alter the pip package content for Xcodec2. Because of this, it will hit the deepspeed 0.15.1 package and then fail. Every damn time. Tomorrow I'm going to look into WSL, or possibly running Linux in Docker, and see if that works.
As a follow-up to this, I did manage to download the tar file for Xcodec2 and looked at the install files. It's calling for DeepSpeed 0.15.1. I then set up a new conda environment, installed Python 3.9 (since this is what the Llasa / Xcodec2 site gave in its example), installed PyTorch + CUDA, then installed DeepSpeed 0.15.1. It built and installed fine, and I verified with ds_report.

I then went back to Xcodec2 and tried to install it. As hoped, the install found that DeepSpeed was already present, bypassed it, and proceeded to install the other requirements. Unfortunately, there are several other install issues, with various dependencies requiring different versions of Python. Some require 3.9. Here's the error:

ERROR: Ignored the following versions that require a different python version:
0.1.0 Requires-Python ==3.9.*; 0.1.1 Requires-Python ==3.9.*;
0.36.0 Requires-Python >=3.6,<3.10; 0.37.0 Requires-Python >=3.7,<3.10;
0.52.0 Requires-Python >=3.6,<3.9; 0.52.0rc3 Requires-Python >=3.6,<3.9;
0.53.0 Requires-Python >=3.6,<3.10; 0.53.0rc1.post1 Requires-Python >=3.6,<3.10;
0.53.0rc2 Requires-Python >=3.6,<3.10; 0.53.0rc3 Requires-Python >=3.6,<3.10;
0.53.1 Requires-Python >=3.6,<3.10; 0.54.0 Requires-Python >=3.7,<3.10;
0.54.0rc2 Requires-Python >=3.7,<3.10; 0.54.0rc3 Requires-Python >=3.7,<3.10;
0.54.1 Requires-Python >=3.7,<3.10

I tried installing Python 3.10 into the conda environment and re-ran the Llasa install script, but it still spat out the same error. I tried installing various dependencies individually, and that failed too. I consulted ChatGPT and it said the likely issue is that some of these dependencies don't work on Windows, and to use WSL or Linux. So, in summary, there's no direct and easy path to get this working in Windows.
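The approach described above (pin the Python version, install torch before DeepSpeed) can be condensed into a short setup sketch. The environment name here is hypothetical, and the CUDA 12.1 index URL is the one quoted earlier in this thread; adjust both to your system:

```shell
# Fresh environment pinned to Python 3.9, matching the Xcodec2 example.
conda create -n xcodec2_env python=3.9 -y
conda activate xcodec2_env

# Install torch first: DeepSpeed's setup aborts if torch is missing.
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu121

# The exact version Xcodec2 pins.
pip install deepspeed==0.15.1
ds_report  # verify the DeepSpeed install
```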
Yeah, I'm pretty sure it needs WSL/Linux. What I like to do is set up a virtual machine entirely, with Oracle VirtualBox, and install Linux from scratch there. If that's too much of a hassle, just use the Colab.
So I did get this to run. I had to use WSL, which wasn't really that hard. It's a shame I have to run Linux to use this tool, but the fact that I can run Linux right inside Windows and don't have to dual boot makes this a bit better. And the fact that this installed SO easily was really nice. I also have access to the Linux files in Windows, so I can copy/paste files between the two. And I can mount my Windows drives, so if need be, I can access any file from Linux.

Here's a rundown for anyone who really wants to get this running on Windows. Please note that I have virtually no knowledge of Linux, so if I can do this, you can too. And it's way faster to do this than to bang your head against the wall for 3 hours trying to get DeepSpeed installed. For reference, I have a Windows 10 machine with lots of RAM and a 4090. I started by following along with this video, which I found very helpful:

I made sure virtualization was turned on in my BIOS. (It was.) I also made sure my Windows install met the minimum version requirements. (It did.) I opened an x64 Native Tools Command Prompt for VS 2022 in Administrator mode and then ran the install command for WSL:

wsl --install

The WSL install was very quick. It installs a stripped-down version of Ubuntu with no GUI. What's nice, however, is that it adds this install as an app to your Start menu. So rather than having to launch this via a command line, just use the icon and bam! You're running Linux. I'm a Windows user, so GUIs are sort of my comfort level. The video shows how to install the Kali Linux distribution (which apparently is good for hacking and running code). I installed the extras for Kali described in the video. And then I installed Kex, which allows for the full Linux GUI. You'll find that around the 14:20 mark in the video. Kex took about 15 minutes to install, but once it was up and running, I had a full Linux desktop running in a window on my Windows desktop.
This whole process took about 20 to 30 minutes, which is far less time than it took me to figure out how to get DeepSpeed installed and built on Windows, so I was pretty pleased. Now I have a Linux install with a GUI, which is great, but I had no idea how to get a GitHub project installed, as I've only installed GitHub projects on Windows using the VS2022 command prompt. I opened the file explorer in Linux from the Kex desktop and went to the Downloads folder. I then right-clicked and opened a Terminal window there.

From there, I asked ChatGPT what to do. It said I should update the system with the following command. You can copy these in Windows, then right-click in the Terminal in Linux and paste them in:

sudo apt update && sudo apt upgrade -y

Once this was done, it was time to clone the Llasa project. This was pretty straightforward and used the same commands I'm familiar with in Windows:

git clone https://github.com/nivibilla/local-llasa-tts.git

I normally use a conda environment in Windows. Apparently in Linux you can do the same basic thing with virtual environments. Here's the code I ran:

python3 -m venv venv

This is a little confusing, because there are two "venvs" called for there. Basically, the second "venv" is the name of your environment. A better and more descriptive use would be:

python3 -m venv llasa_env

But I didn't find that out until after I ran the commands, so use whichever makes more sense to you. Once this was in place, I ran the install command from the Local Llasa page:

pip install -r ./requirements_native_hf.txt

This is where it began to feel like magic. Rather than having the install fail when running the pip commands, which it so often does in Windows, it just breezed along. When it was finished, I ran the python ./hf_app.py script. It downloaded the models (which took about 10 minutes) and then spit out a URL for the Gradio app. Ctrl+Click on the http address launches the web browser and BAM!
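For anyone following along, here is a condensed sketch of the Linux-side steps above, using the more descriptive environment name from the walkthrough:

```shell
# Update the system
sudo apt update && sudo apt upgrade -y

# Clone the project and enter it
git clone https://github.com/nivibilla/local-llasa-tts.git
cd local-llasa-tts

# Create and activate a virtual environment (the name is your choice)
python3 -m venv llasa_env
source llasa_env/bin/activate

# Install requirements and launch; the app prints a local Gradio URL
pip install -r ./requirements_native_hf.txt
python ./hf_app.py
```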
You're now using the Gradio app. I tested it out and it works flawlessly. So while I hate that I have to run Linux, I really like that I can now run this locally.

I normally create a .bat file in Windows so I can just double-click one file and have it launch. To do that here, you need to create a script file (a .sh file), then make it runnable, and optionally make a desktop icon. I'd recommend this, as it's easy to forget to enter the environment or the exact command needed to run this. Here's how to do it.

First, install gnome-terminal. Why? I have no idea. Linux already has a terminal, but when ChatGPT gave me the code, it called for a terminal and failed to run, and I eventually had to install gnome-terminal. So run this code:

sudo apt install -y gnome-terminal

Then, with the terminal open, make sure you're in the local-llasa-tts directory (you can go to your directory, right-click and open a terminal). Then run this command to create a new text file:

nano run_llasa.sh

The file will be created and you'll have the option to write in the code. Paste this in (it activates the virtual environment and runs the app):

#!/bin/bash
# Navigate to the project directory
cd ~/Downloads/local-llasa-tts
# Launch a new terminal window (optional for GUI)
gnome-terminal -- bash -c "
source venv/bin/activate
python ./hf_app.py
"

To save this file, press Ctrl+O. It will ask to confirm the name; just press Enter. Then, to exit the script editor, press Ctrl+X.

Now you should be back at the terminal. If you look in the file explorer, you should see the run_llasa.sh file. You need to make this executable. Back in the terminal, paste in this code:

chmod +x run_llasa.sh

Now close the terminal. To see if this works, you will need to make sure that the file explorer is configured to run scripts. Go to Edit > Preferences in your file explorer and, under Advanced settings, make sure you enable script execution. Now you should be able to double-click the run_llasa.sh file. It should launch a terminal, activate the virtual environment, and call the main Python app, which will give you the http address you can Ctrl+Click on to launch the Gradio app in Firefox (which comes with the Linux install).

I know this seems like a royal pain in the ass, and yes, to some degree it is. But trust me, the fact that this works is pretty spectacular. It opens the door to using all sorts of GitHub projects that won't run in Windows easily. Hope this helps!
@realstevewarner thank you for the incredibly detailed tutorial. Would you like to PR it as a windows_setup.readme so others can see it? Also, maybe add a link to it in the main readme.
Hey, thanks so much! I'd be happy to do that. Thanks for the suggestion.
Thanks for the detailed tutorial. May I know the VRAM requirement? I have a 3060 with 12 GB of VRAM. Can this fit into my GPU?
From my tests, MaskGCT has better quality; this one degrades the audio a lot.
Wow, that is very good. Thanks for sharing; I will try it out.
Thanks for the heads-up. I tried out the demo on Hugging Face. It took over 2 hours to process the request, and the results weren't any better than Llasa. At best they were on par, but the inflections (meaning how natural the read sounded) were worse. I figured it could be the seed, though, so I went ahead and installed it so I could run a few tries and see how well it really performs.

I tried for over 2 hours to get it running on Linux. No dice. The instructions seemed clear enough: clone the repo, set up a conda environment, then install the requirements. I got all that done and got the Gradio demo running, but downloading the Whisper model failed repeatedly. I finally ended up finding the download link in the __init__.py file, manually downloaded it, and put it in the right folder. That resolved one issue, but there were about a dozen other issues, including CUDA library issues, ONNX install issues, etc. I ended up spending about 3 hours with ChatGPT trying to get all of the issues resolved. But I finally got it running.

I used the same voice sample I had with Llasa and fed the same text into MaskGCT that I used with Llasa. Using the default settings, the inflections on the resulting read from MaskGCT are objectively WORSE than Llasa. However, by increasing the Number of Timesteps, the quality does improve. I found that a Timestep value of 75 yielded acceptable results in most cases. It's worth noting that MaskGCT uses significantly more memory than Llasa, coming in at around 15 GB. Also, the quality of MaskGCT never achieved a good read when using this sample pulled from the news: MaskGCT perpetually pronounced "Biden-era" as "bidenera," while Llasa correctly pronounced it as "Biden Era." I ran this through numerous inferences and MaskGCT failed every time.

So, what I can say is this: MaskGCT is a massive pain in the ass to install. It eats up more RAM than Llasa. It's slower to generate than Llasa. It can produce comparable quality to Llasa, but on more complex text, it produces inferior results. So if you're reading this and wondering if you should be using MaskGCT over Llasa, the answer is a resounding NO. Feel free to go through the effort, though. YMMV.

If you do feel like the Llasa output simply isn't good enough, I'd recommend installing Resemble Enhance. It can take your F5-TTS, E2-TTS, and Llasa-TTS output and make it sound "better." How much? It depends on how bad your source is, but I found that my E2-TTS output (which sounds bad, but reads more "naturally" than F5-TTS) came out notably better. Llasa output run through Resemble Enhance produced less striking results, but to my ears, the Resemble Enhance audio was maybe 10 to 15% better: less hollow-sounding and a little more robust. It's not something I'd feel the need to use every time, as I think the Llasa output is quite good on its own, but it's there if you need it. https://github.com/resemble-ai/resemble-enhance

Something to note here is that there is a Resemble Enhance version for Windows, but as with Llasa, it relies on DeepSpeed, which fails to build. So if you've gone through the trouble of setting up WSL and getting Linux installed, Resemble Enhance will install without issues.
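For reference, installing Resemble Enhance under WSL looks roughly like this; the package name comes from the repo linked above, and the CLI takes an input directory of .wav files and an output directory (check the repo's README if the command or arguments differ):

```shell
pip install resemble-enhance

# Enhance every .wav in input_wavs/ and write results to enhanced_wavs/
resemble-enhance input_wavs/ enhanced_wavs/
```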
Llasa 8b should be coming shortly.
Hi @realstevewarner - the DeepSpeed wheels for Python 3.10, 3.11, and 3.12 are now published on PyPI for DeepSpeed 0.16.3.
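If the prebuilt wheels work as advertised, the Windows install sketch becomes much simpler, assuming a venv on Python 3.10, 3.11, or 3.12 so pip can pick up the wheel instead of building from source:

```shell
pip install deepspeed==0.16.3
ds_report  # should report the install without needing a compiler
```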
That's fantastic! Thank you! I'll try to build on Windows sometime in the next day or two.
I have tried everything to install it on Windows with pip and venv (I don't have conda), and I keep having issues.
May we have a step-by-step guide for Windows users, please?
Thanks a lot.
(venv) D:PATH>pip install -r ./requirements_native_hf.txt
Collecting gradio (from -r ./requirements_native_hf.txt (line 1))
Using cached gradio-5.13.1-py3-none-any.whl.metadata (16 kB)
Collecting xcodec2==0.1.3 (from -r ./requirements_native_hf.txt (line 2))
Using cached xcodec2-0.1.3-py3-none-any.whl.metadata (4.8 kB)
Collecting bitsandbytes>=0.39.0 (from -r ./requirements_native_hf.txt (line 3))
Using cached bitsandbytes-0.45.1-py3-none-win_amd64.whl.metadata (5.9 kB)
Collecting accelerate==1.1.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached accelerate-1.1.0-py3-none-any.whl.metadata (19 kB)
Collecting aiohappyeyeballs==2.4.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached aiohappyeyeballs-2.4.0-py3-none-any.whl.metadata (5.9 kB)
Collecting aiohttp==3.10.5 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached aiohttp-3.10.5-cp310-cp310-win_amd64.whl.metadata (7.8 kB)
Collecting aiosignal==1.3.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached aiosignal-1.3.1-py3-none-any.whl.metadata (4.0 kB)
Collecting annotated-types==0.7.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached annotated_types-0.7.0-py3-none-any.whl.metadata (15 kB)
Collecting antlr4-python3-runtime==4.9.3 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached antlr4_python3_runtime-4.9.3-py3-none-any.whl
Collecting async-timeout==4.0.3 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached async_timeout-4.0.3-py3-none-any.whl.metadata (4.2 kB)
Collecting attrs==24.2.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached attrs-24.2.0-py3-none-any.whl.metadata (11 kB)
Collecting audioread==3.0.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached audioread-3.0.1-py3-none-any.whl.metadata (8.4 kB)
Collecting auraloss==0.4.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached auraloss-0.4.0-py3-none-any.whl.metadata (8.0 kB)
Collecting blobfile==3.0.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached blobfile-3.0.0-py3-none-any.whl.metadata (15 kB)
Collecting certifi==2024.8.30 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached certifi-2024.8.30-py3-none-any.whl.metadata (2.2 kB)
Collecting cffi==1.17.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached cffi-1.17.1-cp310-cp310-win_amd64.whl.metadata (1.6 kB)
Collecting charset-normalizer==3.3.2 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached charset_normalizer-3.3.2-cp310-cp310-win_amd64.whl.metadata (34 kB)
Collecting click==8.1.7 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached click-8.1.7-py3-none-any.whl.metadata (3.0 kB)
Collecting contourpy==1.3.0 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached contourpy-1.3.0-cp310-cp310-win_amd64.whl.metadata (5.4 kB)
Collecting cycler==0.12.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached cycler-0.12.1-py3-none-any.whl.metadata (3.8 kB)
Collecting datasets==3.0.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached datasets-3.0.1-py3-none-any.whl.metadata (20 kB)
Collecting decorator==5.1.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached decorator-5.1.1-py3-none-any.whl.metadata (4.0 kB)
Collecting deepspeed==0.15.1 (from xcodec2==0.1.3->-r ./requirements_native_hf.txt (line 2))
Using cached deepspeed-0.15.1.tar.gz (1.4 MB)
Installing build dependencies ... done
Getting requirements to build wheel ... error
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> [20 lines of output]
[WARNING] Unable to import torch, pre-compiling ops will be disabled. Please visit https://pytorch.org/ to see how to properly install torch on your system.
[WARNING] unable to import torch, please install it if you want to pre-compile any deepspeed ops.
DS_BUILD_OPS=1
Traceback (most recent call last):
File "D:PATH\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 353, in <module>
main()
File "D:PATH\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 335, in main
json_out['return_val'] = hook(**hook_input['kwargs'])
File "D:PATH\venv\lib\site-packages\pip\_vendor\pyproject_hooks\_in_process\_in_process.py", line 118, in get_requires_for_build_wheel
return hook(config_settings)
File "C:PATH\AppData\Local\Temp\pip-build-env-rtdixtus\overlay\Lib\site-packages\setuptools\build_meta.py", line 334, in get_requires_for_build_wheel
return self._get_build_requires(config_settings, requirements=[])
File "C:PATH\AppData\Local\Temp\pip-build-env-rtdixtus\overlay\Lib\site-packages\setuptools\build_meta.py", line 304, in _get_build_requires
self.run_setup()
File "C:PATH\AppData\Local\Temp\pip-build-env-rtdixtus\overlay\Lib\site-packages\setuptools\build_meta.py", line 522, in run_setup
super().run_setup(setup_script=setup_script)
File "C:PATH\AppData\Local\Temp\pip-build-env-rtdixtus\overlay\Lib\site-packages\setuptools\build_meta.py", line 320, in run_setup
exec(code, locals())
File "<string>", line 155, in <module>
AssertionError: Unable to pre-compile ops without torch installed. Please install torch before attempting to pre-compile ops.
[end of output]
note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error
× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.
note: This error originates from a subprocess, and is likely not a problem with pip.