A voice-based chat application that allows users to interact with an AI assistant using speech. The application leverages OpenAI's Whisper small for accurate speech recognition and Kokoro-TTS for natural-sounding voice synthesis. You can use a LM Studio local model that needs to be served at http://localhost:1234 to generate response. Requires meta-llama-3.1-8b-instruct for tool-use. Works great on Macbook M1 Pro, but Linux works too.
Use python3.12 to run this application, because of PyTorch dependency.
Hacked together with Claude and Cursor.
-
Clone the repository:
$ git clone https://github.com/jpzk/voicemvp.git $ cd voicemvp
-
Install system dependencies (macOS):
$ brew install portaudio
-
Install dependencies:
$ python3.12 -m venv env $ source env/bin/activate $ python3.12 -m pip install -r requirements.txt
-
Run the application:
$ python3.12 voice_chat_agent.py
-
The first run might take a while as it needs to download the models.