Voice-to-Voice Assistant for Instant Insights from Screenshots
SnapIntel is a personal voice-to-voice assistant that provides immediate, actionable insights from the screenshots you decide to share. Whether you're solving an issue or looking for deeper understanding, SnapIntel is here to help.
This project is an open-source initiative that leverages Google Gemini to analyze images and provide responses. Various services are used to transcribe the user queries and generate spoken responses, including the local services FastWhisperAPI and FastXttsAPI.
If you find SnapIntel useful, please consider leaving a star ⭐ or making a donation.
- Easy and Intuitive Interface: Use voice-to-voice interactions for a seamless user experience.
- Privacy-Focused Assistant: Maintain control over your data; decide what to share with a simple key combination press.
- Instant Insights: Receive actionable information quickly from screenshots you choose to analyze.
- Local Services Integration: Integrate with FastWhisperAPI and FastXttsAPI for local transcription of queries and vocalization of responses.
- Chat History: Records images and interactions within the session, enabling follow-up questions about images and recall of previous queries or responses.
- Real-Time Session Logging: Automatically logs session history in a neatly formatted markdown file, accessible in real-time from the local logs folder.
- Flexibility and Expandability: Built to adapt and grow with future enhancements and integrations.
- Transcription Services: Supports OpenAI, Groq, Deepgram, and FastWhisperAPI (Faster Whisper) for efficient transcription of user queries.
- Speech Services: Supports OpenAI, ElevenLabs, Cartesia, Deepgram, and FastXttsAPI (Coqui) for quick and natural-sounding vocalization of responses.
- Python 3.10 or greater
- FFmpeg. Instructions on how to install it can be found here
- FastWhisperAPI and FastXttsAPI offer local transcription and speech solutions. Their use is optional. For information on deployment and requirements of these services, please refer to their respective documentation.
This project depends on the following libraries:
- pillow
- python-dotenv
- keyboard
- requests
- colorama
- SpeechRecognition
- google.generativeai
- websocket-client
- pyaudio
- numpy
- Clone the repository:
  git clone https://github.com/3choff/SnapIntel.git
- Navigate to the project directory:
  cd SnapIntel
- Create a new environment:
  python3 -m venv SnapIntel
- Activate the virtual environment:
  - On Unix/Linux/macOS:
    source SnapIntel/bin/activate
  - On Windows:
    SnapIntel\Scripts\activate
- Install the required packages:
  pip install -r requirements.txt
SnapIntel uses dotenv to load the API keys. Create a .env file in the root directory with your API keys, following the structure of the example.env file as a template.
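As a minimal sketch of how this works, python-dotenv (already in the dependency list) reads the .env file and exposes each key through the environment; the variable name below is a placeholder, and the exact names are defined in example.env:

```python
# Minimal sketch of how python-dotenv exposes keys from .env.
# GEMINI_API_KEY is a placeholder name; use the exact names from example.env.
import os
from dotenv import load_dotenv

load_dotenv()  # reads the .env file in the project root
gemini_api_key = os.getenv("GEMINI_API_KEY")
```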
The app supports multiple transcription and speech services right out of the box. You can select from the following options:
Transcription Services:
- Deepgram
- OpenAI
- Groq
- FastWhisperAPI, a local transcription API server using Faster Whisper.
Speech Services:
- Deepgram
- OpenAI
- ElevenLabs
- Cartesia (EXPERIMENTAL)
- FastXttsAPI, a local speech API server using Coqui.
To change the transcription or speech service, edit the relevant variables in the Config.py file located in the services folder; the accepted choices are commented next to each variable.
In the same file, you can also change related settings such as voices and language.
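As an illustration only, the relevant part of Config.py might look roughly like the sketch below; the variable names and values shown are assumptions, and the authoritative names and accepted choices are in the comments of the actual file:

```python
# services/Config.py — hypothetical sketch; check the real file for the exact
# variable names and the accepted values commented next to each one.
TRANSCRIPTION_SERVICE = "fastwhisperapi"  # openai | groq | deepgram | fastwhisperapi
SPEECH_SERVICE = "fastxttsapi"            # openai | elevenlabs | cartesia | deepgram | fastxttsapi
LANGUAGE = "en"                           # language used for transcription and responses
VOICE = "default"                         # voice for the selected speech service
```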
To run SnapIntel, use the following command:
python app.py
When the app starts, it will prompt you to either start a new session or resume a previous session stored in the history folder. After making your choice, you can interact with the LLM using these key combinations:
- Press Ctrl+Alt+Space to capture and analyze the screen and invoke the voice assistant.
- Press Ctrl+Space to ask a question without capturing a screenshot or to ask a follow-up question.
- Press ESC to stop speech playback.
- Press Ctrl+C to exit the script.
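For a sense of how the bindings above can be wired up, here is a minimal sketch using the keyboard and pillow libraries from the dependency list; it illustrates the hotkey pattern and is not SnapIntel's actual app.py:

```python
# Illustrative hotkey wiring, not the project's actual implementation.
import keyboard
from PIL import ImageGrab

def capture_and_ask():
    # Grab the full screen and save it so it can be sent to the model
    screenshot = ImageGrab.grab()
    screenshot.save("screenshot.png")
    # ...record the voice query, send image + query to Gemini, speak the reply

def ask_followup():
    # ...record a voice query and answer it without taking a new screenshot
    pass

keyboard.add_hotkey("ctrl+alt+space", capture_and_ask)
keyboard.add_hotkey("ctrl+space", ask_followup)
keyboard.wait()  # keep listening until the script is interrupted (Ctrl+C)
```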
If you find this project helpful and would like to support its development, there are several ways you can contribute:
- Star: Consider leaving a star ⭐️ to increase the visibility of the project.
- Support: Consider donating to support my work.
- Contribute: If you're a developer, feel free to contribute to the project by submitting pull requests or opening issues.
- Spread the Word: Share this project with others who might find it useful.
Your support means a lot and helps keep this project going. Thank you for your contribution!
This project is inspired by the features OpenAI showcased in its demo of upcoming ChatGPT capabilities that combine voice and vision to provide assistance and insights. The Verbi chatbot project and the Screen to Voice Tutorial by All About AI have significantly influenced this project and form the foundation of its development. I recommend checking the links if you want to know more.
This project is licensed under the Apache License 2.0.