A secure cloud Linux computer powered by E2B Desktop Sandbox and controlled by open-source LLMs.
- Uses E2B for secure Desktop Sandbox
- Operates the computer via the keyboard, mouse, and shell commands
- Supports 10+ LLMs, OS-Atlas/ShowUI and any other models you want to integrate!
- Live streams the display of the sandbox on the client computer
- Users can pause and prompt the agent at any time
- Uses Ubuntu, but is designed to work with any operating system
The details of the design are laid out in this article: How I taught an AI to use a computer
Open Computer Use is designed to make it easy to swap LLMs in and out. The LLMs used by the agent are specified in config.py like this:
grounding_model = providers.OSAtlasProvider()
vision_model = providers.GroqProvider("llama3.2")
action_model = providers.GroqProvider("llama3.3")
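Swapping is cheap when agent code depends only on the names assigned in config.py and not on a particular provider class. The following is a minimal, self-contained sketch of that idea. The `Provider` protocol, the `complete` method, and both stand-in classes are hypothetical and are not the classes defined in providers.py.

```python
from typing import Protocol


class Provider(Protocol):
    """Hypothetical interface: anything with a complete() method qualifies."""

    def complete(self, prompt: str) -> str: ...


class EchoProvider:
    """Stand-in provider that returns the prompt, tagged with a model name."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt}"


class UpperProvider:
    """A second stand-in, to show that swapping needs no agent changes."""

    def __init__(self, model: str) -> None:
        self.model = model

    def complete(self, prompt: str) -> str:
        return f"[{self.model}] {prompt.upper()}"


def plan_next_action(action_model: Provider, instruction: str) -> str:
    """Agent-side code: uses whichever provider the config supplies."""
    return action_model.complete(instruction)


# Swapping models is a one-line change to the assignment, as in config.py.
action_model: Provider = EchoProvider("model-a")
first = plan_next_action(action_model, "open the browser")

action_model = UpperProvider("model-b")
second = plan_next_action(action_model, "open the browser")
```

Because `plan_next_action` never names a concrete class, only the assignment line changes when a different model is selected.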
The providers are imported from providers.py and include:
- Fireworks, OpenRouter, Llama API:
  - Llama 3.2 (vision only), Llama 3.3 (action only)
- Groq:
  - Llama 3.2 (vision + action), Llama 3.3 (action only)
- DeepSeek:
  - DeepSeek (action only)
- Google:
  - Gemini 2.0 Flash (vision + action)
- OpenAI:
  - GPT-4o and GPT-4o mini (vision + action)
- Anthropic:
  - Claude (vision + action)
- HuggingFace Spaces:
  - OS-Atlas (grounding)
  - ShowUI (grounding)
If you add a new model or provider, please make a PR to this repository with the updated providers.py!
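Since each model supports only some of the three roles, it can help to check a role assignment before running the agent. The sketch below copies a few capabilities from the list above. The dictionary layout and the `check_roles` function are illustrative assumptions, not part of providers.py.

```python
# Capabilities copied from the provider list above; the dictionary layout and
# the function name are illustrative assumptions, not part of providers.py.
CAPABILITIES = {
    ("Groq", "Llama 3.2"): {"vision", "action"},
    ("Groq", "Llama 3.3"): {"action"},
    ("DeepSeek", "DeepSeek"): {"action"},
    ("Google", "Gemini 2.0 Flash"): {"vision", "action"},
    ("OpenAI", "GPT-4o"): {"vision", "action"},
    ("HuggingFace Spaces", "OS-Atlas"): {"grounding"},
    ("HuggingFace Spaces", "ShowUI"): {"grounding"},
}


def check_roles(assignment):
    """Return a list of problems for a {role: (provider, model)} assignment."""
    problems = []
    for role, key in assignment.items():
        supported = CAPABILITIES.get(key)
        if supported is None:
            problems.append(f"{role}: unknown model {key}")
        elif role not in supported:
            problems.append(f"{role}: {key[1]} on {key[0]} does not support {role}")
    return problems


# Mirrors the default assignment shown in config.py.
default = {
    "grounding": ("HuggingFace Spaces", "OS-Atlas"),
    "vision": ("Groq", "Llama 3.2"),
    "action": ("Groq", "Llama 3.3"),
}
# Llama 3.3 is listed as action only, so using it for vision is flagged.
bad = dict(default, vision=("Groq", "Llama 3.3"))
```

Here `check_roles(default)` returns an empty list, while `check_roles(bad)` reports the vision role as unsupported.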
Prerequisites:
- Python 3.10 or later
- git
- E2B API key
- API key for an LLM provider (see above)
Install Poetry and ffmpeg. With Homebrew, run in your terminal:
brew install poetry ffmpeg
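To confirm the prerequisites are in place before continuing, a short check like the one below can be run with any Python interpreter. It is a convenience sketch and not part of the project. The function name `missing_prerequisites` is hypothetical, and the check only looks for executables on PATH.

```python
import shutil
import sys


def missing_prerequisites(tools=("git", "poetry", "ffmpeg"), min_python=(3, 10)):
    """Return descriptions of whatever is missing on this machine."""
    missing = []
    # sys.version_info compares element-wise against a plain tuple.
    if sys.version_info < min_python:
        missing.append(f"Python {min_python[0]}.{min_python[1]} or later")
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(tool) is None:
            missing.append(tool)
    return missing


for item in missing_prerequisites():
    print(f"missing: {item}")
```

No output means that the Python version is sufficient and that git, poetry, and ffmpeg were all found on PATH.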
Clone the repository:
git clone https://github.com/e2b-dev/open-computer-use/
Enter the project directory:
cd open-computer-use
Create a .env file in open-computer-use and set the following:
# Get your API key here: https://e2b.dev/
E2B_API_KEY="your-e2b-api-key"
Additionally, add API key(s) for any LLM providers you're using:
# You only need the API key for the provider(s) selected in config.py:
# Hugging Face Spaces do not require an API key.
FIREWORKS_API_KEY=...
OPENROUTER_API_KEY=...
LLAMA_API_KEY=...
GROQ_API_KEY=...
GEMINI_API_KEY=...
OPENAI_API_KEY=...
ANTHROPIC_API_KEY=...
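A quick way to see which keys a .env file actually defines is sketched below. The variable names are the ones listed above. The parser is a simplified stand-in written for this example (it ignores multi-line values and `export` prefixes) and is not how the project itself loads the file.

```python
# Variable names are the ones listed above; the parser is a simplified
# stand-in and does not cover every .env syntax.
PROVIDER_KEYS = [
    "FIREWORKS_API_KEY", "OPENROUTER_API_KEY", "LLAMA_API_KEY", "GROQ_API_KEY",
    "GEMINI_API_KEY", "OPENAI_API_KEY", "ANTHROPIC_API_KEY",
]


def parse_env(text):
    """Parse KEY=value lines, skipping blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes from the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def summarize(values):
    """Report whether E2B is configured and which provider keys are set."""
    return {
        "e2b": bool(values.get("E2B_API_KEY")),
        "providers": [k for k in PROVIDER_KEYS if values.get(k)],
    }


sample = '''
# Get your API key here: https://e2b.dev/
E2B_API_KEY="your-e2b-api-key"
GROQ_API_KEY=example-value
'''
report = summarize(parse_env(sample))
```

To check a real file, pass its contents to `parse_env`, for example `parse_env(open(".env").read())`.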
Run the following commands to install the dependencies and start the agent:
poetry install
poetry run start
The agent will open and prompt you for its first instruction.
To start the agent with a specified prompt, run:
poetry run start --prompt "use the web browser to get the current weather in sf"
The display stream should be visible a few seconds after the Python program starts.
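The two start modes above (interactive, or with `--prompt`) can be modeled with an optional command-line argument. The sketch below uses the standard argparse module to show one way this could work. It is an assumption about the general shape, not the project's actual entry point, and `parse_args` is a hypothetical helper.

```python
import argparse


def parse_args(argv=None):
    """Parse an optional --prompt; None means 'ask the user interactively'."""
    parser = argparse.ArgumentParser(prog="start")
    parser.add_argument("--prompt", default=None,
                        help="first instruction for the agent")
    return parser.parse_args(argv)


# With the flag, the first instruction is taken from the command line.
args = parse_args(["--prompt", "use the web browser to get the current weather in sf"])

# Without it, prompt stays None and the agent would ask for input instead.
interactive = parse_args([])
```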