Stars
Tool for turning many repos into a meta repo. Why choose between many repos and a monolithic repo when you can have both with a meta repo?
A generative world for general-purpose robotics & embodied AI learning.
Creation of annotated datasets from scratch using Generative AI and Foundation Computer Vision models
A collection of projects designed to help developers quickly get started with building deployable applications using the Anthropic API
[IROS 2024] Learning Human-to-Humanoid Real-Time Whole-Body Teleoperation. [CoRL 2024] OmniH2O: Universal and Dexterous Human-to-Humanoid Whole-Body Teleoperation and Learning
Image augmentation for machine learning experiments.
Official Implementation of SynthTIGER (Synthetic Text Image Generator), ICDAR 2021
Infinite Photorealistic Worlds using Procedural Generation
Recommended based on ComfyUI node pictures: Joy_caption + MiniCPMv2_6-prompt-generator + florence2
Official implementation of the Law of Vision Representation in MLLMs
An open-source multimodal large language model that can hear and talk while thinking. It features real-time, end-to-end speech input and streaming audio output for conversation.
SuperPrompt is an attempt to engineer prompts that might help us understand AI agents.
A lightweight library for PyTorch training tools and utilities
[CVPR 2024 Highlight] GLEE: General Object Foundation Model for Images and Videos at Scale
Long Context Transfer from Language to Vision
Accelerating the development of large multimodal models (LMMs) with one-click evaluation module - lmms-eval.
[CoRL 2024] Open-TeleVision: Teleoperation with Immersive Active Visual Feedback
The repository provides code for running inference with the Meta Segment Anything Model 2 (SAM 2), links for downloading the trained model checkpoints, and example notebooks that show how to use th…
MINT-1T: A one trillion token multimodal interleaved dataset.
RecordRTC is a WebRTC JavaScript library for audio, video, and screen activity recording. It supports Chrome, Firefox, Opera, Android, and Microsoft Edge on Linux, Mac, and Windows.
Android ViewServer and ADB client
A Gradio web UI for Large Language Models with support for multiple inference backends.
A PyTorch template for beginners based on pytorch_lightning
Data processing for and with foundation models! 🍎 🍋 🌽 ➡️ ➡️🍸 🍹 🍷
[ICLR 2024 & ECCV 2024] The All-Seeing Projects: Towards Panoptic Visual Recognition & Understanding and General Relation Comprehension of the Open World