txtai executes machine-learning workflows to transform data and build AI-powered text indices to perform similarity search.
Summary of txtai features:
- 🔎 Large-scale similarity search with multiple index backends (Faiss, Annoy, Hnswlib)
- 📄 Create embeddings for text snippets, documents, audio and images. Supports transformers and word vectors.
- 💡 Machine-learning pipelines to run extractive question-answering, zero-shot labeling, transcription, translation, summarization and text extraction
- ↪️️ Workflows that join pipelines together to aggregate business logic. txtai processes can be microservices or full-fledged indexing workflows.
- 🔗 API bindings for JavaScript, Java, Rust and Go
- ☁️ Cloud-native architecture that scales out with container orchestration systems (e.g. Kubernetes)
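To make the similarity search idea in the feature list concrete, here is a minimal, dependency-free sketch. It is not txtai's API or implementation: txtai builds embeddings with transformers or word vectors and searches them through backends such as Faiss, Annoy or Hnswlib. The sketch swaps in token counts for embeddings and a brute-force scan for the index, and the names `embed`, `cosine` and `search` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(value * value for value in a.values()))
    norm_b = math.sqrt(sum(value * value for value in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(query, documents, limit=1):
    """Brute-force search: score every document, return the top (index, score) pairs."""
    vector = embed(query)
    scores = [(i, cosine(vector, embed(doc))) for i, doc in enumerate(documents)]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:limit]


documents = [
    "maine man wins lottery jackpot",
    "ice shelf collapse raises sea level concerns",
    "new vaccine trial reports positive results",
]

# The best match is the document sharing the most query terms
results = search("sea level rise", documents)
print(results)
```

Token counts only match exact words, so a query such as "climate change" would score zero against every document here. Learned embeddings are what allow a search to match on meaning rather than on shared tokens.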
txtai and/or the concepts behind it have already been used to power the Natural Language Processing (NLP) applications listed below:
Application | Description |
---|---|
paperai | AI-powered literature discovery and review engine for medical/scientific papers |
tldrstory | AI-powered understanding of headlines and story text |
neuspo | Fact-driven, real-time sports event and news site |
codequestion | Ask coding questions directly from the terminal |
txtai is built with Python 3.6+, Hugging Face Transformers, Sentence Transformers and FastAPI.
The easiest way to install is via pip and PyPI:
pip install txtai
You can also install txtai directly from GitHub. Using a Python Virtual Environment is recommended.
pip install git+https://github.com/neuml/txtai
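Before installing, it can help to confirm that the interpreter meets the Python 3.6+ requirement and that a virtual environment is active. The following is a small sketch using only the standard library. The helper names are hypothetical, and the virtual environment check assumes a `venv`-style environment where `sys.prefix` differs from `sys.base_prefix`.

```python
import sys

MINIMUM_VERSION = (3, 6)  # Minimum Python version stated for txtai


def supported_python(version=None):
    """Return True when the given (or current) interpreter version is 3.6 or newer."""
    version = sys.version_info if version is None else version
    return tuple(version[:2]) >= MINIMUM_VERSION


def in_virtual_environment():
    """Return True when running inside a venv-style virtual environment."""
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Python version supported:", supported_python())
print("Virtual environment active:", in_virtual_environment())
```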
Python 3.6+ is supported. txtai has the following environment-specific prerequisites.
- Linux: optional audio transcription requires a system library to be installed
- macOS: run `brew install libomp` (see this link)
- Windows: install C++ Build Tools
The examples directory has a series of notebooks and applications giving an overview of txtai. See the sections below.
Notebook | Description |
---|---|
Introducing txtai | Overview of the functionality provided by txtai |
Build an Embeddings index with Hugging Face Datasets | Index and search Hugging Face Datasets |
Build an Embeddings index from a data source | Index and search a data source with word embeddings |
Add semantic search to Elasticsearch | Add semantic search to existing search systems |
Extractive QA with txtai | Introduction to extractive question-answering with txtai |
Extractive QA with Elasticsearch | Run extractive question-answering queries with Elasticsearch |
Apply labels with zero shot classification | Use zero shot learning for labeling, classification and topic modeling |
API Gallery | Using txtai in JavaScript, Java, Rust and Go |
Building abstractive text summaries | Run abstractive text summarization |
Extract text from documents | Extract text from PDF, Office, HTML and more |
Transcribe audio to text | Convert audio files to text |
Translate text between languages | Streamline machine translation and language detection |
Similarity search with images | Embed images and text into the same space for search |
Run pipeline workflows | Simple yet powerful constructs to efficiently process data |
Application | Description |
---|---|
Workflow builder | Build and execute txtai workflows. Connect summarization, text extraction, transcription, translation and similarity search pipelines together to run unified workflows. |
Image search | Image similarity search application. Index a directory of images and run searches to identify images similar to the input query. |
Demo query shell | Basic similarity search example. Used in the original txtai demo. |
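The idea of connecting pipelines into a unified workflow, as the workflow builder above does, can be sketched in plain Python. This is not txtai's workflow implementation: the runner and the two steps below are hypothetical stand-ins, with string operations in place of real text extraction and summarization models. The point is only that each step takes a batch of elements and hands its output to the next step.

```python
def run_workflow(tasks, elements):
    """Pass a batch of elements through each task in order."""
    for task in tasks:
        elements = task(elements)
    return elements


def extract_text(elements):
    """Hypothetical text extraction step: strip surrounding whitespace."""
    return [element.strip() for element in elements]


def summarize(elements):
    """Hypothetical summarization step: keep only the first sentence."""
    return [element.split(". ")[0].rstrip(".") + "." for element in elements]


# A workflow is an ordered list of steps that share the same call signature
workflow = [extract_text, summarize]
output = run_workflow(workflow, ["  txtai builds text indices. It also runs pipelines.  "])
print(output)
```

Because every step accepts and returns a list, steps can be reordered, removed or added without changing the runner.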
Full documentation on txtai is available, including configuration settings for pipelines, workflows, indexing and the API.
For those who would like to contribute to txtai, please see this guide.