LFX: Develop an AI-powered FAQ Chatbot for Vitess using Retrieval-Augmented Generation #17690

rohit-nayak-ps · 2025-02-04T10:02:41Z

Feature Description

Notes for Potential LFX Mentees:

This project is part of the LFX program. More information at https://github.com/cncf/mentoring/tree/main/lfx-mentorship. For mentees, details on how to apply are at https://github.com/cncf/mentoring/tree/main/lfx-mentorship#how-to-apply.
This project doesn't involve direct contributions to Vitess. For those interested, though, the contribution guide has information on learning both Golang and Vitess concepts for someone new to Vitess.

⚠️ If you're thinking about opening PRs to the project before the application period begins, please read the initial sections regarding contribution guidelines and advice from a previous GSoC project!

Please do not email or DM the mentors or other Vitess maintainers in this regard, whether on this issue, Slack, or elsewhere: your LFX application should speak for itself.
If you have any questions that you think are unanswered in the issue description, please ask them in the #lfx channel and we'll be happy to answer. 🙏🏽

Description

Vitess is a scalable, distributed database system built on MySQL. The growing complexity of Vitess requires developers to search through extensive documentation, Slack discussions, GitHub issues, and community forums to find relevant information. This project aims to build an AI-powered FAQ chatbot that leverages Retrieval-Augmented Generation (RAG) to provide instant, context-aware answers from multiple knowledge sources.

The chatbot will use a vector database (e.g., PlanetScale, ChromaDB and/or Pinecone) to store indexed embeddings of Vitess documentation, Slack messages, FAQs, and GitHub discussions. It will integrate with an LLM (such as OpenAI GPT-4, DeepSeek, Mistral, or Llama3) to generate responses based on retrieved context. The chatbot will be available via a web interface, CLI tool, and Slack integration, making it easy for developers to get quick answers.

More details on the architecture and design will follow later. The implementation will be in a separate GitHub repository that will be created soon.

Key Components

The choice of tools and technologies is indicative. These will most likely be fine-tuned as the project progresses.

Data Ingestion Pipeline: Process documentation, Slack conversations, GitHub issues/pull requests.
Vector Search: Store and retrieve knowledge using a vector database (PlanetScale, ChromaDB, Pinecone, FAISS).
LLM Integration: Use an LLM API (OpenAI, Ollama, vLLM) to generate responses based on retrieved data.
Web UI & CLI Tool: Develop a web-based chatbot and CLI tool for interaction.
Slack Integration: Deploy the chatbot in Slack for real-time developer support.
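
To make the retrieve-then-generate flow in the components above concrete, here is a minimal sketch in Python. It is not the project's design: the bag-of-words `embed` function is a toy stand-in for a real embedding model, the in-memory `INDEX` list stands in for a vector database, and all function names are hypothetical. The sample passages are sentences taken from this issue. The LLM call itself is left out; the sketch stops at building the prompt that would be sent to it.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy 'embedding': lowercase word counts (assumption: a real system
    would call an embedding model here instead)."""
    return Counter(re.findall(r"[a-z0-9#]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def retrieve(query, index, k=2):
    """Return the k passages whose vectors are most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

def build_prompt(query, passages):
    """Assemble the context-grounded prompt that would be sent to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

# Sample passages (sentences from this issue); the list of (text, vector)
# pairs is a hypothetical stand-in for a vector database.
DOCS = [
    "Vitess is a scalable, distributed database system built on MySQL.",
    "The chatbot will be available via a web interface, CLI tool, and Slack integration.",
    "Questions about the LFX application go in the #lfx Slack channel.",
]
INDEX = [(doc, embed(doc)) for doc in DOCS]

if __name__ == "__main__":
    question = "Is Vitess built on MySQL?"
    print(build_prompt(question, retrieve(question, INDEX, k=1)))
```

Swapping `embed` for a real embedding model and `INDEX` for one of the vector stores listed above would leave the `retrieve` and `build_prompt` steps structurally the same.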

Expected Outcomes

A working RAG-based chatbot capable of answering common Vitess-related queries.
A CLI tool and Slack bot for developers to interact with the chatbot.
Indexed documentation and discussions in a vector database for fast retrieval.
An evaluation framework to measure response accuracy and relevancy.
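
As one possible starting point for the evaluation framework, the sketch below shows two simple metrics in Python: a hit rate at k for retrieval relevancy and a token-overlap F1 for answer accuracy. Both the metric choice and the function names are assumptions for illustration; the issue does not specify how accuracy or relevancy will be measured.

```python
from collections import Counter

def token_f1(prediction, reference):
    """Token-overlap F1 between a generated answer and a reference answer."""
    pred = prediction.lower().split()
    ref = reference.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    overlap = sum((Counter(pred) & Counter(ref)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred)
    recall = overlap / len(ref)
    return 2 * precision * recall / (precision + recall)

def hit_rate_at_k(retrieved_ids, relevant_ids, k):
    """Fraction of questions whose top-k retrieved IDs include a relevant ID.

    retrieved_ids: one ranked list of document IDs per question.
    relevant_ids: one list of known-relevant document IDs per question.
    """
    if not retrieved_ids:
        return 0.0
    hits = sum(
        1
        for got, want in zip(retrieved_ids, relevant_ids)
        if set(got[:k]) & set(want)
    )
    return hits / len(retrieved_ids)

if __name__ == "__main__":
    # Hypothetical example data, not real evaluation results.
    print(token_f1("built on mysql", "vitess is built on mysql"))
    print(hit_rate_at_k([["a", "b"], ["c", "d"]], [["b"], ["x"]], k=2))
```

A labeled set of question, reference-answer, and relevant-document triples would be needed to run metrics like these; building that set is likely the larger part of the work.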

Recommended Skills

Golang (for CLI and backend components)
Python (for LLM integration and data processing)
Experience with LLM APIs
Knowledge of vector databases
FastAPI/Fiber for API development
LangChain for retrieval-augmented generation (RAG)

harsv689 · 2025-02-05T09:15:13Z

Hey @rohit-nayak-ps, I am Harshvardhan, a pre-final-year research undergraduate student from IIIT Hyderabad, currently working in the field of NLP for code-mixed machine translation. I feel I am well versed in all the technologies required for the above implementation, and I have also worked on an MVP around the same idea for a startup. Before applying, I have started exploring more of Vitess. Earlier, I was an active open-source contributor to Headlamp under CNCF and to the Wikimedia Foundation, and I was a two-time scholarship awardee for Wikimedia Hackathons. I will try to write a relevant proposal for the above, as it aligns with my interests.

devjpt23 · 2025-02-06T05:15:55Z

Hello @rohit-nayak-ps,
I am interested in contributing to this issue.
I have built chatbots in the past using the OpenAI API and Groq. I had implemented voice output and speech output. Recently, I started creating an LLM agent that is specialized for RAG and has a similar purpose to the FAQ chatbot.
Please let me know how I can contribute. Looking forward to your response and guidance!

OmBiradar · 2025-02-06T09:27:27Z

Hey @rohit-nayak-ps
I am Om Biradar from IIT BHU.
I believe my recent work on RAG would help me contribute to this issue.
I am interested in taking this issue up in LFX Term 1.
Looking forward to contributing to this project.
Thanks!

GuptaManan100 · 2025-02-06T13:36:08Z

Folks applying for LFX internships, let me start off by thanking you for your interest in the project. But please hold off on sending DMs to the mentors or tagging them in GitHub. Please go through the guidelines listed in #17690 (comment), which state:

Please do not DM the mentors or other Vitess maintainers on this issue, Slack or elsewhere in this regard: your LFX application should speak for itself.

If you have any questions that you think are unanswered in the issue description, please ask them in the #lfx channel and we'll be happy to answer. This will also benefit the other applicants and reduce the load for the mentors from needing to answer similar questions multiple times.

rohit-nayak-ps added the "Component: General" (Changes throughout the code base) and "Type: Enhancement" (Logical improvement, somewhere between a bug and feature) labels Feb 4, 2025

rohit-nayak-ps added the LFX label Feb 7, 2025
