LFX: Develop an AI-powered FAQ Chatbot for Vitess using Retrieval-Augmented Generation #17690

Open
rohit-nayak-ps opened this issue Feb 4, 2025 · 4 comments
Labels
Component: General Changes throughout the code base LFX Type: Enhancement Logical improvement (somewhere between a bug and feature)

Comments

@rohit-nayak-ps
Contributor

rohit-nayak-ps commented Feb 4, 2025

Feature Description

Notes for Potential LFX Mentees:

This project is part of the LFX program. More information at https://github.com/cncf/mentoring/tree/main/lfx-mentorship. For mentees, details on how to apply are at https://github.com/cncf/mentoring/tree/main/lfx-mentorship#how-to-apply.
This project doesn't involve direct contributions to Vitess. However, for those interested in the project, the contribution guide has information on learning both Golang and Vitess concepts for someone new to Vitess.

⚠️ If you're thinking about opening PRs to the project before the application period begins, please read the initial sections regarding contribution guidelines and advice from a previous GSoC project!

Please do not email or DM the mentors or other Vitess maintainers, on this issue, Slack or elsewhere in this regard: your LFX application should speak for itself.
If you have any questions that you think are unanswered in the issue description, please ask them in the #lfx channel and we'll be happy to answer. 🙏🏽

Description

Vitess is a scalable, distributed database system built on MySQL. The growing complexity of Vitess requires developers to search through extensive documentation, Slack discussions, GitHub issues, and community forums to find relevant information. This project aims to build an AI-powered FAQ chatbot that leverages Retrieval-Augmented Generation (RAG) to provide instant, context-aware answers from multiple knowledge sources.

The chatbot will use a vector database (e.g., PlanetScale, ChromaDB and/or Pinecone) to store indexed embeddings of Vitess documentation, Slack messages, FAQs, and GitHub discussions. It will integrate with an LLM (such as OpenAI GPT-4, DeepSeek, Mistral, or Llama3) to generate responses based on retrieved context. The chatbot will be available via a web interface, CLI tool, and Slack integration, making it easy for developers to get quick answers.
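To make the retrieve-then-generate flow described above concrete, here is a minimal sketch. It is not the project's design: `embed` is a toy bag-of-words stand-in for a real embedding model, the in-memory `DOCS` list stands in for a vector database, the snippets are illustrative placeholder text, and all function names are hypothetical. The prompt it builds would be sent to whichever LLM is eventually chosen.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, docs, k=2):
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d["text"])), reverse=True)
    return ranked[:k]

def build_prompt(query, hits):
    """Assemble the retrieved context and the question into one LLM prompt."""
    context = "\n".join(f"[{h['source']}] {h['text']}" for h in hits)
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {query}"
    )

# Placeholder snippets, one per knowledge source (illustrative only).
DOCS = [
    {"source": "docs", "text": "VTGate routes queries to the correct shard."},
    {"source": "slack", "text": "Use vtctldclient to manage a running cluster."},
    {"source": "github", "text": "VReplication copies data between keyspaces."},
]

if __name__ == "__main__":
    question = "How does VTGate route queries?"
    print(build_prompt(question, retrieve(question, DOCS, k=1)))
```

In a real system the sorted scan would be replaced by a nearest-neighbour query against the vector database, and the source tag in each context line lets the answer cite where the information came from.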

More details on the architecture and design will follow later. The implementation will live in a separate GitHub repository that will be created soon.

Key Components

The choice of tools and technologies is indicative. These will most likely be fine-tuned as the project progresses.

  • Data Ingestion Pipeline: Process documentation, Slack conversations, GitHub issues/pull requests.
  • Vector Search: Store and retrieve knowledge using a vector database (PlanetScale, ChromaDB, Pinecone, FAISS).
  • LLM Integration: Use an LLM API (OpenAI, Ollama, vLLM) to generate responses based on retrieved data.
  • Web UI & CLI Tool: Develop a web-based chatbot and CLI tool for interaction.
  • Slack Integration: Deploy the chatbot in Slack for real-time developer support.
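As an illustration of the data ingestion step, the sketch below splits a document into overlapping word windows and tags each chunk with its source, which is a common preparation step before embedding. The function name `chunk_text`, the field names, and the default sizes are assumptions made for this example, not decisions from the project.

```python
def chunk_text(text, source, max_words=50, overlap=10):
    """Split text into overlapping word windows, keeping source metadata.

    Overlap keeps sentences that straddle a boundary visible in both chunks.
    """
    if max_words <= 0 or overlap < 0 or overlap >= max_words:
        raise ValueError("need max_words > 0 and 0 <= overlap < max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + max_words]
        chunks.append({
            "source": source,        # e.g. "docs", "slack", "github"
            "start_word": start,     # offset of the chunk in the original text
            "text": " ".join(window),
        })
        if start + max_words >= len(words):
            break                    # the last window already reached the end
    return chunks

if __name__ == "__main__":
    sample = " ".join(f"w{i}" for i in range(120))
    for chunk in chunk_text(sample, source="docs"):
        print(chunk["start_word"], len(chunk["text"].split()))
```

Each chunk would then be embedded and written to the vector store along with its metadata, so a Slack message and a documentation page can be distinguished at retrieval time.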

Expected Outcomes

  • A working RAG-based chatbot capable of answering common Vitess-related queries.
  • A CLI tool and Slack bot for developers to interact with the chatbot.
  • Indexed documentation and discussions in a vector database for fast retrieval.
  • An evaluation framework to measure response accuracy and relevancy.
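One possible starting point for the evaluation framework is to score the retrieval step on its own. The sketch below computes hit rate at k and mean reciprocal rank over a hand-labelled set of questions. The function names and the data shape are hypothetical, and this covers only retrieval relevancy. Judging the accuracy of generated answers would need a separate method.

```python
def hit_rate_at_k(results, k):
    """Fraction of queries with at least one relevant document in the top k.

    results: list of (retrieved_ids, relevant_ids) pairs, where retrieved_ids
    is ordered best-first and relevant_ids is a set of labelled-correct ids.
    """
    if not results:
        return 0.0
    hits = sum(1 for retrieved, relevant in results
               if set(retrieved[:k]) & set(relevant))
    return hits / len(results)

def mean_reciprocal_rank(results):
    """Average of 1/rank of the first relevant document (0 if none found)."""
    if not results:
        return 0.0
    total = 0.0
    for retrieved, relevant in results:
        for rank, doc_id in enumerate(retrieved, start=1):
            if doc_id in relevant:
                total += 1.0 / rank
                break
    return total / len(results)

if __name__ == "__main__":
    # Hypothetical labelled queries: (ranked retrieval output, relevant ids).
    labelled = [
        (["a", "b", "c"], {"a"}),
        (["x", "y", "z"], {"z"}),
        (["p", "q"], {"r"}),
    ]
    print(hit_rate_at_k(labelled, 1), hit_rate_at_k(labelled, 3))
    print(mean_reciprocal_rank(labelled))
```

Tracking these numbers while changing the chunk size, the embedding model, or the vector database gives a quick signal on whether retrieval improved before any LLM calls are made.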

Recommended Skills

  • Golang (for CLI and backend components)
  • Python (for LLM integration and data processing)
  • Experience with LLM APIs
  • Knowledge of vector databases
  • FastAPI/Fiber for API development
  • LangChain for retrieval-augmented generation (RAG)

rohit-nayak-ps added the labels Component: General (Changes throughout the code base) and Type: Enhancement (Logical improvement, somewhere between a bug and feature) on Feb 4, 2025
@harsv689

harsv689 commented Feb 5, 2025

Hey @rohit-nayak-ps, I am Harshvardhan, a pre-final-year research undergraduate at IIIT Hyderabad, currently working on NLP for code-mixed machine translation. I feel I am well versed in the technologies required for this implementation, and I have also built an MVP around the same idea for a startup. Before applying, I have started exploring Vitess in more depth. I have previously been an active open-source contributor to Headlamp under the CNCF and to the Wikimedia Foundation, and I was twice a scholarship awardee for Wikimedia Hackathons. I will try to write a relevant proposal for this project, as it aligns with my interests.

@devjpt23

devjpt23 commented Feb 6, 2025

Hello @rohit-nayak-ps,
I am interested in contributing to this issue.
I have built chatbots in the past using the OpenAI API and Groq, and implemented voice and speech output for them. Recently, I started creating an LLM agent specialized for RAG that has a similar purpose to the FAQ chatbot.
Please let me know how I can contribute. Looking forward to your response and guidance!

@OmBiradar

Hey @rohit-nayak-ps
I am Om Biradar from IIT BHU.
I believe my recent work on RAG would help me contribute to this issue.
I am interested in taking this issue up in LFX Term 1.
Looking forward to contributing to this project.
Thanks!

@GuptaManan100
Member

GuptaManan100 commented Feb 6, 2025

Folks applying for LFX internships, let me start by thanking you for your interest in the project. However, please hold off on sending DMs to the mentors or tagging them on GitHub. Please go through the guidelines listed in #17690 (comment), which state:

Please do not DM the mentors or other Vitess maintainers on this issue, Slack or elsewhere in this regard: your LFX application should speak for itself.

If you have any questions that you think are unanswered in the issue description, please ask them in the #lfx channel and we'll be happy to answer. This will also benefit the other applicants and reduce the load for the mentors from needing to answer similar questions multiple times.
