Comparing Vector Embeddings from Different Models Using Milvus

A project that compares different language models using song lyrics and vector embeddings stored in the Milvus vector database.

🎵Note: I do not really like Taylor Swift... but I do like this project, which happens to use her lyrics🎵

Project Overview 🤗

The project demonstrates how to compare three different multilingual models based on MiniLM from Hugging Face using Milvus as the vector database. Using song lyrics as test data, it shows how different vector embeddings can be compared and analyzed.

Getting Started

# Create virtual environment
python -m venv venv

# Activate virtual environment

# On Unix or MacOS:
source venv/bin/activate

# On Windows:
venv\Scripts\activate


# Install required packages
pip install pymilvus milvus sentence-transformers

What is Milvus?

Milvus is an open-source vector database built specifically for embedding vectors and similarity search. Key features include:

- Purpose-built for managing and searching vector embeddings
- Optimized for AI and machine learning applications
- Supports efficient similarity search at scale
- Provides both standalone and distributed deployment options
- Integrates well with modern AI/ML pipelines

For this project, we use Milvus to store and compare our vector embeddings because it's specifically designed for this type of vector similarity search operation.

Models Compared

- Base multilingual paraphrase model (MiniLM)
- Version fine-tuned for intent detection
- Sprylab fine-tuned version
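A minimal sketch of how the three models could be loaded with sentence-transformers. The model IDs here are assumptions: the base ID is my guess at the multilingual MiniLM paraphrase model, and the two fine-tuned IDs are hypothetical placeholders that you would replace with the real Hugging Face IDs.

```python
# Model IDs are assumptions for illustration; this README does not name them.
MODEL_IDS = {
    # Assumed base model: a 384-dimensional multilingual MiniLM paraphrase model.
    "base": "sentence-transformers/paraphrase-multilingual-MiniLM-L12-v2",
    # Hypothetical placeholders: replace with the actual fine-tuned model IDs.
    "intent": "<intent-detection-fine-tuned-model-id>",
    "sprylab": "<sprylab-fine-tuned-model-id>",
}

EXPECTED_DIM = 384  # vector dimension stated in this README


def load_model(model_id):
    """Load one model and check that its embedding size matches EXPECTED_DIM."""
    # Imported lazily so the constants above can be inspected without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer(model_id)
    dim = model.get_sentence_embedding_dimension()
    if dim != EXPECTED_DIM:
        raise ValueError(f"{model_id} produces {dim}-d vectors, expected {EXPECTED_DIM}")
    return model


def embed(model, sentences):
    """Return one embedding per sentence as a list of lists of floats."""
    return model.encode(sentences).tolist()
```

Checking the dimension up front matters because every collection in this project uses 384-dimensional vectors, so a model with a different output size would fail at insert time.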

Implementation Steps

Setup and Installation
- Installed required packages: pymilvus, milvus, sentence-transformers
- Set up Milvus server and connection
Model Loading
- Loaded three different sentence transformer models from Hugging Face
- All models are based on MiniLM architecture
Data Preparation
- Used lyrics dataset from multiple songs
- Created a dataset of 51 sentences
- Processed lyrics into suitable format for embedding
Vector Database Setup
- Created schema with appropriate fields
- Set up two collections in Milvus
- Used 384-dimensional vectors (MiniLM standard)
Embedding Generation
- Generated embeddings for all sentences using different models
- Stored embeddings in Milvus collections
Search Implementation
- Implemented search functionality using L2 distance metric
- Used inverted file index with four centroids
- Set up comparison between different model results
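To make the "inverted file index with four centroids" step more concrete, here is a toy pure-Python sketch of the idea behind an IVF-style index. It is not how Milvus implements IVF_FLAT internally; the function names, the 8-dimensional random vectors, and the k-means settings are all assumptions chosen to keep the example small.

```python
import math
import random


def nearest(centroids, vec):
    """Index of the centroid closest to vec (Euclidean distance)."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], vec))


def assign(centroids, vectors):
    """Put every vector index into the bucket of its nearest centroid."""
    buckets = [[] for _ in centroids]
    for idx, vec in enumerate(vectors):
        buckets[nearest(centroids, vec)].append(idx)
    return buckets


def build_ivf(vectors, nlist=4, iters=10, seed=0):
    """Cluster vectors into nlist buckets with a few k-means rounds."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, nlist)]
    for _ in range(iters):
        buckets = assign(centroids, vectors)
        for c, members in enumerate(buckets):
            if members:  # keep the old centroid if a bucket is empty
                columns = zip(*(vectors[i] for i in members))
                centroids[c] = [sum(col) / len(members) for col in columns]
    return centroids, assign(centroids, vectors)


def search(centroids, buckets, vectors, query, nprobe=2, limit=3):
    """Scan only the nprobe closest buckets; return sorted (distance, index) pairs."""
    order = sorted(range(len(centroids)), key=lambda c: math.dist(centroids[c], query))
    candidates = [i for c in order[:nprobe] for i in buckets[c]]
    scored = sorted((math.dist(vectors[i], query), i) for i in candidates)
    return scored[:limit]


# Toy data: 51 random 8-dimensional vectors standing in for the 51 sentence embeddings.
rng = random.Random(42)
vectors = [[rng.gauss(0, 1) for _ in range(8)] for _ in range(51)]
centroids, buckets = build_ivf(vectors)
hits = search(centroids, buckets, vectors, vectors[0])
print(hits[0])  # the query vector itself comes back first, at distance 0.0
```

With nprobe=2 only two of the four buckets are scanned, which is the speed-versus-recall trade-off this kind of index makes: a true neighbor sitting in an unscanned bucket would be missed.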

Results & Analysis

The implementation produced several interesting findings:

Search Performance

- First search completed in 0.0034 seconds
- Second search was faster at 0.0013 seconds
- Both searches demonstrated efficient performance for vector similarity operations
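Timings like these can be collected with a small helper. The sketch below is an assumption about how one might measure them, not the code used in the notebook; `timed` and `benchmark` are hypothetical names, and `sorted` is only a stand-in for the real search call.

```python
import time


def timed(fn, *args, **kwargs):
    """Run fn once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def benchmark(fn, *args, warmup=1, repeats=5):
    """Discard warm-up runs, then report the median of the timed runs."""
    for _ in range(warmup):
        fn(*args)
    samples = sorted(timed(fn, *args)[1] for _ in range(repeats))
    return samples[len(samples) // 2]


# Stand-in workload; in practice fn would be the function that runs the search.
median_seconds = benchmark(sorted, list(range(1000)))
print(f"median: {median_seconds:.6f} s")
```

Taking the median after a warm-up run helps separate one-time costs on the first call from the steady-state search time.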

Embedding Comparison Results

For the first query (wedding scene lyrics):

Query: "I am not the kind of girl, who should be rudely barging in on a white veil occasion..."
1. Exact match (distance: 14.65)
2. Guard/patience lyrics (distance: 16.70)
3. Haunting lyrics (distance: 18.22)

For the second query (sneaking in scene):

Query: "I sneak in and see your friends and her snotty little family..."
1. Exact match (distance: 13.32)
2. Guard/patience lyrics (distance: 15.08)
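The distances above come from the L2 (Euclidean) metric, where a smaller value means a closer match. Here is a small worked example in plain Python with made-up two-dimensional vectors. As far as I know, Milvus reports L2 without taking the final square root, so the reported numbers may be squared distances; the ranking is the same either way.

```python
import math


def squared_l2(a, b):
    """Sum of squared coordinate differences (no square root)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(squared_l2(a, b))


# Made-up 2-d vectors; real embeddings in this project have 384 dimensions.
query = [0.0, 0.0]
stored = [3.0, 4.0]

print(squared_l2(query, stored))  # 25.0
print(l2(query, stored))          # 5.0
print(l2(query, query))           # 0.0 for identical vectors
```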

Key Findings

- Self-Match Accuracy: Both models correctly identified exact matches as closest neighbors
- Distance Consistency: L2 distances showed consistent patterns between queries
- Thematic Grouping: The models identified thematically similar lyrics as secondary matches
- Search Efficiency: The second search was notably faster (0.0013 s versus 0.0034 s), which may reflect caching or one-time warm-up costs on the first query; a single pair of measurements cannot separate the two

These results suggest that Milvus handles vector similarity search well on this small dataset, and they support the approach of comparing embedding models by their search results.
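One simple way to put a number on how much two models agree is the overlap between their top results. This is my own suggestion and not something the notebook is known to do; the function name and the result labels below are hypothetical.

```python
def topk_overlap(hits_a, hits_b):
    """Jaccard overlap between two result lists, ignoring order."""
    a, b = set(hits_a), set(hits_b)
    if not a and not b:
        return 1.0  # two empty result lists agree trivially
    return len(a & b) / len(a | b)


# Hypothetical top-3 results from two models for the same query.
model_a_hits = ["wedding", "guard", "haunting"]
model_b_hits = ["wedding", "guard", "sneak"]

overlap = topk_overlap(model_a_hits, model_b_hits)
print(overlap)  # 2 shared results out of 4 distinct ones: 0.5
```

A value of 1.0 means both models returned the same set of lyrics, and 0.0 means they share none.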

Technical Details

- Vector Dimension: 384
- Distance Metric: L2
- Index Type: IVF_FLAT
- Search Parameters: nprobe=2, limit=3
- Database: Milvus Vector Database
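The settings above could map onto pymilvus roughly as follows. This is a sketch under assumptions, not the notebook's code: the field names (`id`, `sentence`, `embedding`), the `max_length` of 500, and the host and port of an already running Milvus server are all made up for illustration. The value nlist=4 comes from the "four centroids" mentioned in the implementation steps.

```python
# Parameter values are taken from this README; everything else is assumed.
DIM = 384
LIMIT = 3
INDEX_PARAMS = {"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 4}}
SEARCH_PARAMS = {"metric_type": "L2", "params": {"nprobe": 2}}


def create_collection(name, host="localhost", port="19530"):
    """Connect to a running Milvus server and create an indexed collection."""
    # Imported lazily so the constants above can be inspected without pymilvus.
    from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

    connections.connect(alias="default", host=host, port=port)
    fields = [
        FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
        FieldSchema(name="sentence", dtype=DataType.VARCHAR, max_length=500),
        FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=DIM),
    ]
    collection = Collection(name=name, schema=CollectionSchema(fields=fields))
    collection.create_index(field_name="embedding", index_params=INDEX_PARAMS)
    return collection


def insert_rows(collection, sentences, embeddings):
    """Column-based insert: one list per non-auto field, in schema order."""
    collection.insert([sentences, embeddings])
    collection.flush()


def search(collection, query_vectors):
    """Return the LIMIT nearest stored sentences for each query vector."""
    collection.load()
    return collection.search(
        data=query_vectors,
        anns_field="embedding",
        param=SEARCH_PARAMS,
        limit=LIMIT,
        output_fields=["sentence"],
    )
```

Note that nprobe (2) must not exceed nlist (4), since nprobe is the number of clusters scanned out of the nlist clusters the index was built with.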

Reference

Do you want to try this project using a different artist? Find the blog post here: Comparing Different Vector Embeddings
