Machine Learning Examples

Project Structure

src/
├── clustering/          # Patient clustering analysis
├── fraud-detection/     # Fraud detection models
├── sentiment_analysis/  # Text sentiment analysis
├── mnist/               # MNIST digit classification
├── logistic-regression/ # Binary classification
├── linear-regression/   # Linear regression
├── content-based-filtering/ # Content-based filtering
└── requirements.txt     # Dependencies

Setup

Clone the repository:

git clone https://github.com/mineme0110/ml.git
cd ml

Create and activate a virtual environment:

python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Models

Patient Clustering

Unsupervised learning for patient segmentation based on health metrics.

Install dependencies:

pip install -r src/clustering/requirements.txt

Run clustering analysis:

python src/clustering/train.py

Features:

- Multiple clustering algorithms:
  - K-means: Groups patients into k clusters
  - DBSCAN: Density-based clustering
  - Hierarchical: Creates cluster hierarchy
- Health metrics analyzed:
  - Age
  - BMI
  - Blood Pressure
  - Glucose Level
  - Cholesterol
  - Heart Rate
- Evaluation metrics:
  - Silhouette Score
  - Calinski-Harabasz Score
- Visualizations:
  - 2D cluster plots
  - Feature distributions

Example output:

Training KMEANS clustering...

Clustering Results:
Number of clusters: 3
Silhouette Score: 0.303
Calinski-Harabasz Score: 543.462

Cluster Characteristics:
Cluster 0 (Young, Healthy):
- Age: ~25 years
- BMI: ~22
- Blood Pressure: ~110

Cluster 1 (Middle-aged):
- Age: ~45 years
- BMI: ~28
- Blood Pressure: ~130

Cluster 2 (Elderly):
- Age: ~70 years
- BMI: ~26
- Blood Pressure: ~145
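
As a rough illustration of the K-means run and the two evaluation metrics listed above, the following is a minimal sketch using scikit-learn. It does not reproduce src/clustering/train.py: the synthetic data from make_blobs, the choice of three clusters, and the feature scaling step are all assumptions made for the example.

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score, silhouette_score
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for six health metrics (assumption: not the repository's data).
X, _ = make_blobs(n_samples=300, centers=3, n_features=6, random_state=0)

# Scale features so that no single metric dominates the distance computation.
X_scaled = StandardScaler().fit_transform(X)

# Fit K-means with k=3 (assumed here to mirror the example output above).
kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
labels = kmeans.fit_predict(X_scaled)

# Silhouette lies in [-1, 1]; higher means tighter, better-separated clusters.
sil = silhouette_score(X_scaled, labels)
# Calinski-Harabasz is a ratio of between- to within-cluster dispersion; higher is better.
ch = calinski_harabasz_score(X_scaled, labels)

print(f"Number of clusters: {len(set(labels))}")
print(f"Silhouette Score: {sil:.3f}")
print(f"Calinski-Harabasz Score: {ch:.3f}")
```

Because the data here is synthetic, the scores will differ from the example output above.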

Sentiment Analysis

Text classification for sentiment analysis of movie reviews.

Install dependencies:

pip install -r src/sentiment_analysis/requirements.txt

Train the model:

python src/sentiment_analysis/train.py

Interactive testing:

python src/sentiment_analysis/interactive_test.py

MNIST Digit Classification

Neural network for MNIST digit classification using PyTorch.

Install dependencies:

pip install -r requirements.txt

Run the training script:

python src/mnist/train.py

Logistic Regression

Binary classification example using scikit-learn's Logistic Regression.

Install dependencies:

pip install -r src/logistic-regression/requirements.txt

Run the training script:

python src/logistic-regression/train.py

The script fits a logistic regression model to a generated dataset and plots the resulting decision boundary.
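
For orientation, here is a minimal sketch of that workflow with scikit-learn. It is not the contents of train.py: the make_classification settings and the train/test split are assumptions, and plotting is left out so that the snippet only reports the boundary's coefficients.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Generated two-feature dataset (assumed settings, chosen so the classes separate well).
X, y = make_classification(
    n_samples=500,
    n_features=2,
    n_informative=2,
    n_redundant=0,
    n_clusters_per_class=1,
    class_sep=2.0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

model = LogisticRegression().fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# The decision boundary is the line where w1*x1 + w2*x2 + b = 0.
w1, w2 = model.coef_[0]
b = model.intercept_[0]
print(f"Test accuracy: {accuracy:.3f}")
print(f"Decision boundary: {w1:.3f}*x1 + {w2:.3f}*x2 + {b:.3f} = 0")
```

Points for which the left-hand side is positive are assigned to class 1, the rest to class 0.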

Content-Based Filtering

Movie recommendation system using content-based filtering.

Install dependencies:

pip install -r src/content-based-filtering/requirements.txt

Run the recommendation system:

python src/content-based-filtering/train.py

The script recommends movies by content similarity, using a sample movie dataset. Similarity is computed from each movie's genres, actors, and description.

Example output:

Getting recommendations for: The Dark Knight
Recommended Movies:
------------------------------------------------------------
1. Iron Man
   Genres: Action, Adventure, Sci-Fi
   Similarity Score: 0.8245
------------------------------------------------------------
2. The Matrix
   Genres: Action, Sci-Fi
   Similarity Score: 0.7856
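
To make the idea of a content similarity score concrete, the sketch below ranks movies by cosine similarity between bag-of-words vectors, using only the standard library. The catalogue, its keyword strings, and the function names are hypothetical, and the repository's script may well use a different vectorizer (for example TF-IDF), so the scores will not match the example output above.

```python
import math
from collections import Counter

# Hypothetical catalogue: each title maps to a string of content keywords.
movies = {
    "The Dark Knight": "action crime drama superhero vigilante",
    "Iron Man": "action adventure sci-fi superhero",
    "The Matrix": "action sci-fi hacker simulation",
    "Notting Hill": "romance comedy london bookshop",
}


def vectorize(text):
    """Turn a keyword string into a term-count vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two term-count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def recommend(title, k=2):
    """Return the k titles most similar to the given one, best first."""
    query = vectorize(movies[title])
    scored = [
        (other, cosine_similarity(query, vectorize(text)))
        for other, text in movies.items()
        if other != title
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


for name, score in recommend("The Dark Knight"):
    print(f"{name}: {score:.4f}")
```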

Fraud Detection

XGBoost-based fraud detection system for identifying fraudulent transactions.

Prerequisite for macOS users:

# Install OpenMP library (required for XGBoost)
brew install libomp

Simple Model

A basic fraud detection model with clear separation between normal and fraudulent transactions.

Install dependencies:

pip install -r src/fraud-detection/requirements.txt

Run the simple fraud detection model:

python src/fraud-detection/train_simple.py

Features:

- Clear separation between classes
- Basic XGBoost parameters
- Suited to learning basic fraud detection concepts
- Shows idealized probability distributions

Realistic Model

A more sophisticated model that better represents real-world fraud detection scenarios.

Install dependencies (if not already installed):

pip install -r src/fraud-detection/requirements.txt

Run the realistic fraud detection model:

python src/fraud-detection/train_realistic.py

Features:

- Handles imbalanced classes
- Uses realistic transaction patterns:
  - Transaction amounts
  - Time of day patterns
  - Geographic distances
  - Transaction frequency
- Early stopping and validation
- Feature importance analysis
- More representative probability distributions

Example output:

Realistic Model Performance:
ROC AUC Score: 0.9985
Top 5 Most Important Features:
transaction_amount: 0.4532
time_of_day: 0.2876
distance_from_last: 0.1543
transaction_frequency: 0.0892
feature_5: 0.0157
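
Two quantities behind the features and output above can be worked through by hand. The sketch below uses only the standard library and computes (1) the negative-to-positive ratio that is commonly passed to XGBoost's scale_pos_weight parameter for imbalanced data, and (2) ROC AUC as the probability that a fraudulent transaction receives a higher score than a normal one. Whether train_realistic.py sets scale_pos_weight is an assumption, and the labels and scores are made up for illustration.

```python
def scale_pos_weight(labels):
    """Ratio of negative to positive examples (a common value for scale_pos_weight)."""
    positives = sum(labels)
    negatives = len(labels) - positives
    return negatives / positives


def roc_auc(labels, scores):
    """Fraction of (fraud, normal) pairs where fraud scores higher; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up data: 8 normal transactions (0) and 2 fraudulent ones (1).
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.15, 0.30, 0.02, 0.40, 0.08, 0.35, 0.90]

print(f"scale_pos_weight: {scale_pos_weight(labels)}")  # 8 negatives / 2 positives = 4.0
print(f"ROC AUC Score: {roc_auc(labels, scores):.4f}")  # 15 of 16 pairs ranked correctly
```

A ROC AUC of 1.0 means every fraudulent transaction outranks every normal one, while 0.5 corresponds to random ranking.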

License

This project is distributed under the terms of the MIT license.
