This project evaluates and compares the performance of various Machine Learning Models on a range of datasets, using F-measure, Accuracy and AUC (Area Under the Curve) as performance measures.
The Machine Learning Models applied are as follows:
- Bagging with Decision Tree
- Random Forest
- AdaBoost
- 3-NN
- SVM with Linear Kernel
- SVM with RBF Kernel
- Naive Bayes
- Decision Tree
- K-means (5 clusters) with a 3-NN classifier (stacking)
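For reference, the lineup above roughly corresponds to the following scikit-learn estimators. This is a minimal sketch under assumed hyperparameters, and the K-means + 3-NN stacking step is given one plausible interpretation (cluster label appended as an extra feature); the exact settings used in the project may differ.

```python
# A possible scikit-learn lineup matching the list above. Hyperparameters and the
# interpretation of the K-means + 3-NN stacking step are assumptions for this sketch.
import numpy as np
from sklearn.base import BaseEstimator, ClassifierMixin
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier, BaggingClassifier, RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


class KMeansStacking(BaseEstimator, ClassifierMixin):
    """K-means (5 clusters) stacked with a 3-NN classifier: the cluster label is
    appended to the features before the 3-NN is fitted (one plausible reading of
    the stacking step; the project's exact scheme may differ)."""

    def __init__(self, n_clusters=5, n_neighbors=3):
        self.n_clusters = n_clusters
        self.n_neighbors = n_neighbors

    def fit(self, X, y):
        self.kmeans_ = KMeans(n_clusters=self.n_clusters, n_init=10, random_state=0).fit(X)
        self.knn_ = KNeighborsClassifier(n_neighbors=self.n_neighbors)
        self.knn_.fit(self._augment(X), y)
        self.classes_ = self.knn_.classes_
        return self

    def predict(self, X):
        return self.knn_.predict(self._augment(X))

    def predict_proba(self, X):
        return self.knn_.predict_proba(self._augment(X))

    def _augment(self, X):
        # Append the K-means cluster assignment as an extra feature column.
        return np.hstack([X, self.kmeans_.predict(X).reshape(-1, 1)])


models = {
    "Bagging (Decision Tree)": BaggingClassifier(DecisionTreeClassifier()),
    "Random Forest": RandomForestClassifier(),
    "AdaBoost": AdaBoostClassifier(),
    "3-NN": KNeighborsClassifier(n_neighbors=3),
    "SVM (linear)": SVC(kernel="linear", probability=True),  # probability=True so AUC can be computed
    "SVM (RBF)": SVC(kernel="rbf", probability=True),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(),
    "K-means (5) + 3-NN stacking": KMeansStacking(),
}
```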
The datasets used to compare the above models are listed below:
- Abalone (https://archive.ics.uci.edu/ml/datasets/abalone)
- Balance Scale (http://archive.ics.uci.edu/ml/datasets/balance+scale)
- CMC (https://archive.ics.uci.edu/ml/datasets/Contraceptive+Method+Choice)
- Glass (https://archive.ics.uci.edu/ml/datasets/glass+identification)
- Housing (https://archive.ics.uci.edu/ml/machine-learning-databases/housing/)
- Haberman (https://archive.ics.uci.edu/ml/datasets/Haberman%27s+Survival)
- HSLog (http://archive.ics.uci.edu/ml/datasets/statlog+(heart))
- Ionosphere (https://archive.ics.uci.edu/ml/datasets/ionosphere)
- Nursery (https://archive.ics.uci.edu/ml/datasets/nursery)
- Phoneme (uploaded)
Each model is applied to each dataset with 10x10-fold cross-validation, and a comprehensive table for each performance measure (F-measure, Accuracy and AUC) is written to 'Results.csv'. Statistical analysis via t-tests and win-tie-loss counts is also performed, and a table for each performance measure is again written to the same CSV file. These six tables are formatted and explained in the accompanying document 'Report.docx'.
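As a rough illustration, the evaluation loop could look like the following sketch, which assumes 10x10-fold CV means ten repetitions of stratified 10-fold cross-validation and uses the `models` dictionary from the sketch above. The `datasets` mapping, the choice of weighted F1 and one-vs-rest AUC for multi-class sets, and the CSV layout are assumptions made for the example and are simplified relative to the actual 'Results.csv'.

```python
# Sketch of the 10x10-fold cross-validation loop (assumed here to mean 10 repetitions
# of stratified 10-fold CV), collecting F-measure, Accuracy and AUC per model and dataset.
# The `datasets` mapping and the CSV layout are placeholders, not the project's exact format.
import pandas as pd
from sklearn.model_selection import RepeatedStratifiedKFold, cross_validate


def evaluate(models, datasets, out_path="Results.csv"):
    """models: name -> estimator; datasets: name -> (X, y) feature/label arrays."""
    cv = RepeatedStratifiedKFold(n_splits=10, n_repeats=10, random_state=0)
    scoring = {
        "accuracy": "accuracy",
        "f_measure": "f1_weighted",       # weighted F1 so multi-class sets are handled
        "auc": "roc_auc_ovr_weighted",    # one-vs-rest AUC for multi-class sets
    }
    rows = []
    for ds_name, (X, y) in datasets.items():
        for model_name, model in models.items():
            scores = cross_validate(model, X, y, cv=cv, scoring=scoring)
            rows.append({
                "dataset": ds_name,
                "model": model_name,
                "accuracy": scores["test_accuracy"].mean(),
                "f_measure": scores["test_f_measure"].mean(),
                "auc": scores["test_auc"].mean(),
            })
    results = pd.DataFrame(rows)
    results.to_csv(out_path, index=False)
    return results
```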
Each table compares the models on a single performance measure across all of the datasets; the pairwise comparison behind the win-tie-loss tables is sketched below.
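A minimal sketch of that pairwise comparison, assuming SciPy's paired t-test over the per-fold scores and a 0.05 significance level (both assumptions for this example; the project's exact test configuration may differ):

```python
# Sketch of one pairwise comparison: a paired t-test over the per-fold scores of two
# models on the same dataset, turned into a win/tie/loss outcome at an assumed 0.05
# significance level.
import numpy as np
from scipy import stats


def win_tie_loss(scores_a, scores_b, alpha=0.05):
    """scores_a, scores_b: arrays of per-fold scores for models A and B on the same folds."""
    t_stat, p_value = stats.ttest_rel(scores_a, scores_b)
    if p_value >= alpha:
        return "tie"    # no statistically significant difference
    if np.mean(scores_a) > np.mean(scores_b):
        return "win"    # model A significantly better on this dataset
    return "loss"       # model A significantly worse on this dataset
```

Counting these outcomes over all datasets for every pair of models then yields one win-tie-loss table per performance measure.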