These are my basic projects, built from scratch. I used Python for the programming.
- Iris data set analysis - Machine learning project
In this project I worked on the Iris data set, which has 150 samples. I tried 6 different algorithms to find the best fit for my prediction model, then ran the resulting model on a test data set to check the correctness of its predictions. Given 4 characteristics such as sepal length and petal width, we can predict the class of the flower.
- Naive Bayes data analysis
In this project I applied the Naive Bayes algorithm to three data sets: a weather data set (predicting whether or not to go out and play), a diabetes prediction data set, and a breast cancer detection data set.
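As a rough illustration of the idea (not the project's actual code), here is a from-scratch categorical Naive Bayes with Laplace smoothing. The toy rows, the two features (outlook, windy) and the function names are assumptions made up for this sketch.

```python
from collections import Counter, defaultdict

def train_naive_bayes(rows, labels):
    """Count how often each class and each (feature, class, value) occurs."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
    feature_values = defaultdict(set)     # feature index -> distinct values seen
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            value_counts[(i, label)][value] += 1
            feature_values[i].add(value)
    return class_counts, value_counts, feature_values

def predict(model, row):
    """Return the class with the highest prior * likelihood product."""
    class_counts, value_counts, feature_values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # class prior
        for i, value in enumerate(row):
            # Laplace (add-one) smoothing avoids zero probabilities
            numerator = value_counts[(i, label)][value] + 1
            denominator = count + len(feature_values[i])
            score *= numerator / denominator
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Made-up toy rows: (outlook, windy) -> play?
rows = [
    ("sunny", "false"), ("sunny", "true"), ("overcast", "false"),
    ("rainy", "false"), ("rainy", "true"), ("overcast", "true"),
]
labels = ["no", "no", "yes", "yes", "no", "yes"]

model = train_naive_bayes(rows, labels)
print(predict(model, ("overcast", "false")))
```

For real data sets with numeric features (such as the diabetes or breast cancer data), a Gaussian variant of Naive Bayes is the usual choice instead of the categorical counts used here.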
- Part-of-speech tagger
A part-of-speech tagger: passing a sentence to the pos_tag function returns the part of speech that each word of the sentence belongs to. POS taggers are used in grammar correction systems, sentiment analysis, etc.
- Product recommender
I used Term Frequency-Inverse Document Frequency (TF-IDF) and cosine_similarities to measure the similarity between products in the database and recommend products similar to the one selected by the consumer.
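A minimal sketch of how TF-IDF and cosine similarity can drive a recommendation, written with the standard library only. The project itself most likely relied on library implementations; the product descriptions, the function names and the exact IDF formula (log(N / df) + 1) are assumptions for illustration.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Build one sparse TF-IDF vector (term -> weight) per document."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Number of documents each term appears in
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * (math.log(n_docs / doc_freq[term]) + 1.0)
            for term, count in counts.items()
        })
    return vectors

def cosine_similarity(a, b):
    """Cosine of the angle between two sparse vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def recommend(products, selected_index, top_n=2):
    """Return indices of the products most similar to the selected one."""
    vectors = tfidf_vectors(products)
    scores = [
        (cosine_similarity(vectors[selected_index], vec), i)
        for i, vec in enumerate(vectors) if i != selected_index
    ]
    scores.sort(reverse=True)
    return [i for _, i in scores[:top_n]]

# Hypothetical product descriptions
products = [
    "red cotton shirt",
    "blue cotton shirt",
    "steel water bottle",
    "red wool sweater",
]
print(recommend(products, 0))
```

Here the blue cotton shirt ranks first for the red cotton shirt because it shares two terms with it, while the red wool sweater shares only one.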
- Stock market analysis using Apple stock data
I used the latest Apple stock market data set from google.finance.com and applied regression models to it to check their predictions.
- Product recommender using image processing in MATLAB
I used HSV and the Gabor Radon algorithm to extract the color and texture features of an image, then calculated the Euclidean distance between the query vector and the feature vectors of the images in the database. The 10 best-matching images are displayed.
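The retrieval step can be sketched as follows. The project itself is in MATLAB; this Python version only illustrates the distance ranking, and the file names, the three-element feature vectors and the function names are hypothetical. Feature extraction (HSV, Gabor) is not shown.

```python
import math

def euclidean_distance(a, b):
    """Straight-line distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_matches(query, database, k=10):
    """Return the names of the k images whose features are closest to the query."""
    ranked = sorted(
        database.items(),
        key=lambda item: euclidean_distance(query, item[1]),
    )
    return [name for name, _ in ranked[:k]]

# Hypothetical precomputed feature vectors, one per image
database = {
    "a.jpg": [0.10, 0.20, 0.30],
    "b.jpg": [0.90, 0.80, 0.70],
    "c.jpg": [0.10, 0.25, 0.30],
}
query = [0.10, 0.20, 0.30]
print(top_matches(query, database, k=2))
```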
- Human activity recognition
The Human Activity Recognition database was built from the recordings of 30 study participants performing activities of daily living (ADL) while carrying a waist-mounted smartphone with embedded inertial sensors. The objective is to classify activities into one of the six activities performed.
- Predicting financial distress
Here I used a highly imbalanced data set: 3.8% of the samples are companies under financial distress, and the remaining roughly 96% are companies in a stable financial state. I show several techniques for handling imbalanced data, such as undersampling and oversampling. You can also fork my Kaggle kernel: https://www.kaggle.com/rinki24/financial-distress-prediction
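To make the two resampling ideas concrete, here is a small standard-library sketch of random oversampling and random undersampling for a binary problem. It is not taken from the kernel; the sample layout (a list of (features, label) pairs), the function names and the synthetic data are assumptions.

```python
import random

def split_by_class(samples, minority_label):
    """Separate (features, label) pairs into minority and majority groups."""
    minority = [s for s in samples if s[1] == minority_label]
    majority = [s for s in samples if s[1] != minority_label]
    return minority, majority

def random_oversample(samples, minority_label, seed=0):
    """Duplicate random minority samples until both classes are the same size."""
    rng = random.Random(seed)
    minority, majority = split_by_class(samples, minority_label)
    extra = [rng.choice(minority) for _ in range(len(majority) - len(minority))]
    return majority + minority + extra

def random_undersample(samples, minority_label, seed=0):
    """Keep a random subset of the majority class, the same size as the minority."""
    rng = random.Random(seed)
    minority, majority = split_by_class(samples, minority_label)
    return rng.sample(majority, len(minority)) + minority

# Synthetic example: 96 stable companies (label 0), 4 distressed (label 1)
samples = [((i,), 0) for i in range(96)] + [((i,), 1) for i in range(96, 100)]

oversampled = random_oversample(samples, minority_label=1)
undersampled = random_undersample(samples, minority_label=1)
print(len(oversampled), len(undersampled))
```

Resampling should be applied to the training split only, so that the test set keeps the original class balance.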
- Analytics Vidhya: Loan Prediction III
A classification problem: predict whether a person's loan application will be approved or rejected, or whether the person is eligible for the requested loan amount (for a bank that wants to automate the loan granting process).
- WaterPump_Classification (top 30% among the participating teams)
This DrivenData competition was about classifying water pumps in the Tanzanian government's water data. I used the CatBoost algorithm, which is well suited to data sets with categorical values; boosting algorithms also have the added advantage of working well on small data sets.
Score: 0.7261
Metric used:
$$\text{Classification Rate} = \frac{1}{N} \sum_{i=0}^{N} I(y_i = \hat{y}_i)$$
Competition link: https://www.drivendata.org/competitions/7/pump-it-up-data-mining-the-water-table/
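The classification rate metric is simply the fraction of predictions that match the true labels. A minimal sketch follows; the function name and the example labels are chosen for illustration.

```python
def classification_rate(y_true, y_pred):
    """Fraction of predictions equal to the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

# Illustrative labels, not competition data
y_true = ["functional", "non functional", "functional", "functional needs repair"]
y_pred = ["functional", "non functional", "non functional", "functional needs repair"]
print(classification_rate(y_true, y_pred))  # 3 of 4 correct -> 0.75
```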
- Haptik data classification of small talk
Made a simple classifier to help a chatbot understand whether a chat message is small talk or not, using Python, NLTK and sklearn.
- Udacity ML competition (https://www.kaggle.com/c/udacity-mlcharity-competition)
Made a submission to the Udacity ML competition and ranked 64th on the leaderboard.