🚘 Vehicle_Claim_Fraud_Detection 💸

Created by Matheus Laranjeira (https://github.com/mathlaranjeira and https://www.linkedin.com/in/matheus-laranjeira-m-sc-1387a859/) & Guilherme Origo Fulop (https://github.com/GuilhermeFulop and https://www.linkedin.com/in/guilherme-origo-fulop/).

📘 Intro

Insurance fraud, as defined by the California Department of Insurance, occurs when someone knowingly lies to obtain a benefit or advantage to which they are not otherwise entitled or someone knowingly denies a benefit that is due and to which someone is entitled. According to the Coalition Against Insurance Fraud, insurance fraud, as a whole, occurs in about 10% of property-casualty insurance losses and steals at least $308.6 billion every year from consumers in the United States. Medical care fraud alone accounts for an estimated cost of $60 billion every year.

Vehicles are also a significant source of insurance fraud, which consists of false or exaggerated claims related to property damage or personal injuries. Some common fraud practices are staged accidents, phantom passengers and exaggerated injuries. This project therefore focuses on fraudulent vehicle claims.

📖 Dataset Information

⚖️ Unbalanced Data

The main characteristic of this dataset is the imbalance between non-fraudulent and fraudulent cases: frauds represent only 6% of the entire dataset. If we trained a model on the data as-is, accuracy would be very high simply because the model would get the non-fraudulent cases right, while only a few frauds would be caught. That is not what we want: we want a model that correctly predicts almost all of the frauds in our dataset!
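To make the accuracy problem concrete, here is a minimal sketch in plain Python. It assumes a hypothetical dataset of 10,000 claims with the 6% fraud rate mentioned above and a trivial "model" that always predicts non-fraud; the sample size and the trivial model are illustrative assumptions, not part of the project.

```python
# Hypothetical dataset: 10,000 claims, 6% of them fraudulent (label 1).
n_total = 10_000
n_fraud = n_total * 6 // 100
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)

# A trivial "model" that labels every claim as non-fraud (label 0).
y_pred = [0] * n_total

# Accuracy: share of all claims labelled correctly.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total

# Recall on the fraud class: share of real frauds that were caught.
caught = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
recall = caught / n_fraud

print(f"accuracy = {accuracy:.2%}, fraud recall = {recall:.2%}")
```

The trivial model scores 94% accuracy while catching no fraud at all, which is why accuracy alone is a misleading metric on this kind of data.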

📊 Balancing the data

We show only the best result, which was achieved with undersampling. For this, we used ClusterCentroids (from imbalanced-learn), which runs k-means on the majority class and replaces clusters of majority samples with their centroids.

As the image above shows, after resampling the number of non-fraud cases equals the number of fraud cases, confirming that the dataset was successfully balanced.

💻 Machine Learning Model

Our best model was RandomForestClassifier. First, we ran hyperparameter tuning to find the parameters that give the best result.
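The README does not state which search strategy or parameter grid was used. As a hedged sketch, tuning a RandomForestClassifier could look like the following, where `GridSearchCV`, the small grid, the `recall` scoring choice and the synthetic data are all assumptions made for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the balanced training data (assumption).
X, y = make_classification(n_samples=400, n_features=10, random_state=0)

# Illustrative grid; the values actually searched are not given in the README.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 5],
}

# Score by recall, since missing a fraud is the costly error here.
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall",
    cv=3,
)
search.fit(X, y)

best_model = search.best_estimator_
print(search.best_params_)
```

With the default `refit=True`, `best_model` is already retrained on all of `X` and `y` with the winning parameters and can be used directly for prediction.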

With the model prepared, we plotted the confusion matrix and, as we can see, only 5 frauds were wrongly predicted, in contrast to 262 frauds correctly predicted, a hit rate of 98%!
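The 98% figure can be checked with a quick calculation. The sketch below assumes that the 5 "wrongly predicted" frauds are false negatives (real frauds the model missed), which makes the hit rate the recall on the fraud class.

```python
# Counts taken from the confusion matrix described above.
true_positives = 262   # frauds correctly predicted
false_negatives = 5    # frauds wrongly predicted (assumed to be missed frauds)

# Recall: share of actual frauds that the model caught.
recall = true_positives / (true_positives + false_negatives)
print(f"{recall:.2%}")  # 98.13%
```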

Our decision was also based on the ROC curve, which is displayed below:

By some estimates, our method could save the insurance company US$ 434,022.47 per year, a 97.75% cut in expenses!

We also deployed the model; you can try it here: https://mathlaranjeira-vehicle-claim-fraud-detection-deploymain-s4auzj.streamlitapp.com/

✔️ Don't forget!

Please check out our project! We tested several hypotheses, sampling methods (such as oversampling and undersampling), outlier detection and many more techniques.

