igorconsulting/Vehicle-Claim-Fraud-Detection

 
 


🚘 Vehicle_Claim_Fraud_Detection 💸

📘 Intro

Insurance fraud, as defined by the California Department of Insurance, occurs when someone knowingly lies to obtain a benefit or advantage to which they are not otherwise entitled or someone knowingly denies a benefit that is due and to which someone is entitled. According to the Coalition Against Insurance Fraud, insurance fraud, as a whole, occurs in about 10% of property-casualty insurance losses and steals at least $308.6 billion every year from consumers in the United States. Medical care fraud alone accounts for an estimated cost of $60 billion every year.

Vehicle insurance is also a major source of fraud, typically in the form of false or exaggerated claims for property damage or personal injury. Common schemes include staged accidents, phantom passengers, and exaggerated injuries. This project focuses on detecting fraudulent vehicle claims.

📖 Dataset Information

⚖️ Unbalanced Data

The defining characteristic of this dataset is its class imbalance: fraudulent claims make up only 6% of the records. A model trained on the data as-is could reach very high accuracy simply by predicting "not fraud" for almost every claim, while catching very few actual frauds. That is not what we want. We want a model that correctly identifies almost all of the frauds in the dataset!
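To make the accuracy problem concrete, here is a minimal sketch in plain Python. The counts (1,000 claims, 60 frauds) are hypothetical, chosen only to match the 6% fraud rate; they are not taken from the project's dataset.

```python
# Hypothetical label counts chosen to match the ~6% fraud rate.
n_total = 1000
n_fraud = 60
y_true = [1] * n_fraud + [0] * (n_total - n_fraud)  # 1 = fraud, 0 = not fraud

# A "model" that never flags a claim as fraud.
y_pred = [0] * n_total

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n_total
true_positives = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
recall = true_positives / n_fraud

print(f"accuracy = {accuracy:.2%}")    # accuracy = 94.00%
print(f"fraud recall = {recall:.2%}")  # fraud recall = 0.00%
```

The always-negative predictor scores 94% accuracy while catching no frauds at all, which is why accuracy alone is a misleading metric here.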

image

📊 Balancing the data

We show only the best result, which was achieved with undersampling. For this we used ClusterCentroids, which runs k-means on the majority class and replaces its samples with the resulting cluster centroids.

image

As the image above shows, after resampling the number of non-fraud cases equals the number of fraud cases, confirming that the dataset was successfully balanced.
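The idea behind centroid-based undersampling can be sketched with scikit-learn alone. This is an illustration under assumptions, not the project's notebook code: the data is synthetic (`make_classification`), and the k-means step is written out by hand. The `ClusterCentroids` class in imbalanced-learn's `imblearn.under_sampling` module packages the same idea.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

# Synthetic stand-in for the claims data: roughly 6% positives (fraud).
X, y = make_classification(
    n_samples=2000, n_features=8, weights=[0.94, 0.06], random_state=0
)

X_fraud = X[y == 1]
X_legit = X[y == 0]

# Fit k-means on the majority class with one cluster per fraud sample,
# then keep only the centroids as the new majority-class rows.
kmeans = KMeans(n_clusters=len(X_fraud), n_init=10, random_state=0)
kmeans.fit(X_legit)
X_legit_reduced = kmeans.cluster_centers_

X_balanced = np.vstack([X_legit_reduced, X_fraud])
y_balanced = np.concatenate([
    np.zeros(len(X_legit_reduced), dtype=int),
    np.ones(len(X_fraud), dtype=int),
])

# Class counts before and after balancing.
print(np.bincount(y), np.bincount(y_balanced))
```

Note that the reduced majority rows are synthetic centroids rather than original claims, so resampling like this should be applied to the training split only.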

💻 Machine Learning Model

Our best model was RandomForestClassifier. We first ran hyperparameter tuning to find the parameters that give the best result.
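A tuning step of this kind might look like the sketch below. The search method (`GridSearchCV`), the parameter grid, the `recall` scoring choice, and the synthetic data are all assumptions made for illustration; the README does not state which search strategy or grid the project used.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the balanced training data.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Hypothetical grid: small on purpose so the search runs quickly.
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [4, None],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="recall",  # assumption: favor catching frauds over raw accuracy
    cv=3,
)
search.fit(X, y)

best_model = search.best_estimator_  # refit on all of X, y with the best params
print(search.best_params_)
```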

image

With the model trained, we plotted the confusion matrix. As we can see, only 5 frauds were misclassified, compared with 262 frauds correctly predicted, a hit rate of 98%!
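The 98% figure corresponds to recall on the fraud class. A quick check, reading the 5 misclassified frauds as false negatives and the 262 correct ones as true positives (our interpretation of the confusion matrix):

```python
frauds_caught = 262  # true positives, as read from the confusion matrix
frauds_missed = 5    # false negatives, as read from the confusion matrix

recall = frauds_caught / (frauds_caught + frauds_missed)
print(f"recall = {recall:.1%}")  # recall = 98.1%
```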

image

Our decision was also based on the ROC curve, shown below:

image
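For readers who want to reproduce a curve like this, the following is a minimal sketch using scikit-learn's `roc_curve` and `roc_auc_score`. The data is synthetic and the model settings are placeholders, so the resulting numbers will not match the project's.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the claims data.
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)

# The ROC curve needs scores, not hard labels: use the positive-class probability.
fraud_scores = model.predict_proba(X_test)[:, 1]

fpr, tpr, thresholds = roc_curve(y_test, fraud_scores)
auc = roc_auc_score(y_test, fraud_scores)
print(f"AUC = {auc:.3f}")
```

Plotting `tpr` against `fpr` gives the curve; each point corresponds to one decision threshold on the predicted fraud probability.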

By some estimates, our method could save the insurance company US$ 434,022.47 per year, a 97.75% cut in expenses!

We also deployed the model. You can try it here: https://mathlaranjeira-vehicle-claim-fraud-detection-deploymain-s4auzj.streamlitapp.com/

✔️ Don't forget!

Please check out the full project! We tested several hypotheses, sampling methods (both oversampling and undersampling), outlier detection, and many other techniques.
