
Some outliers represent natural variations in the population and should be left as is in your dataset; these are called true outliers. Other outliers are problematic and should be removed because they represent measurement errors, data entry or processing errors, or poor sampling.


Sengarofficial/Outliers_Impact_Analysis


Outliers_Impact_Analysis:

Should we remove outliers or not?

The answer depends on the problem statement. In credit risk analysis, outliers play a major role; likewise, in fraud detection they play a dominant role.

Whether to remove outliers also depends on the dataset. Take the Titanic dataset: should we keep the outliers or not? The decision comes down to the impact the outliers have on the task, which here is predicting whether a passenger survives. The sinking was an accident, and outliers in age and other factors won't affect survival, so we should remove them. In fraud detection, by contrast, we can't remove outliers, so we have to select a model that is not affected by them.
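To make "not affected by outliers" concrete, here is a minimal sketch comparing a non-robust statistic (the mean) with a robust one (the median) when a single extreme value is added. The ages are made up for illustration and are not taken from the Titanic dataset; the mean/median pair stands in for the more general choice between outlier-sensitive and outlier-robust models.

```python
from statistics import mean, median

# Hypothetical ages, invented for this example (not Titanic data).
ages = [22, 24, 29, 30, 35, 38, 41]
ages_with_outlier = ages + [80]  # append one extreme value

# How far does each statistic move when the outlier is added?
mean_shift = mean(ages_with_outlier) - mean(ages)
median_shift = median(ages_with_outlier) - median(ages)

print(f"mean shift:   {mean_shift:.2f}")
print(f"median shift: {median_shift:.2f}")
```

The median moves less than the mean here, which is the behaviour you would want from a model when the outliers have to stay in the data.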

Another example: suppose we are working on sales forecasting or stock/crypto analysis and the datasets contain sudden spikes. Those spikes are outliers because they are distributed differently from the rest of the data. Should we remove those outliers (spikes) or keep them?

We should keep those outliers (spikes), because they are important to the analysis and we need to find the factors that cause them. Never delete these outliers.
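One way to keep spikes while still making them visible is to flag them instead of dropping them. The sketch below uses the common 1.5 × IQR rule from the Python standard library; the function name `flag_spikes`, the multiplier `k`, and the sales figures are all assumptions made for illustration, not part of this repository.

```python
from statistics import quantiles

def flag_spikes(values, k=1.5):
    """Return the indices of values outside the IQR fences (hypothetical helper)."""
    q1, _, q3 = quantiles(values, n=4)  # quartiles of the series
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [i for i, v in enumerate(values) if v < lower or v > upper]

# Invented daily sales with one sudden spike.
daily_sales = [100, 102, 98, 101, 99, 103, 97, 250, 100, 102]

spike_days = flag_spikes(daily_sales)
print("spike indices:", spike_days)
print("rows kept:", len(daily_sales))  # nothing is deleted
```

The flagged indices can then be joined against promotions, news, or other events to look for the cause of each spike.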

Similarly, an unusual money transaction is an outlier in a fraud detection dataset, but we can't remove it because the unusual transaction is exactly what the analysis is looking for.
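As a rough illustration of treating unusual transactions as a signal, the sketch below scores each amount with a modified z-score (based on the median and the median absolute deviation) and keeps the result as a flag next to the data. The amounts are invented, `modified_z_scores` is a hypothetical helper, and the 3.5 cutoff is a commonly used convention rather than anything this repository prescribes; a real fraud model would use many more features.

```python
from statistics import median

def modified_z_scores(values):
    """Modified z-score per value: 0.6745 * (x - median) / MAD."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # No spread around the median; nothing can be scored meaningfully.
        return [0.0 for _ in values]
    return [0.6745 * (v - med) / mad for v in values]

# Invented transaction amounts with one unusually large payment.
amounts = [25.0, 40.0, 18.5, 32.0, 27.5, 5000.0, 30.0, 22.0]

# Keep every row; add a flag instead of deleting the unusual one.
suspicious = [abs(z) > 3.5 for z in modified_z_scores(amounts)]
for amount, flag in zip(amounts, suspicious):
    print(f"{amount:>8.2f}  suspicious={flag}")
```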

Authors

Contributing

Contributions are always welcome!

Documentation

https://scikit-learn.org/stable/modules/outlier_detection.html

Feedback

If you have any feedback, please reach out to us at [email protected].

🚀 About Me

| Python Engineer | Machine Learning Engineer | Deep Learning Enthusiast | Analyst | Electrical & Electronics Engineer | On the way to Full Stack Developer |

🔗 Links

https://github.com/Sengarofficial

License

The Unlicense
