GitHub - scikit-learn-contrib/mimic: mimic calibration

mimic calibration

Introduction

Mimic calibration is a calibration method for binary classification models. It was presented by Sam Steingold in a talk at the NYC ML Meetup; see [*].

Implementation

It requires two inputs: the predicted probabilities from a binary classification model and the binary targets (0 or 1). It is implemented as follows:

1. Sort the predicted probabilities in ascending order. Merge neighboring data points into one bin until the number of positives in the bin equals the positive threshold (threshold_pos). After this initial binning, only the last bin may contain fewer positives than the threshold.
2. Calculate the positive rate (nPos rate) of each bin. Merge two neighboring bins if the positive rate of the left bin is greater than that of the right bin.
3. Repeat step 2 until the positive rates are in increasing order.
4. At this point, each bin carries information such as its positive rate and its avg/min/max probability. This information is recorded in two places: boundary_table, which records one probability per bin (by default, the average probability of the bin), and calibrated_model, which records all the information of each bin, such as the positive rate and the avg/min/max probability.
5. The final step is linear interpolation.
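The steps above can be sketched in plain Python. This is an illustrative sketch only, not the package's code: the function names (initial_bins, merge_until_monotone, fit_mimic_sketch, predict_mimic_sketch) are hypothetical, and two details are assumptions: interpolation runs between the average probability of each bin, and inputs outside the first or last bin average are clamped to the nearest bin's positive rate.

```python
from bisect import bisect_left


def initial_bins(scores, labels, threshold_pos=5):
    """Step 1: sort by score and close a bin once it holds threshold_pos positives."""
    bins, current, n_pos = [], [], 0
    for s, y in sorted(zip(scores, labels)):
        current.append((s, y))
        n_pos += y
        if n_pos == threshold_pos:
            bins.append(current)
            current, n_pos = [], 0
    if current:  # only the last bin may hold fewer positives
        bins.append(current)
    return bins


def pos_rate(b):
    """Fraction of positive labels in one bin."""
    return sum(y for _, y in b) / len(b)


def merge_until_monotone(bins):
    """Steps 2-3: merge neighbors while a left bin's rate exceeds the right bin's."""
    merged = True
    while merged:
        merged = False
        out = [bins[0]]
        for b in bins[1:]:
            if pos_rate(out[-1]) > pos_rate(b):
                out[-1] = out[-1] + b  # merge the right bin into the left one
                merged = True
            else:
                out.append(b)
        bins = out
    return bins


def fit_mimic_sketch(scores, labels, threshold_pos=5):
    """Step 4: keep the average score and the positive rate of each final bin."""
    bins = merge_until_monotone(initial_bins(scores, labels, threshold_pos))
    boundary = [sum(s for s, _ in b) / len(b) for b in bins]
    rates = [pos_rate(b) for b in bins]
    return boundary, rates


def predict_mimic_sketch(x, boundary, rates):
    """Step 5: linear interpolation between bin averages (tails clamped, an assumption)."""
    if x <= boundary[0]:
        return rates[0]
    if x >= boundary[-1]:
        return rates[-1]
    i = bisect_left(boundary, x)  # boundary[i - 1] < x <= boundary[i]
    x0, x1 = boundary[i - 1], boundary[i]
    y0, y1 = rates[i - 1], rates[i]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)
```

Because the merged bins have non-decreasing positive rates, the interpolated mapping is monotone, and between bin averages it varies continuously rather than in steps.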

Install:

pip install -i https://test.pypi.org/simple/ mimiccalib

Parameters:

>>> _MimicCalibration(threshold_pos, record_history)

threshold_pos: the number of positives per bin in the initial binning. Default = 5.
record_history: boolean; whether to record the full history of bin merges. Default = False.

Usage

>>> from mimic import _MimicCalibration
>>> mimicObject = _MimicCalibration(threshold_pos=5, record_history=True)
>>> # y_calib_score: probability prediction from binary classification model
>>> # y_calib: the binary target, 0 or 1.
>>> mimicObject.fit(y_calib_score, y_calib)
>>> y_mimic_calib_score = mimicObject.predict(y_calib_score)

Results: calibration evaluation.

Calibration curve and Brier score. In our test examples, mimic and isotonic calibration have very similar Brier scores. However, as the number of bins in the calibration curve increases, mimic calibration behaves more smoothly. This is because mimic's calibrated probabilities span a more continuous prediction space, whereas isotonic calibration is a step function. In the following plots, the Brier scores are 0.1028 (mimic) and 0.1027 (isotonic).

>>> calibration_curve(y_test, y_output_score, n_bins=10)

>>> calibration_curve(y_test, y_output_score, n_bins=20)

The behavior above is similar in the following cases: 1. base model = GaussianNB, LinearSVC; 2. positive rate in the data = 0.5, 0.2.
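For readers who want to see what the two evaluation quantities measure, here is a minimal plain-Python sketch. The helper names brier_score and reliability_curve are hypothetical, and the uniform-width binning over [0, 1] is an assumption; this follows the general idea of a calibration curve and is not the code that produced the numbers above.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def reliability_curve(y_true, y_prob, n_bins=10):
    """Split [0, 1] into n_bins equal-width bins and, for every non-empty bin,
    return (observed positive rate, mean predicted probability)."""
    pos = [0] * n_bins
    prob_sum = [0.0] * n_bins
    count = [0] * n_bins
    for y, p in zip(y_true, y_prob):
        i = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        pos[i] += y
        prob_sum[i] += p
        count[i] += 1
    observed = [pos[i] / count[i] for i in range(n_bins) if count[i]]
    predicted = [prob_sum[i] / count[i] for i in range(n_bins) if count[i]]
    return observed, predicted
```

A well-calibrated model has observed rates close to the mean predicted probabilities in every bin. A step-function calibrator outputs only a limited set of distinct values, so with many bins some bins hold few or no points, which is one way the less smooth curve can arise.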

Comparison: mimic, isotonic and Platt calibration.

History of merging bins.

Run test

pytest -v --cov=mimic --pyargs mimic

Reference

[*]	https://www.youtube.com/watch?v=Cg--SC76I1I
