Reorganize the code base (#904)
lbittarello authored Feb 4, 2025
1 parent 2ecff36 commit 4f9db8f
Showing 50 changed files with 3,358 additions and 4,878 deletions.
1 change: 0 additions & 1 deletion .gitattributes
Original file line number Diff line number Diff line change
@@ -1,3 +1,2 @@
# GitHub syntax highlighting
pixi.lock linguist-language=YAML

4 changes: 2 additions & 2 deletions .github/workflows/build_wheels.yml
@@ -52,7 +52,7 @@ jobs:
if: github.event_name == 'release' && github.event.action == 'published'
needs: [build_wheels, build_sdist]
runs-on: ubuntu-latest
environment:
environment:
name: test_release
url: https://test.pypi.org/p/glum
permissions:
@@ -70,7 +70,7 @@ jobs:
if: github.event_name == 'release' && github.event.action == 'published'
needs: [build_wheels, build_sdist, upload_testpypi]
runs-on: ubuntu-latest
environment:
environment:
name: release
url: https://pypi.org/p/glum
permissions:
1 change: 0 additions & 1 deletion .gitignore
@@ -150,4 +150,3 @@ pkgs/*
# pixi environments
.pixi
*.egg-info

19 changes: 19 additions & 0 deletions .pre-commit-config.yaml
@@ -45,3 +45,22 @@ repos:
language: system
types: [python]
require_serial: true
# pre-commit-hooks
- id: trailing-whitespace-fixer
name: trailing-whitespace-fixer
entry: pixi run -e lint trailing-whitespace-fixer
language: system
types: [text]
exclude: (\.py|README.md)$
- id: end-of-file-fixer
name: end-of-file-fixer
entry: pixi run -e lint end-of-file-fixer
language: system
types: [text]
exclude: (\.py|changelog.rst)$
- id: check-merge-conflict
name: check-merge-conflict
entry: pixi run -e lint check-merge-conflict --assume-in-merge
language: system
types: [text]
exclude: \.py$
2 changes: 1 addition & 1 deletion LICENSE
@@ -1,4 +1,4 @@
Copyright 2020-2021 QuantCo Inc, Christian Lorentzen
Copyright 2020-2021 QuantCo Inc, Christian Lorentzen

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

3 changes: 3 additions & 0 deletions NOTICE
@@ -0,0 +1,3 @@
Modified from code submitted as a PR to sklearn: https://github.com/scikit-learn/scikit-learn/pull/9405

Original attribution from: https://github.com/scikit-learn/scikit-learn/pull/9405/files#diff-38e412190dc50455611b75cfcf2d002713dcf6d537a78b9a22cc6b1c164390d1
1 change: 0 additions & 1 deletion build_tools/prepare_macos_wheel.sh
@@ -10,4 +10,3 @@ else
fi

conda create -n build -c $CONDA_CHANNEL 'llvm-openmp=11'

19 changes: 9 additions & 10 deletions docs/contributing.rst
@@ -1,7 +1,7 @@
Contributing and Development
====================================

Hello! And thanks for exploring glum more deeply. Please see the issue tracker and pull request tabs on GitHub for information about what is currently happening. Feel free to post an issue if you'd like to get involved in development and don't really know where to start -- we can give some advice.
Hello! And thanks for exploring glum more deeply. Please see the issue tracker and pull request tabs on GitHub for information about what is currently happening. Feel free to post an issue if you'd like to get involved in development and don't really know where to start -- we can give some advice.

We welcome contributions of any kind!

@@ -25,7 +25,7 @@ Pull request process
Releases
--------------------------------------------------

- We make package releases infrequently, but usually any time a new non-trivial feature is contributed or a bug is fixed. To make a release, just open a PR that updates the change log with the current date. Once that PR is approved and merged, you can create a new release on `GitHub <https://github.com/Quantco/glum/releases/new>`_. Use the version from the change log as the tag and copy the change log entry into the release description.
- We make package releases infrequently, but usually any time a new non-trivial feature is contributed or a bug is fixed. To make a release, just open a PR that updates the change log with the current date. Once that PR is approved and merged, you can create a new release on `GitHub <https://github.com/Quantco/glum/releases/new>`_. Use the version from the change log as the tag and copy the change log entry into the release description.

Install for development
--------------------------------------------------
@@ -75,10 +75,10 @@ The test suite is in ``tests/``. A pixi task is available to run the tests:
Golden master tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

We use golden master testing to preserve correctness. The results of many different GLM models have been saved. After an update, the tests will compare the new output to the saved models. Any significant deviation will result in a test failure. This doesn't strictly mean that the update was wrong. In case of a bug fix, it's possible that the new output will be more accurate than the old output. In that situation, the golden master results can be overwritten as explained below.
We use golden master testing to preserve correctness. The results of many different GLM models have been saved. After an update, the tests will compare the new output to the saved models. Any significant deviation will result in a test failure. This doesn't strictly mean that the update was wrong. In case of a bug fix, it's possible that the new output will be more accurate than the old output. In that situation, the golden master results can be overwritten as explained below.

There are two sets of golden master tests, one with artificial data and one directly using the benchmarking problems from :mod:`glum_benchmarks`. For both sets of tests, creating the golden master and the tests definition are located in the same file. Calling the file with pytest will run the tests while calling the file as a python script will generate the golden master result. When creating the golden master results, both scripts accept the ``--overwrite`` command line flag. If set, the existing golden master results will be overwritten. Otherwise, only the new problems will be run.
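The golden-master pattern described above can be sketched generically in a few lines (an illustration of the idea only, not glum's actual test code; every name here is hypothetical):

```python
import json
import os


def save_golden(path, results, overwrite=False):
    # Write golden-master results; keep existing entries unless overwrite=True,
    # mirroring the --overwrite flag described above.
    existing = {}
    if os.path.exists(path) and not overwrite:
        with open(path) as f:
            existing = json.load(f)
    for key, value in results.items():
        if overwrite or key not in existing:
            existing[key] = value
    with open(path, "w") as f:
        json.dump(existing, f)


def check_against_golden(path, results, tol=1e-8):
    # Compare fresh results to the saved ones; any significant deviation fails.
    with open(path) as f:
        golden = json.load(f)
    return all(abs(golden[key] - value) <= tol for key, value in results.items())
```

Running the test file under pytest corresponds to `check_against_golden`, while running the same file as a script (optionally with `--overwrite`) corresponds to `save_golden`.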

Skipping the slow tests
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

@@ -102,7 +102,7 @@ Building a conda package
To use the package in another project, we distribute it as a conda package.
For building the package locally, you can use the following command:

::
::

conda build conda.recipe

@@ -121,7 +121,7 @@ Then, navigate to `<http://localhost:8000>`_ to view the documentation.

Alternatively, if you install `entr <http://eradman.com/entrproject/>`_, then you can auto-rebuild the documentation any time a file changes with:

::
::

cd docs
./dev
@@ -141,23 +141,23 @@ If you are a newbie to Sphinx, the links below may help get you up to speed on s
Where to start looking in the source?
-------------------------------------

The primary user interface of ``glum`` consists of the :class:`GeneralizedLinearRegressor <glum.GeneralizedLinearRegressor>` and :class:`GeneralizedLinearRegressorCV <glum.GeneralizedLinearRegressorCV>` classes via their constructors and the :meth:`fit() <glum.GeneralizedLinearRegressor.fit>` and :meth:`predict() <glum.GeneralizedLinearRegressor.predict>` functions. Those are the places to start looking if you plan to change the system in some way.
The primary user interface of ``glum`` consists of the :class:`GeneralizedLinearRegressor <glum.GeneralizedLinearRegressor>` and :class:`GeneralizedLinearRegressorCV <glum.GeneralizedLinearRegressorCV>` classes via their constructors and the :meth:`fit() <glum.GeneralizedLinearRegressor.fit>` and :meth:`predict() <glum.GeneralizedLinearRegressor.predict>` functions. Those are the places to start looking if you plan to change the system in some way.

What follows is a high-level summary of the source code structure. For more details, please look in the documentation and docstrings of the relevant classes, functions and methods.

* ``_glm.py`` - This is the main entrypoint and implements the core logic of the GLM. Most of the code in this file handles input arguments and prepares the data for the GLM fitting algorithm.
* ``_glm_cv.py`` - This is the entrypoint for the cross-validated GLM implementation. It depends on much of the code in ``_glm.py`` and only modifies the sections necessary for training many models with different regularization parameters.
* ``_solvers.py`` - This contains the bulk of the IRLS and L-BFGS algorithms for training GLMs.
* ``_cd_fast.pyx`` - This is a Cython implementation of the coordinate descent algorithm used for fitting L1 penalty GLMs. Note the ``.pyx`` extension indicating that it is a Cython source file.
* ``_distribution.py`` - definitions of the distributions that can be used. Includes Normal, Poisson, Gamma, InverseGaussian, Tweedie, Binomial and GeneralizedHyperbolicSecant distributions.
* ``_distribution.py`` - definitions of the distributions that can be used. Includes Normal, Poisson, Gamma, InverseGaussian, Tweedie, Binomial and GeneralizedHyperbolicSecant distributions.
* ``_link.py`` - definitions of the link functions that can be used. Includes identity, log, logit and Tweedie link functions.
* ``_functions.pyx`` - This is a Cython implementation of the log likelihoods, gradients and Hessians for several popular distributions.
* ``_util.py`` - This contains a few general purpose linear algebra routines that serve several other modules and don't fit well elsewhere.

The GLM benchmark suite
------------------------

Before deciding to build a custom library for our purposes, we did a thorough investigation of the various open-source GLM implementations available. This resulted in an extensive suite of benchmarks for comparing the correctness, runtime and feature availability of these libraries.
Before deciding to build a custom library for our purposes, we did a thorough investigation of the various open-source GLM implementations available. This resulted in an extensive suite of benchmarks for comparing the correctness, runtime and feature availability of these libraries.

The benchmark suite has two command line entrypoints:

@@ -167,4 +167,3 @@ The benchmark suite has two command line entrypoints:
Both of these CLI tools take a range of arguments that specify the details of the benchmark problems and which libraries to benchmark.

For more details on the benchmark suite, see the README in the source at ``src/glum_benchmarks/README.md``.

8 changes: 4 additions & 4 deletions docs/getting_started/getting_started.md
@@ -14,7 +14,7 @@ jupyter:
---

<!-- #region tags=[] -->
# Getting Started: fitting a Lasso model
# Getting Started: fitting a Lasso model

The purpose of this tutorial is to show the basics of `glum`. It assumes a working knowledge of Python, regularized linear models, and machine learning. The API is very similar to scikit-learn. After all, `glum` is based on a fork of scikit-learn.

@@ -62,7 +62,7 @@ X_train, X_test, y_train, y_test = sklearn.model_selection.train_test_split(

## GLM basics: fitting and predicting using the normal family

We'll use `glum.GeneralizedLinearRegressor` to predict the house prices using the available predictors.
We'll use `glum.GeneralizedLinearRegressor` to predict the house prices using the available predictors.

We set three key parameters:

@@ -118,7 +118,7 @@ which we interact with as in the example above.

## Fitting a GLM with cross validation

Now, we fit using automatic cross validation with `glum.GeneralizedLinearRegressorCV`. This mirrors the commonly used `cv.glmnet` function.
Now, we fit using automatic cross validation with `glum.GeneralizedLinearRegressorCV`. This mirrors the commonly used `cv.glmnet` function.

Some important parameters:

@@ -130,7 +130,7 @@ Some important parameters:
3. If `min_alpha_ratio` is set, create a path where the ratio of
`min_alpha / max_alpha = min_alpha_ratio`.
4. If none of the above parameters are set, use a `min_alpha_ratio`
of 1e-6.
of 1e-6.
- `l1_ratio`: for `GeneralizedLinearRegressorCV`, if you pass `l1_ratio` as an array, the `fit` method will choose the best value of `l1_ratio` and store it as `self.l1_ratio_`.
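The alpha-path rules above imply a decreasing geometric grid of penalty strengths; a sketch of how such a path could be built (an illustration of the stated rules, not glum's actual code):

```python
import numpy as np


def alpha_path(max_alpha, n_alphas=100, min_alpha=None, min_alpha_ratio=None):
    # Precedence mirrors the rules above: an explicit min_alpha wins,
    # then min_alpha_ratio, then the default ratio of 1e-6.
    if min_alpha is None:
        ratio = min_alpha_ratio if min_alpha_ratio is not None else 1e-6
        min_alpha = max_alpha * ratio
    # Geometrically spaced, from the strongest penalty down to the weakest.
    return np.geomspace(max_alpha, min_alpha, n_alphas)
```

Passing the resulting array as `alphas` (rule 1) would short-circuit all of this, since explicit alphas take precedence.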

2 changes: 1 addition & 1 deletion docs/index.rst
Expand Up @@ -15,7 +15,7 @@ Welcome to glum's documentation!

.. image:: _static/headline_benchmark.png
:width: 600

We suggest visiting the :doc:`Installation<install>` and :doc:`Getting Started<getting_started/getting_started>` sections first.

.. toctree::
2 changes: 1 addition & 1 deletion docs/make.bat
@@ -32,4 +32,4 @@ goto end
%SPHINXBUILD% -M help %SOURCEDIR% %BUILDDIR% %SPHINXOPTS% %O%

:end
popd
popd