Enhancing the Flexibility of Linear Models in Leaf Nodes of Boosted Linear Trees #6630

Open
ToddMeng opened this issue Aug 29, 2024 · 1 comment

Summary

Enhancing the Flexibility of Linear Models in Leaf Nodes of Boosted Linear Trees

Motivation

Linear trees are a practical technique: they can improve model performance, simplify model structure, and make models easier to interpret. When working with linear models, users often need to impose custom constraints to improve interpretability and to incorporate prior knowledge. Such constraints include requiring all regression coefficients to be positive, fixing the monotonic direction of each variable, and restricting the linear regression to a selected subset of features.

Description

As a regular user of this library, I am deeply grateful for the diligent efforts of all developers and maintainers, whose hard work has greatly facilitated our work.
After reviewing the documentation and the linear_tree_learner.cpp code (link: https://github.com/microsoft/LightGBM/blob/master/src/treelearner/linear_tree_learner.cpp), I found that the linear model component offers no options beyond the ridge regression parameters. In particular, it does not support the constraints mentioned above on the signs of regression coefficients, and it cannot restrict the linear regression to a subset of the input features.
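To make the request concrete, here is a minimal pure-Python sketch of what a constrained leaf fit could look like. It is not LightGBM's implementation and not an existing API: the function name `fit_leaf_linear` and all of its arguments are hypothetical. It fits a ridge-penalised linear model on one leaf's rows by coordinate descent, uses only a chosen subset of columns, and can optionally clip coefficients at zero.

```python
def fit_leaf_linear(X, y, features, ridge_lambda=0.0, nonnegative=False, n_sweeps=500):
    """Hypothetical helper: fit y ~ b + sum_j w[j] * x[j] on one leaf's rows.

    Only the columns listed in `features` enter the model. Minimises
    0.5 * sum(residual^2) + 0.5 * ridge_lambda * sum(w^2) by coordinate
    descent; the intercept is neither penalised nor constrained.
    """
    n = len(y)
    b = 0.0
    w = {j: 0.0 for j in features}
    resid = list(y)  # residuals of the current model (all coefficients zero at start)
    for _ in range(n_sweeps):
        # Intercept step: absorb the mean residual.
        shift = sum(resid) / n
        b += shift
        resid = [r - shift for r in resid]
        for j in features:
            col = [row[j] for row in X]
            # Exact one-dimensional minimiser for w[j], other terms held fixed.
            num = sum(c * (r + w[j] * c) for c, r in zip(col, resid))
            den = sum(c * c for c in col) + ridge_lambda
            new_w = num / den if den > 0 else 0.0
            if nonnegative:
                new_w = max(0.0, new_w)  # project onto w[j] >= 0
            delta = new_w - w[j]
            resid = [r - delta * c for c, r in zip(col, resid)]
            w[j] = new_w
    return b, w


# Toy leaf: y = 1 + 2*x0 - 3*x1; column 2 is deliberately left out of the fit.
X = [[0, 0, 5], [1, 0, 1], [0, 1, 2], [1, 1, 7], [2, 1, 3], [1, 2, 0]]
y = [1 + 2 * r[0] - 3 * r[1] for r in X]

b_free, w_free = fit_leaf_linear(X, y, features=[0, 1])
b_pos, w_pos = fit_leaf_linear(X, y, features=[0, 1], nonnegative=True)
print(w_free)  # close to {0: 2.0, 1: -3.0}
print(w_pos)   # the coefficient on column 1 is clipped to 0.0
```

A per-feature sign constraint would follow the same pattern, with the clipping applied only to selected columns and in the requested direction.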

References

One option is to take the extensions available for linear models in sklearn as a reference. Another is to provide an interface that lets users customize the linear model themselves. Either would make linear tree models more flexible and practical.

jaguerrerod commented Sep 6, 2024

Related to this, I think adding the option to include some predictors in all linear models, in addition to the predictors used in the splits to reach the leaf, is important.
I have datasets containing data from several population segments, and I am not interested in including the variables that define the segments in the model itself. However, I would like to include an adjustment in the prediction using the segment flags in the linear model fitted to each leaf.
My leaves have more than 20K observations, so including this segment adjustment does not pose an overfitting problem.
This option could be exposed as a parameter, for example 'features_forced_to_leaf_linear_model', taking an array of feature indices or feature names.
I think this wouldn't be complex to implement, but I don't have the necessary C++ skills to do it.
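As an illustration only, the column selection behind such a parameter might look like the sketch below. It assumes that a leaf's linear model normally uses the features split on along the path to that leaf, and that the forced features are simply appended to that set. The function name `leaf_linear_features` is hypothetical, and 'features_forced_to_leaf_linear_model' is the proposed parameter, not an existing one.

```python
def leaf_linear_features(path_features, forced_features, feature_names):
    """Hypothetical helper: column indices for one leaf's linear model.

    Combines the features split on along the path to the leaf with the
    forced features, accepting indices or names, without duplicates.
    """
    name_to_index = {name: i for i, name in enumerate(feature_names)}
    selected = []
    for f in list(path_features) + list(forced_features):
        idx = name_to_index[f] if isinstance(f, str) else int(f)
        if idx not in selected:
            selected.append(idx)
    return selected


feature_names = ["age", "income", "segment_a", "segment_b"]

# The path to this leaf split on income, then age; segment flags are forced in.
cols = leaf_linear_features([1, 0], ["segment_a", "segment_b"], feature_names)
print(cols)  # [1, 0, 2, 3]
```

Mixing indices and names in one list is accepted here only for convenience; a real parameter would probably settle on one form.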
