Lightgbm cv feature importance python #1445

Closed
onacrame opened this issue Jun 12, 2018 · 10 comments

Comments

@onacrame

It would be useful for the cv function to retain mean/std of feature importances across the cv folds. I don’t believe this is currently implemented.

@guolinke
Collaborator

guolinke commented Jun 12, 2018

You can do this yourself quite simply: get the Booster from each of the cv folds, then calculate the mean/std of the feature importances.
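The aggregation step suggested here can be sketched as follows. This is only a minimal illustration, assuming the per-fold importance vectors have already been collected (for example from Booster.feature_importance() on each fold's model); the helper name summarize_importances and the numbers are made up for the example.

```python
import statistics


def summarize_importances(per_fold, feature_names):
    """Return {feature: (mean, std)} across CV folds.

    per_fold: one importance vector per fold, each the same length as
    feature_names (hypothetical input layout, not a LightGBM structure).
    """
    summary = {}
    for j, name in enumerate(feature_names):
        values = [float(fold[j]) for fold in per_fold]
        # pstdev is the population standard deviation; swap in
        # statistics.stdev for the sample estimate if preferred.
        summary[name] = (statistics.mean(values), statistics.pstdev(values))
    return summary


# Made-up importances for three folds and three features.
fold_importances = [[10, 0, 4], [12, 2, 4], [14, 1, 4]]
summary = summarize_importances(fold_importances, ["f0", "f1", "f2"])
```

A feature with a high mean but also a high std across folds is probably less reliably important than the mean alone suggests, which is the main reason to keep both numbers.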

@onacrame
Author

Hi, thanks for the quick response. I believe the native LightGBM cv function just returns a dictionary of results rather than the individual boosters. Or am I missing something?

@guolinke
Collaborator

@StrikerRUS can we return the cvfolds from lgb.cv as well?

@StrikerRUS
Collaborator

@guolinke Did you mean returning a (dict(results), cvfolds) tuple instead of the current return statement?
https://github.com/Microsoft/LightGBM/blob/3f401477872813a81ca98e90b9b42d085b1013d2/python-package/lightgbm/engine.py#L469

That would be a breaking change. Maybe push cvfolds into the dict to keep the current method signature?

@aerdem4

aerdem4 commented Jul 1, 2018

@guolinke @StrikerRUS It would be nice to have an "ensemble of models" object. The benefit of such a class is that it could return importances and predictions as averages. Currently, especially on Kaggle, people commonly do cross-validation and ensembling at the same time. The cv function could return either all generated models, or the out-of-sample predictions plus averaged ensemble predictions for the test set, if one is given. If you think this feature would be nice to have, I would like to help.
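The "ensemble of models" idea can be sketched roughly as below. This is a hypothetical illustration rather than anything in LightGBM: AveragingEnsemble and ConstantModel are invented names, and the sketch assumes every model exposes predict(data) and returns a 1-D sequence (regression or binary classification).

```python
class AveragingEnsemble:
    """Hypothetical wrapper that averages the outputs of several models."""

    def __init__(self, models):
        if not models:
            raise ValueError("need at least one model")
        self.models = list(models)

    def predict(self, data):
        # One prediction vector per model; assumes 1-D outputs.
        per_model = [list(m.predict(data)) for m in self.models]
        n = len(per_model)
        # Average position by position across the models.
        return [sum(col) / n for col in zip(*per_model)]


class ConstantModel:
    """Stand-in for a trained fold model, used only for this demo."""

    def __init__(self, value):
        self.value = value

    def predict(self, data):
        return [self.value for _ in data]


ensemble = AveragingEnsemble(
    [ConstantModel(0.2), ConstantModel(0.4), ConstantModel(0.9)]
)
averaged = ensemble.predict([[1, 2], [3, 4]])  # two rows, so two averaged predictions
```

With real fold models, the same wrapper would be built from the boosters trained on each fold, so that the test-set prediction is the mean over folds.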

@JoshuaC3

@StrikerRUS could we not do something like

if return_feature_importances:
    return dict(results), dict(agg_feature_importances)
else:
    return dict(results)

with return_feature_importances defaulting to None or False? The default behaviour then doesn't break.

@StrikerRUS
Collaborator

@JoshuaC3 I suppose it's possible. Would you mind creating a PR?

@JoshuaC3

JoshuaC3 commented Oct 16, 2018

@StrikerRUS - Yes. It might take me some time to familiarise myself with this but I am very keen to get this in.

All - I gave this a little thought and I think it should return the mean and std of the feature importances. I think cv_feature_importance would work as its variable name (please, anyone, advise if you have better suggestions).

I also intend to return only the stats of the final feature importances (contrary to the behaviour of dict(results)). I think it is unnecessary (for now) to calculate feature importances for each iteration.

@StrikerRUS
Collaborator

Closed in favor of tracking this in #2302. We decided to keep all feature requests in one place.

Contributions of this feature are welcome! Please re-open this issue (or post a comment if you are not the topic starter) if you are actively working on implementing it.

StrikerRUS added a commit that referenced this issue Aug 2, 2020
…283,#2105,#1445) (#3204)

* [python] add return_cvbooster flag to cv function and rename _CVBooster to make public (#283,#2105)

* [python] Reduce expected metric of unit testing

* [docs] add the CVBooster to the documentation

* [python] reflect the review comments

- Add some clarifications to the documentation
- Rename CVBooster.append to make private
- Decrease iteration rounds of testing to save CI time
- Use CVBooster as root member of lgb

* [python] add more checks in testing for cv

Co-authored-by: Nikita Titov <[email protected]>

* [python] add docstring for instance attributes of CVBooster

Co-authored-by: Nikita Titov <[email protected]>

* [python] fix docstring

Co-authored-by: Nikita Titov <[email protected]>

Co-authored-by: Nikita Titov <[email protected]>
@StrikerRUS
Collaborator

Implemented in #3204.
Now it is possible to get feature importances from CVBooster.
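For readers landing here, one way to use this could look like the sketch below. It assumes a LightGBM version that includes the return_cvbooster flag from #3204, and that the returned CVBooster exposes its per-fold models as a boosters list. The helper names cv_feature_importance and cv_with_importances are hypothetical and not part of the library.

```python
import statistics


def cv_feature_importance(boosters, importance_type="split"):
    """Return {feature: (mean, std)} over the per-fold boosters.

    Assumes all folds share the same feature names, which holds when
    they were trained from the same Dataset.
    """
    names = boosters[0].feature_name()
    per_fold = [
        [float(v) for v in b.feature_importance(importance_type=importance_type)]
        for b in boosters
    ]
    # zip(*per_fold) regroups the values feature by feature.
    return {
        name: (statistics.mean(col), statistics.pstdev(col))
        for name, col in zip(names, zip(*per_fold))
    }


def cv_with_importances(params, train_set, **cv_kwargs):
    """Run lgb.cv and also summarise importances (requires lightgbm)."""
    import lightgbm as lgb  # imported lazily; only this function needs it

    results = lgb.cv(params, train_set, return_cvbooster=True, **cv_kwargs)
    cvbooster = results.pop("cvbooster")  # CVBooster with one Booster per fold
    return results, cv_feature_importance(cvbooster.boosters)
```

Popping "cvbooster" out of the results leaves the metric dictionary in the same shape it has without the flag, which keeps any existing downstream code working.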
