Even when forced_splits is set, the threshold is chosen from the bin_upper_bounds. #1829
Comments
It is not trivial, as the contents of the forced-splits JSON would need to be used in the bin mapper finding algorithm. I think a better solution is to allow the pre-defined
@guolinke Should anything be done before the 2.2.4 release here?
@StrikerRUS As it is not trivial and not very critical, I think we can do it after v2.2.4.
Closed in favor of being in #2302. We decided to keep all feature requests in one place. Welcome to contribute this feature! Please re-open this issue (or post a comment if you are not a topic starter) if you are actively working on implementing this feature.
Reopening this as we have an open PR.
Implemented in #2325.
In LightGBM, each numerical feature is replaced with discrete bins, and the split threshold is chosen from the upper bounds of those bins (variable name bin_upper_bounds). Even when forced_splits is set, LightGBM still picks the threshold this way. As a result, the forced split does not use the threshold that was requested.
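To make the binning step concrete, here is a minimal sketch in plain Python. It is not LightGBM's code: the helper value_to_bin and the sample feature values are hypothetical, and only the bounds [1e-35, inf] come from this report. It shows that once values are mapped to bins, only the bin upper bounds remain as possible thresholds, and 0.5 is not one of them.

```python
import bisect
import math

# Bounds taken from this report; everything else below is illustrative.
bin_upper_bounds = [1e-35, math.inf]

def value_to_bin(value, upper_bounds):
    """Hypothetical helper: index of the first bin whose upper bound is >= value."""
    return bisect.bisect_left(upper_bounds, value)

# Assumed sample data: a feature that only takes the values 0 and 1.
values = [0.0, 1.0, 0.0, 1.0]
bins = [value_to_bin(v, bin_upper_bounds) for v in values]

print(bins)                     # [0, 1, 0, 1]
print(0.5 in bin_upper_bounds)  # False: 0.5 is not an available threshold
```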
Environment info
Operating System: Ubuntu 14.04.5 LTS, Trusty Tahr
CPU/GPU model: CPU, Intel(R) Core(TM) i7-6800K CPU @ 3.40GHz
C++/Python/R version: Python 3.6.6 (default, Oct 9 2018, 12:34:16) [GCC 7.3.0] :: Anaconda, Inc. on linux
LightGBM version: latest master, with commits merged up to commit id ca4b666 (new version for master branch (#1824))
source
result
The JSON output of the first decision tree is as follows:
In line 4, the split threshold is 1e+300, not 0.5.
I investigated the cause of the problem.
The split threshold is chosen from bin_upper_bounds: specifically, it is the minimum value that is greater than or equal to the threshold forced by forced_splits. In this case, bin_upper_bounds = [1e-35, inf], so the value inf is chosen as the split threshold, because the minimum value in [1e-35, inf] that is greater than or equal to 0.5 is inf. As a result, LightGBM tries to split the data using the threshold inf.
I would expect the data to be split at 0.5 when threshold is set to 0.5 in forced_splits.
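The selection rule described above can be reproduced with a short sketch. This is only an illustration of the reported behavior, not LightGBM's actual implementation; the function name snap_to_bin_upper_bound is hypothetical, and the sketch assumes the bounds are sorted in ascending order with inf as the last entry. The 1e+300 in the JSON dump is presumably how inf ends up being written out, which is also an assumption here.

```python
import bisect
import math

def snap_to_bin_upper_bound(forced_threshold, bin_upper_bounds):
    """Hypothetical helper: smallest bin upper bound >= forced_threshold.

    Assumes bin_upper_bounds is sorted ascending and ends with inf,
    so a matching bound always exists.
    """
    idx = bisect.bisect_left(bin_upper_bounds, forced_threshold)
    return bin_upper_bounds[idx]

# Values from this report: the forced threshold is 0.5.
bin_upper_bounds = [1e-35, math.inf]
effective = snap_to_bin_upper_bound(0.5, bin_upper_bounds)
print(effective)  # inf

# If 0.5 were itself a bin upper bound, the forced threshold would survive.
print(snap_to_bin_upper_bound(0.5, [1e-35, 0.5, math.inf]))  # 0.5
```

The second call suggests why letting the forced-split thresholds influence the bin boundaries would resolve the issue: the requested value would then be available as a bound.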