
Custom median objective function in lightgbm.cv() #6620

Open
arumds opened this issue Aug 22, 2024 · 10 comments

arumds commented Aug 22, 2024

LightGBM version 4.0.0

The objective='regression' setting trains the model to predict the mean of the data, but I am interested in training it to predict the median of the actual values. In fact, a quantile model with alpha=0.5 would solve the problem. However, the quantile objective does not work with the monotone_constraints parameter, which is essential in our case. Therefore, a custom median_loss is passed as the objective in the params.
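
For reference, the gradient used in the snippet below is the (sub)gradient of the pinball loss at alpha = 0.5; its second derivative is 0, approximated here by a constant hessian of 1:

$$
L_{0.5}(y, \hat{y}) = \tfrac{1}{2}\,\lvert \hat{y} - y \rvert,
\qquad
\frac{\partial L_{0.5}}{\partial \hat{y}} = \tfrac{1}{2}\,\operatorname{sign}(\hat{y} - y)
$$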

import lightgbm as lgb
import numpy as np

def median_loss(preds, train_data: lgb.Dataset):
    y_true = train_data.get_label()
    residual = preds - y_true
    grad = np.where(residual > 0, 0.5, -0.5)
    hess = np.ones_like(grad)  # Hessian is constant for median pinball loss
    return grad, hess

params = {"objective": median_loss}

cv_result = lgb.cv(params, dtrain, nfold=n_folds, stratified=False, return_cvbooster=True)
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

Debugging shows that all predictions during the lgb.cv step are 0s, so the gradients are uniform across samples. This might not give LightGBM enough gradient information to make meaningful splits.

Does anyone have a suggestion on how to train the model effectively with the median_loss custom objective, or with a quantile objective while preserving the monotonic constraints? @jameslamb @vladv14

@jmoralez (Collaborator)

Hey. Thanks for using LightGBM. Can you try setting the condition to greater-or-equal? i.e.

grad = np.where(residual >= 0, 0.5, -0.5)

arumds (Author) commented Aug 22, 2024

@jmoralez I tried setting grad = np.where(residual >= 0, 0.5, -0.5):

params = {"objective": median_loss}

cv_result = lgb.cv(params, dtrain, nfold=n_folds, metrics='rmse', stratified=False, return_cvbooster=True)

Log:

[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] Using self-defined objective function
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[1]	cv_agg's train rmse: 4.66734 + 0.00107263	cv_agg's valid rmse: 4.66734 + 0.00428721
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf
[LightGBM] [Warning] Stopped training because there are no more leaves that meet the split requirements
[LightGBM] [Warning] No further splits with positive gain, best gain: -inf

When debugging the median_loss objective during the lgb.cv() execution, the preds are all zero, as seen in the screenshot:
[Screenshot 2024-08-22 at 23 09 04: preds is an array of zeros inside median_loss]

With objective='regression' the model trains normally. Logs are below:

[1]	cv_agg's train rmse: 0.730986 + 0.000761274	cv_agg's valid rmse: 0.730999 + 0.00305853
[2]	cv_agg's train rmse: 0.724106 + 0.000747364	cv_agg's valid rmse: 0.724126 + 0.00305247
[3]	cv_agg's train rmse: 0.717755 + 0.000743182	cv_agg's valid rmse: 0.717786 + 0.00304095
[4]	cv_agg's train rmse: 0.711056 + 0.000728518	cv_agg's valid rmse: 0.711092 + 0.00303802
[5]	cv_agg's train rmse: 0.704382 + 0.000716823	cv_agg's valid rmse: 0.704426 + 0.00302899
[6]	cv_agg's train rmse: 0.69778 + 0.00070809	cv_agg's valid rmse: 0.697832 + 0.00301913
[7]	cv_agg's train rmse: 0.691297 + 0.000700247	cv_agg's valid rmse: 0.691353 + 0.00301123
[8]	cv_agg's train rmse: 0.685269 + 0.000683244	cv_agg's valid rmse: 0.685337 + 0.00301251
[9]	cv_agg's train rmse: 0.678915 + 0.000665435	cv_agg's valid rmse: 0.678987 + 0.00301451
[10]	cv_agg's train rmse: 0.672621 + 0.000661577	cv_agg's valid rmse: 0.672699 + 0.00300223
[11]	cv_agg's train rmse: 0.666394 + 0.000655792	cv_agg's valid rmse: 0.666477 + 0.00299132

@jmoralez (Collaborator)

When using a custom objective, LightGBM sets the init score to 0, and if it doesn't find a positive gain for any split you may be left with a single tree containing only the root; you can verify this with the trees_to_dataframe method.
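
For example, a minimal sketch of that check, assuming the cv call was made with return_cvbooster=True and its result stored in cv_result:

# Inspect the trees of the first CV booster.
bst = cv_result["cvbooster"].boosters[0]
tree_df = bst.trees_to_dataframe()
print(tree_df[["tree_index", "node_depth", "node_index", "value", "count"]])
# In the degenerate case described above, the output contains only root leaves
# (node_index "0-L0") with value 0, i.e. the init score used for custom objectives.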

If you're able to provide a reproducible example we can assist further. The following seems to train normally:

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

def median_loss(preds, train_data: lgb.Dataset):
    y_true = train_data.get_label()
    residual = preds - y_true
    grad = np.where(residual >= 0, 0.5, -0.5)
    hess = np.ones_like(grad)  # Hessian is constant for median pinball loss
    return grad, hess

X, y = make_regression(n_samples=1000, n_features=2)
dtrain = lgb.Dataset(X, y)
params={"objective": median_loss, 'num_leaves': 32, 'verbosity': -1, 'metrics': 'l2'}
cv_hist = lgb.cv(
    params,
    dtrain,
    num_boost_round=10,
    nfold=2,
    stratified=False,
    callbacks=[lgb.log_evaluation(1)],
)
# [1]	cv_agg's valid l2: 15698.8 + 269.489
# [2]	cv_agg's valid l2: 15689.7 + 269.239
# [3]	cv_agg's valid l2: 15680.5 + 268.99
# [4]	cv_agg's valid l2: 15671.4 + 268.741
# [5]	cv_agg's valid l2: 15662.2 + 268.491
# [6]	cv_agg's valid l2: 15653.1 + 268.242
# [7]	cv_agg's valid l2: 15644 + 267.993
# [8]	cv_agg's valid l2: 15634.8 + 267.744
# [9]	cv_agg's valid l2: 15625.7 + 267.495
# [10]	cv_agg's valid l2: 15616.6 + 267.246

arumds (Author) commented Aug 22, 2024

@jmoralez Attached is a test dtrain binary file, which can be used to reproduce the issue as below:

dataset_from_file = lgb.Dataset(data="test.bin")

params={"objective": median_loss, 'num_leaves': 32, 'verbosity': -1, 'metrics': 'l2'}
cv_hist = lgb.cv(
    params,
    dataset_from_file,
    num_boost_round=10,
    nfold=2,
    stratified=False,
    callbacks=[lgb.log_evaluation(1)],
    seed=0,
    metrics='rmse',
    eval_train_metric=True,
    return_cvbooster=True)

test.bin.zip

Unzip the file to test.bin

@jmoralez (Collaborator)

Did you inspect the produced trees?

arumds (Author) commented Aug 23, 2024

Do you mean getting the model from lgb.train after lgb.cv and inspecting the trees? If so, yes, there seems to be only the root.

The hyperparameters returned from lgb.cv() with BayesianOptimization are:

`{'num_iterations': 500, 'early_stopping_rounds': 50, 'bagging_freq': 1, 'learning_rate': 0.01, 'verbosity': -1, 'monotone_constraints': [0, 0, 0, -1, 0, 1], 'objective': <function median_loss at 0x3126261f0>, 'bagging_fraction': 0.8646440511781974, 'feature_fraction': 0.9145568099117258, 'lambda_l1': 0.006027633760716439, 'lambda_l2': 0.005448831829968969, 'max_depth': 14, 'min_child_weight': 0.6394705825246829, 'min_data_in_leaf': 16, 'min_gain_to_split': 0.045670920031283195, 'num_leaves': 292}`

The model trained with these hyperparameters yields:

lgb.Booster.trees_to_dataframe(model)
Out[5]: 
   tree_index  node_depth node_index left_child right_child parent_index  \
0           0           1       0-L0       None        None         None   
  split_feature split_gain threshold decision_type missing_direction  \
0          None       None      None          None              None   
  missing_type  value weight count  
0         None      0   None  None  

Does this indicate that the median_loss objective is not good for the dataset?

jmoralez (Collaborator) commented Aug 23, 2024

That means LightGBM isn't able to find a split that satisfies the constraints you've set (min_gain_to_split, min_data_in_leaf, min_child_weight, etc).

This doesn't seem to be an issue within LightGBM or your custom loss; I'm pretty sure you'd get the same result if you used the built-in loss (a single tree with only the root, which predicts the init score).

If you have very few samples, you could try getting more data or reducing the constraints (in case 16 is your minimum min_data_in_leaf, for example).
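
As a hypothetical sanity check (the values below are illustrative, reusing the attached test.bin dataset and the tuned parameters shown earlier), one could rerun the cv with the split constraints loosened to confirm that they are what blocks the splits:

# Hypothetical sanity check: loosen the split constraints so that splits can be
# found despite the flat +/-0.5 gradients of the median objective.
relaxed_params = {
    "objective": median_loss,
    "monotone_constraints": [0, 0, 0, -1, 0, 1],
    "num_leaves": 292,
    "min_gain_to_split": 0.0,   # was ~0.046
    "min_data_in_leaf": 1,      # was 16
    "min_child_weight": 1e-3,   # alias of min_sum_hessian_in_leaf; was ~0.64
    "metric": "l1",
    "verbosity": -1,
}
cv_relaxed = lgb.cv(relaxed_params, dataset_from_file, nfold=2, stratified=False)

If splits appear with the relaxed settings, the constraints (rather than the custom loss itself) are the limiting factor.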

arumds (Author) commented Aug 23, 2024

@jmoralez The hyperparameter boundaries for tuning are shown below:

hyperparam_boundaries = {
    'num_leaves': (100, 300),
    'max_depth': (10, 20),
    'feature_fraction': (0.7, 1),
    'bagging_fraction': (0.7, 1),
    'min_data_in_leaf': (10, 25),
    'min_gain_to_split': (0.01, 0.05),
    'lambda_l1': (0, 0.01),
    'lambda_l2': (0, 0.01),
}

And the built-in regression objective gives the following best hyperparameters from Bayesian hyperparameter tuning with lgb.cv() cross-validation:

{'num_iterations': 500, 'early_stopping_rounds': 50, 'bagging_freq': 1, 'learning_rate': 0.01, 'verbosity': -1, 'monotone_constraints': [0, 0, 0, -1, 0, 1], 'objective': 'regression', 'bagging_fraction': 0.8150324556477333, 'feature_fraction': 0.9375175114247993, 'lambda_l1': 0.005288949197529045, 'lambda_l2': 0.0056804456109393235, 'max_depth': 19, 'min_child_weight': 0.07041859401008829, 'min_data_in_leaf': 11, 'min_gain_to_split': 0.010808735897613029, 'num_leaves': 266}

And there is more than one tree:

lgb.Booster.trees_to_dataframe(model)
Out[2]: 
        tree_index  node_depth node_index  ...     value   weight  count
0                0           1       0-S0  ...  4.607710      0.0  66367
1                0           2       0-S2  ...  4.615160  29156.0  29156
2                0           3       0-S7  ...  4.616940  17398.0  17398
3                0           4      0-S18  ...  4.618880   2726.0   2726
4                0           5      0-S53  ...  4.621150    455.0    455
...            ...         ...        ...  ...       ...      ...    ...
265495         499          10   499-L241  ... -0.000076     20.0     20
265496         499          10   499-L256  ...  0.000423     11.0     11
265497         499           7   499-S254  ... -0.000418     25.0     25
265498         499           8    499-L38  ... -0.000174     12.0     12
265499         499           8   499-L255  ... -0.000677     13.0     13

The issue occurs only when using the custom loss function, where it cannot find a split and only predicts the init score of 0.

arumds (Author) commented Aug 26, 2024

@jmoralez Is there anything I am missing here?

@jmoralez (Collaborator)

What are you returning as the trial's score? As I said, when using a custom objective, LightGBM starts boosting from zero, which might hurt the convergence.

Can you try the approach in #5114 (comment) by setting the init score in your dataset (to the target's median in this case), adding it back to your predictions and then computing your metric on that? If you're using a built-in metric it won't work because it won't take into account the init scores.
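
A minimal sketch of that approach, reusing the toy data and median_loss from the earlier example and taking the target's median as the init score:

import lightgbm as lgb
import numpy as np
from sklearn.datasets import make_regression

def median_loss(preds, train_data: lgb.Dataset):
    y_true = train_data.get_label()
    residual = preds - y_true
    grad = np.where(residual >= 0, 0.5, -0.5)
    hess = np.ones_like(grad)  # constant hessian for the median pinball loss
    return grad, hess

X, y = make_regression(n_samples=1000, n_features=2)
base_score = np.median(y)

# Boosting starts from the target's median instead of 0.
dtrain = lgb.Dataset(X, y, init_score=np.full(len(y), base_score))

params = {
    "objective": median_loss,
    "num_leaves": 32,
    "verbosity": -1,
    # Built-in metrics reported during cv do not account for the init score,
    # so the trial score is computed manually below.
    "metric": "l1",
}
cv_result = lgb.cv(
    params,
    dtrain,
    num_boost_round=100,
    nfold=2,
    stratified=False,
    return_cvbooster=True,
)

# predict() does not add the init score back, so add it manually before
# computing the score handed to the tuner. (In a real tuning loop you would
# score each fold on its own validation split rather than the full X.)
fold_preds = [b.predict(X) for b in cv_result["cvbooster"].boosters]
preds = np.mean(fold_preds, axis=0) + base_score
trial_score = np.mean(np.abs(preds - y))  # MAE, the metric matching a median objective
print(trial_score)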
