
Feature Request: add timeout parameter to the .fit() method #6596

Open

fingoldo opened this issue Aug 8, 2024 · 4 comments

Comments

fingoldo commented Aug 8, 2024

Adding a timeout parameter to the .fit() method, which would force the library to return the best solution found so far once the given number of seconds since the start of training has elapsed, would make it possible to satisfy training SLAs when a user has only a limited time budget to finish a model's training. It would also enable fair comparison of different hyperparameter combinations.

Reaching the timeout should have the same effect as reaching the maximum number of iterations, perhaps with an additional warning and/or an attribute set so that the reason the training job finished is clear to the end user.

jameslamb (Collaborator) commented Aug 8, 2024

Thanks for using LightGBM and taking the time to open this.

I'm -1 on adding this to LightGBM. I understand why this might be useful, but I don't think LightGBM is the right place for this logic. This would introduce some non-trivial maintenance burden and complexity.

This would be better handled outside of LightGBM, in code that you use to invoke it.

Since you mentioned .fit(), I assume you're specifically talking about lightgbm (the Python package for LightGBM). You could, for example, use asyncio's built-in support for timing out Python function calls: https://docs.python.org/3/library/asyncio-task.html#timeouts.
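
For example, something like this (a rough sketch, assuming Python 3.9+ for asyncio.to_thread; note that wait_for raises TimeoutError in the caller but can't kill the worker thread, so training would keep running in the background until it finishes):

import asyncio

import lightgbm as lgb
from sklearn.datasets import make_regression

def train_model():
    X, y = make_regression(n_samples=10_000, n_features=20)
    dtrain = lgb.Dataset(X, label=y)
    return lgb.train(
        params={"objective": "regression"},
        train_set=dtrain,
        num_boost_round=1000,
    )

async def main():
    try:
        # run the blocking train() call in a worker thread, and stop
        # waiting for its result after 2 seconds
        bst = await asyncio.wait_for(asyncio.to_thread(train_model), timeout=2.0)
    except asyncio.TimeoutError:
        # the worker thread is not interrupted; it keeps training in the
        # background, we just stop waiting for its result
        print("training exceeded the time budget")

asyncio.run(main())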

Alternatively, you could use a lightgbm callback for this purpose. Something like the following:

import lightgbm as lgb
from datetime import datetime
from sklearn.datasets import make_regression

X, y = make_regression(n_samples=10_000, n_features=20)
dtrain = lgb.Dataset(X, label=y)

class TimeoutCallback:
    def __init__(self, timeout_seconds: int):
        self.before_iteration = False
        self.timeout_seconds = timeout_seconds
        self._start = datetime.utcnow()

    def __call__(self, *args, **kwargs) -> None:
        if (datetime.utcnow() - self._start).total_seconds() > self.timeout_seconds:
            raise RuntimeError(f"timing out: elapsed time has exceeded {self.timeout_seconds} seconds")

bst = lgb.train(
    params={
        "objective": "regression",
        "num_leaves": 100
    },
    train_set=dtrain,
    num_boost_round=1000,
    callbacks=[TimeoutCallback(2)]
)

I just tested that with LightGBM 4.5.0 and saw the following:

[LightGBM] [Info] Auto-choosing col-wise multi-threading, the overhead of testing was 0.001736 seconds.
You can set `force_col_wise=true` to remove the overhead.
[LightGBM] [Info] Total Bins 5100
[LightGBM] [Info] Number of data points in the train set: 10000, number of used features: 20
[LightGBM] [Info] Start training from score 0.256686
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/jlamb/miniforge3/envs/lgb-dev/lib/python3.11/site-packages/lightgbm/engine.py", line 317, in train
    cb(
  File "<stdin>", line 8, in __call__
RuntimeError: timing out: elapsed time has exceeded 2 seconds

That's not perfect, as it only runs after each iteration and individual iterations could run for much longer on a realistic dataset. But hopefully that imperfection also shows one example of how complex this would be to implement in LightGBM.

I'm only one vote here though, maybe other maintainers will have a different perspective.

fingoldo commented Aug 8, 2024

I didn't think of this approach! If I'm using early stopping, are the best "weights" applied to the model after this exception is thrown? In other words, is best_iter set correctly? The goal would be to stay within the time budget without losing the training progress made up to that point.

jameslamb (Collaborator) commented Aug 8, 2024

Oh interesting! It wasn't clear to me that you would want to see training time out but also keep that model.

No, in the Python package best_iteration and other early stopping state are only set after early stopping is explicitly triggered, not along the way as training proceeds.

A Python exception is used to tell the training process that early stopping has been triggered, and to carry forward details like best iteration and evaluation results.

raise EarlyStopException(self.best_iter[i], self.best_score_list[i])

class EarlyStopException(Exception):
    """Exception of early stopping.

    Raise this from a callback passed in via keyword argument ``callbacks``
    in ``cv()`` or ``train()`` to trigger early stopping.
    """

except callback.EarlyStopException as earlyStopException:
    booster.best_iteration = earlyStopException.best_iteration + 1
    evaluation_result_list = earlyStopException.best_score
    break

You could rely on that behavior in your own callback, and have it raise a lightgbm.EarlyStopException instead of a RuntimeError like in my example. That'd allow you to treat "training has been running for too long" as a triggering condition for early stopping.
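
A sketch of that (the name TimeoutEarlyStopping is mine; this assumes you pass at least one validation set to train() so that env.evaluation_result_list is populated):

import lightgbm as lgb
from datetime import datetime, timezone

class TimeoutEarlyStopping:
    def __init__(self, timeout_seconds: int):
        # run after each iteration, like the TimeoutCallback above
        self.before_iteration = False
        self.timeout_seconds = timeout_seconds
        self._start = datetime.now(timezone.utc)

    def __call__(self, env) -> None:
        # env is the CallbackEnv namedtuple lightgbm passes to callbacks
        elapsed = (datetime.now(timezone.utc) - self._start).total_seconds()
        if elapsed > self.timeout_seconds:
            # treated by train() as early stopping: it sets
            # booster.best_iteration = env.iteration + 1 and keeps the model
            raise lgb.callback.EarlyStopException(
                env.iteration, env.evaluation_result_list
            )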

Alternatively... have you tried optuna? I haven't used this particular feature of it, but it looks like they directly offer a time_budget: https://optuna.readthedocs.io/en/v2.0.0/reference/generated/optuna.integration.lightgbm.LightGBMTuner.html

time_budget – A time budget for parameter tuning in seconds.

(that might be for the entire experiment though, not per-trial... I'm not sure)
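
Untested, but based on those docs usage would look roughly like this (note that in recent optuna releases the integration has moved to the separate optuna-integration package, so the exact import path and signature may differ):

import optuna.integration.lightgbm as opt_lgb
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=10_000, n_features=20)
X_train, X_valid, y_train, y_valid = train_test_split(X, y)
dtrain = opt_lgb.Dataset(X_train, label=y_train)
dvalid = opt_lgb.Dataset(X_valid, label=y_valid)

tuner = opt_lgb.LightGBMTuner(
    params={"objective": "regression", "metric": "l2"},
    train_set=dtrain,
    valid_sets=[dvalid],
    time_budget=600,  # seconds, per the docs quoted above
)
tuner.run()
print(tuner.best_params)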

fingoldo commented Aug 8, 2024

Hah! I'm planning to create my own hyperparameter tuner; that's one of the reasons I'm interested in this functionality. I can easily see how to do time budgeting at the level of the tuner, in the hyperparameter-checking loop after each combination has been tried, but the underlying estimator has to finish its training gracefully before that point, which for some combinations can take an extremely long time.

Writing a great hyperparameter optimizer is one more use case for this timeout feature. Now I think it's the early stopping callback I should subclass (I can hardly imagine training without early stopping).

Does it make sense to prepare a PR that adds a timeout parameter to the early stopping callback?

That said, it still seems more natural to me to be able to specify a timeout directly in the estimator's fit or init methods, the same way we do with n_iters; in this case we're just interested in a maximum number of seconds rather than a maximum number of trees.
