ValueError: The number of quantiles cannot be greater than the number of samples used #121

Closed
mgcyung opened this issue Jan 12, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@mgcyung

mgcyung commented Jan 12, 2025

Training a TabPFNClassifier on 100000 rows worked, while predicting didn't:

from tabpfn import TabPFNClassifier

train_size = 200000
test_size = 10000
clf = TabPFNClassifier(memory_saving_mode=16, fit_mode='low_memory', ignore_pretraining_limits=True)
clf.fit(X[:train_size, :].astype(float), y[:train_size])
predictions = clf.predict(X[-test_size:, :].astype(float))

And the error said

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 20000 quantiles and 10000 samples.
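This message comes from scikit-learn's QuantileTransformer (visible as sklearn/preprocessing/_data.py in the traces in this thread): it raises as soon as n_quantiles exceeds its `subsample` parameter, which defaults to 10,000. TabPFN's preprocessing appears to derive n_quantiles from the training-set size while leaving `subsample` at that default. A minimal sketch reproducing the same ValueError directly, outside TabPFN (the sizes here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# n_quantiles larger than `subsample` trips the parameter check in
# sklearn/preprocessing/_data.py before any quantiles are computed.
X = np.random.RandomState(0).rand(30_000, 2)
qt = QuantileTransformer(n_quantiles=20_000, subsample=10_000)
try:
    qt.fit(X)
except ValueError as e:
    print(e)  # same "number of quantiles cannot be greater than..." message
```

Because the check runs before any data is processed, the error appears regardless of the actual contents of X.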
@noahho
Collaborator

noahho commented Jan 12, 2025

Hi! 200,000 samples will be hard to fit memory-wise. I'm not sure where your error comes from; can you provide a fuller trace, please?

Best!

@mgcyung
Author

mgcyung commented Jan 13, 2025

Here is the trace

Traceback (most recent call last):
  File "/home/test/tabpfn_test.py", line 73, in <module>
    predictions = clf.predict(X[-test_size:,:].astype(float))
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/classifier.py", line 512, in predict
    proba = self.predict_proba(X)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/classifier.py", line 533, in predict_proba
    for output, config in self.executor_.iter_outputs(
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/inference.py", line 163, in iter_outputs
    for config, preprocessor, X_train, y_train, cat_ix in itr:
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/preprocessing.py", line 632, in fit_preprocessing
    yield from executor(  # type: ignore
  File "/home/test/.local/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/preprocessing.py", line 539, in fit_preprocessing_one
    res = preprocessor.fit_transform(X_train, cat_ix)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/model/preprocessing.py", line 397, in fit_transform
    X, categorical_features = step.fit_transform(X, categorical_features)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/model/preprocessing.py", line 987, in fit_transform
    Xt = transformer.fit_transform(X[:, self.subsampled_features_])
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 471, in fit_transform
    Xt = self._fit(X, y, **fit_params_steps)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 377, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/home/test/.local/lib/python3.10/site-packages/joblib/memory.py", line 353, in __call__
    return self.func(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 957, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 754, in fit_transform
    result = self._fit_transform(X, y, _fit_transform_one)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 681, in _fit_transform
    return Parallel(n_jobs=self.n_jobs)(
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
  File "/home/test/.local/lib/python3.10/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/home/test/.local/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 957, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 916, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/preprocessing/_data.py", line 2653, in fit
    raise ValueError(
ValueError: The number of quantiles cannot be greater than the number of samples used. Got 20000 quantiles and 10000 samples.

@ChenJin1110

Hi, @noahho!

I encountered the same error when using a training set of more than 10,000 rows. My x_train.shape=(273835, 85).

Is there any solution?

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 54767 quantiles and 10000 samples.
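Not an official fix (see #169 for the tracked resolution), but one workaround sketch is to cap the training set at 10,000 rows, scikit-learn's default QuantileTransformer `subsample`, so the derived quantile count can never exceed it. Everything below is illustrative: the array shapes mirror the comment above, and `MAX_ROWS` is a hypothetical name.

```python
import numpy as np

# Synthetic stand-in for the (273835, 85) training set mentioned above.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(273_835, 85))
y_train = rng.integers(0, 2, size=273_835)

# Keep at most 10,000 rows (sklearn's QuantileTransformer default `subsample`)
# so the preprocessing never requests more quantiles than samples.
MAX_ROWS = 10_000
idx = rng.choice(len(X_train), size=min(MAX_ROWS, len(X_train)), replace=False)
X_small, y_small = X_train[idx], y_train[idx]
# clf = TabPFNClassifier(); clf.fit(X_small, y_small)  # as in the snippets above
```

X_small and y_small can then be passed to clf.fit() as in the earlier snippets; whether 10,000 rows preserves enough signal depends on the dataset.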

@LeoGrin LeoGrin added the bug Something isn't working label Jan 22, 2025
@ljubomirj

Similar for me. Sizes:

X: (4542148, 15) Y: (4542148, 1)

running

from tabpfn import TabPFNRegressor

model = TabPFNRegressor(ignore_pretraining_limits=True)
model.fit(X, Y)

got error

...........................................................................................
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 136, in __call__
    return self.function(*args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/pipeline.py", line 1310, in _fit_transform_one
    res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 316, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/base.py", line 1098, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/preprocessing/_data.py", line 2775, in fit
    raise ValueError(
ValueError: The number of quantiles cannot be greater than the number of samples used. Got 908429 quantiles and 10000 samples.

@1511878618

Same error:

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 45374 quantiles and 10000 samples.
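For what it's worth, the reported quantile counts track the training-set sizes: both reports in this thread that include a shape are consistent with n_quantiles being set to n_samples // 5 while `subsample` stays at scikit-learn's default of 10,000. A quick check of that pattern (an observation from this thread, not confirmed TabPFN behaviour):

```python
# (n_samples, n_quantiles) pairs from the two comments above that report both.
reported = [(273_835, 54_767), (4_542_148, 908_429)]
for n_samples, n_quantiles in reported:
    assert n_samples // 5 == n_quantiles  # matches n_quantiles = n_samples // 5
print("n_quantiles == n_samples // 5 for both reports")
```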

@LeoGrin
Collaborator

LeoGrin commented Feb 17, 2025

Duplicate of #169

@LeoGrin LeoGrin marked this as a duplicate of #169 Feb 17, 2025
@LeoGrin LeoGrin closed this as completed Feb 17, 2025