ValueError: The number of quantiles cannot be greater than the number of samples used #121

Closed
mgcyung opened this issue Jan 12, 2025 · 6 comments
Labels
bug Something isn't working

Comments

@mgcyung

mgcyung commented Jan 12, 2025

Training a TabPFNClassifier on 100000 rows worked, while predicting didn't:

from tabpfn import TabPFNClassifier

train_size = 200000
test_size = 10000
clf = TabPFNClassifier(memory_saving_mode=16, fit_mode='low_memory', ignore_pretraining_limits=True)
clf.fit(X[:train_size, :].astype(float), y[:train_size])
predictions = clf.predict(X[-test_size:, :].astype(float))

And the error said

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 20000 quantiles and 10000 samples.
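This message comes from scikit-learn's QuantileTransformer (visible as sklearn/preprocessing/_data.py in the traces in this thread): it raises as soon as n_quantiles exceeds its `subsample` parameter, which defaults to 10,000. TabPFN's preprocessing appears to derive n_quantiles from the training-set size while leaving `subsample` at that default. A minimal sketch reproducing the same ValueError directly, outside TabPFN (the sizes here are illustrative):

```python
import numpy as np
from sklearn.preprocessing import QuantileTransformer

# n_quantiles larger than `subsample` trips the parameter check in
# sklearn/preprocessing/_data.py before any quantiles are computed.
X = np.random.RandomState(0).rand(30_000, 2)
qt = QuantileTransformer(n_quantiles=20_000, subsample=10_000)
try:
    qt.fit(X)
except ValueError as e:
    print(e)  # same "number of quantiles cannot be greater than..." message
```

Because the check runs before any data is processed, the error appears regardless of the actual contents of X.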
@noahho
Collaborator

noahho commented Jan 12, 2025

Hi! 200,000 samples will be hard to fit memory-wise. I'm not sure where your error comes from; can you provide a fuller trace, please?

Best!

@mgcyung
Author

mgcyung commented Jan 13, 2025

Here is the trace

Traceback (most recent call last):
  File "/home/test/tabpfn_test.py", line 73, in <module>
    predictions = clf.predict(X[-test_size:,:].astype(float))
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/classifier.py", line 512, in predict
    proba = self.predict_proba(X)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/classifier.py", line 533, in predict_proba
    for output, config in self.executor_.iter_outputs(
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/inference.py", line 163, in iter_outputs
    for config, preprocessor, X_train, y_train, cat_ix in itr:
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/preprocessing.py", line 632, in fit_preprocessing
    yield from executor(  # type: ignore
  File "/home/test/.local/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/preprocessing.py", line 539, in fit_preprocessing_one
    res = preprocessor.fit_transform(X_train, cat_ix)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/model/preprocessing.py", line 397, in fit_transform
    X, categorical_features = step.fit_transform(X, categorical_features)
  File "/home/test/.local/lib/python3.10/site-packages/tabpfn/model/preprocessing.py", line 987, in fit_transform
    Xt = transformer.fit_transform(X[:, self.subsampled_features_])
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 471, in fit_transform
    Xt = self._fit(X, y, **fit_params_steps)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 377, in _fit
    X, fitted_transformer = fit_transform_one_cached(
  File "/home/test/.local/lib/python3.10/site-packages/joblib/memory.py", line 353, in __call__
    return self.func(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 957, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 754, in fit_transform
    result = self._fit_transform(X, y, _fit_transform_one)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/compose/_column_transformer.py", line 681, in _fit_transform
    return Parallel(n_jobs=self.n_jobs)(
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 65, in __call__
    return super().__call__(iterable_with_config)
  File "/home/test/.local/lib/python3.10/site-packages/joblib/parallel.py", line 1863, in __call__
    return output if self.return_generator else list(output)
  File "/home/test/.local/lib/python3.10/site-packages/joblib/parallel.py", line 1792, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 127, in __call__
    return self.function(*args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/pipeline.py", line 957, in _fit_transform_one
    res = transformer.fit_transform(X, y, **fit_params)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 157, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 916, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/base.py", line 1152, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/test/.local/lib/python3.10/site-packages/sklearn/preprocessing/_data.py", line 2653, in fit
    raise ValueError(
ValueError: The number of quantiles cannot be greater than the number of samples used. Got 20000 quantiles and 10000 samples.

@ChenJin1110

Hi, @noahho!

I encountered the same error when using a training set of more than 10,000 rows. My x_train.shape=(273835, 85).

Is there any solution?

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 54767 quantiles and 10000 samples.
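Not an official fix (see #169 for the tracked resolution), but one workaround sketch is to cap the training set at 10,000 rows, scikit-learn's default QuantileTransformer `subsample`, so the derived quantile count can never exceed it. Everything below is illustrative: the array shapes mirror the comment above, and `MAX_ROWS` is a hypothetical name.

```python
import numpy as np

# Synthetic stand-in for the (273835, 85) training set mentioned above.
rng = np.random.default_rng(0)
X_train = rng.normal(size=(273_835, 85))
y_train = rng.integers(0, 2, size=273_835)

# Keep at most 10,000 rows (sklearn's QuantileTransformer default `subsample`)
# so the preprocessing never requests more quantiles than samples.
MAX_ROWS = 10_000
idx = rng.choice(len(X_train), size=min(MAX_ROWS, len(X_train)), replace=False)
X_small, y_small = X_train[idx], y_train[idx]
# clf = TabPFNClassifier(); clf.fit(X_small, y_small)  # as in the snippets above
```

X_small and y_small can then be passed to clf.fit() as in the earlier snippets; whether 10,000 rows preserves enough signal depends on the dataset.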

@LeoGrin LeoGrin added the bug Something isn't working label Jan 22, 2025
@ljubomirj

Similar for me. Sizes:

X: (4542148, 15) Y: (4542148, 1)

running

from tabpfn import TabPFNRegressor

model = TabPFNRegressor(ignore_pretraining_limits=True)
model.fit(X, Y)

got error

...........................................................................................
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/joblib/parallel.py", line 1847, in _get_sequential_output
    res = func(*args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/utils/parallel.py", line 136, in __call__
    return self.function(*args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/pipeline.py", line 1310, in _fit_transform_one
    res = transformer.fit_transform(X, y, **params.get("fit_transform", {}))
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/utils/_set_output.py", line 316, in wrapped
    data_to_wrap = f(self, X, *args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/base.py", line 1098, in fit_transform
    return self.fit(X, **fit_params).transform(X)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/base.py", line 1473, in wrapper
    return fit_method(estimator, *args, **kwargs)
  File "/home/ljubomir/python3-venv/torch/lib/python3.10/site-packages/sklearn/preprocessing/_data.py", line 2775, in fit
    raise ValueError(
ValueError: The number of quantiles cannot be greater than the number of samples used. Got 908429 quantiles and 10000 samples.

@1511878618

Same error:

ValueError: The number of quantiles cannot be greater than the number of samples used. Got 45374 quantiles and 10000 samples.
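For what it's worth, the reported quantile counts track the training-set sizes: both reports in this thread that include a shape are consistent with n_quantiles being set to n_samples // 5 while `subsample` stays at scikit-learn's default of 10,000. A quick check of that pattern (an observation from this thread, not confirmed TabPFN behaviour):

```python
# (n_samples, n_quantiles) pairs from the two comments above that report both.
reported = [(273_835, 54_767), (4_542_148, 908_429)]
for n_samples, n_quantiles in reported:
    assert n_samples // 5 == n_quantiles  # matches n_quantiles = n_samples // 5
print("n_quantiles == n_samples // 5 for both reports")
```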

@LeoGrin
Collaborator

LeoGrin commented Feb 17, 2025

Duplicate of #169

@LeoGrin LeoGrin marked this as a duplicate of #169 Feb 17, 2025
@LeoGrin LeoGrin closed this as completed Feb 17, 2025