Implementing TPC-H #271

Open
amueller opened this issue Sep 28, 2023 · 4 comments
Comments

@amueller

Has anyone thought about implementing TPC-H using the dataframe API?
I think this would be very useful to test the scope, and also to draw attention to the dataframe API.
It would mean that anyone implementing the dataframe API could immediately get an apples-to-apples benchmark of performance.

Whether TPC-H is a good benchmark for dataframes is maybe not entirely clear, but it's the best there is right now AFAIK.

If we can make it so that polars, modin and duckdb run their comparisons via the dataframe API, I think that would be pretty sweet.
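To make the "apples-to-apples" idea concrete, here is a rough sketch of what such a harness could look like. It uses only the standard library, and everything in it is an assumption for illustration: the `Frame` protocol, `RowFrame`, `ColumnFrame`, and `benchmark` are hypothetical names and are not part of the dataframe API. The query is a simplified version of TPC-H Q6, written once and run unchanged against two toy backends.

```python
import time
from datetime import date
from typing import Callable, Protocol


class Frame(Protocol):
    """Hypothetical minimal interface; NOT the actual dataframe API."""

    def filter_rows(self, predicate: Callable[[dict], bool]) -> "Frame": ...
    def sum_product(self, a: str, b: str) -> float: ...


class RowFrame:
    """Toy backend that stores data as a list of row dicts."""

    def __init__(self, rows: list):
        self.rows = rows

    def filter_rows(self, predicate):
        return RowFrame([r for r in self.rows if predicate(r)])

    def sum_product(self, a, b):
        return sum(r[a] * r[b] for r in self.rows)


class ColumnFrame:
    """Toy backend that stores data as a dict of column lists."""

    def __init__(self, columns: dict):
        self.columns = columns

    def filter_rows(self, predicate):
        names = list(self.columns)
        # Rebuild each row as a dict so the same predicate works on both backends.
        keep = [
            i
            for i, values in enumerate(zip(*self.columns.values()))
            if predicate(dict(zip(names, values)))
        ]
        return ColumnFrame({n: [col[i] for i in keep] for n, col in self.columns.items()})

    def sum_product(self, a, b):
        return sum(x * y for x, y in zip(self.columns[a], self.columns[b]))


def q6(lineitem: Frame) -> float:
    """Simplified TPC-H Q6: revenue from discounted, small-quantity 1994 shipments."""
    kept = lineitem.filter_rows(
        lambda r: date(1994, 1, 1) <= r["l_shipdate"] < date(1995, 1, 1)
        and 0.05 <= r["l_discount"] <= 0.07
        and r["l_quantity"] < 24
    )
    return kept.sum_product("l_extendedprice", "l_discount")


def benchmark(query, backends: dict) -> dict:
    """Run one query on every backend; return {name: (result, seconds)}."""
    results = {}
    for name, frame in backends.items():
        start = time.perf_counter()
        value = query(frame)
        results[name] = (value, time.perf_counter() - start)
    return results


# Made-up sample rows, purely for illustration (not real TPC-H data).
ROWS = [
    {"l_shipdate": date(1994, 3, 1), "l_discount": 0.06, "l_quantity": 10, "l_extendedprice": 100.0},
    {"l_shipdate": date(1995, 3, 1), "l_discount": 0.06, "l_quantity": 10, "l_extendedprice": 200.0},
    {"l_shipdate": date(1994, 6, 1), "l_discount": 0.06, "l_quantity": 30, "l_extendedprice": 300.0},
    {"l_shipdate": date(1994, 9, 1), "l_discount": 0.06, "l_quantity": 5, "l_extendedprice": 50.0},
]
COLUMNS = {name: [row[name] for row in ROWS] for name in ROWS[0]}
BACKENDS = {"rows": RowFrame(ROWS), "columns": ColumnFrame(COLUMNS)}

for name, (revenue, seconds) in benchmark(q6, BACKENDS).items():
    print(f"{name}: revenue={revenue:.2f} in {seconds:.6f}s")
```

The point of the sketch is only that `q6` never mentions a backend: a real version would swap the toy classes for libraries that implement the shared API, and the timing differences would then reflect the backends rather than differences in how each query was written.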

You can see the polars implementation of TPC-H here:
https://github.com/pola-rs/tpch
results here:
https://www.pola.rs/benchmarks.html

@MarcoGorelli
Contributor

Great suggestion!

I'll try this out and see how far I get; it'll likely highlight some missing areas.

It would require the dataframe-api to be as close to a zero-cost abstraction as possible; #249 would bring us a lot closer to that goal, so if you had any input there I'd really appreciate it.

thanks 🙏

@amueller
Author

amueller commented Sep 29, 2023

Maybe it should be TPC-DS, or maybe TPCx-BB

@MarcoGorelli
Contributor

We've added a couple of TPC-H examples here:

https://github.com/data-apis/dataframe-api/tree/main/spec/API_specification/examples/tpch

@MarcoGorelli
Contributor

Narwhals supports running all 22 TPC-H queries across supported backends (pandas, Polars, PyArrow, Dask, cuDF, Modin)


2 participants