
[Benchmark] Add parquet read benchmark #1371

Merged: 13 commits merged into rapidsai:branch-24.10 on Aug 30, 2024

Conversation

@rjzamora (Member) commented on Jul 30, 2024

Adds a new benchmark for Parquet read performance using a LocalCUDACluster. The user can pass the --key and --secret options to specify S3 credentials.

E.g.

$ python ./local_read_parquet.py --devs 0,1,2,3,4,5,6,7 --filesystem fsspec --type gpu --file-count 48 --aggregate-files

Parquet read benchmark
--------------------------------------------------------------------------------
Path                      | s3://dask-cudf-parquet-testing/dedup_parquet
Columns                   | None
Backend                   | cudf
Filesystem                | fsspec
Blocksize                 | 244.14 MiB
Aggregate files           | True
Row count                 | 372066
Size on disk              | 1.03 GiB
Number of workers         | 8
================================================================================
Wall clock                | Throughput
--------------------------------------------------------------------------------
36.75 s                   | 28.78 MiB/s
21.29 s                   | 49.67 MiB/s
17.91 s                   | 59.05 MiB/s
================================================================================
Throughput                | 41.77 MiB/s +/- 7.81 MiB/s
Bandwidth                 | 0 B/s +/- 0 B/s
Wall clock                | 25.32 s +/- 8.20 s
================================================================================
...

Notes:

- Performance generally scales with the number of workers (multiplied by the number of threads per worker).
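For orientation, the pattern being timed boils down to something like the sketch below. This is illustrative only, not the benchmark script itself: the dataset path is a placeholder, and the exact read_parquet arguments (blocksize, aggregate_files, the cudf backend setting) depend on the installed dask version.

```python
# Illustrative sketch of the measured pattern (not the benchmark script):
# start a LocalCUDACluster, read a Parquet dataset with dask.dataframe,
# and time how long it takes to materialize every partition.
import time

import dask
import dask.dataframe as dd
from dask.distributed import Client, wait
from dask_cuda import LocalCUDACluster

if __name__ == "__main__":
    dask.config.set({"dataframe.backend": "cudf"})  # GPU-backed DataFrames
    cluster = LocalCUDACluster(CUDA_VISIBLE_DEVICES="0,1,2,3")
    client = Client(cluster)

    t0 = time.perf_counter()
    df = dd.read_parquet(
        "s3://my-bucket/my-dataset",  # placeholder path
        blocksize="256MiB",
        aggregate_files=True,
    ).persist()
    wait(df)  # block until every partition has been read into memory
    print(f"Wall clock: {time.perf_counter() - t0:.2f} s")
```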

@rjzamora added the 2 - In Progress, feature request, and non-breaking labels on Jul 30, 2024
@rjzamora rjzamora self-assigned this Jul 30, 2024
@github-actions bot added the python label on Jul 30, 2024
@pentschev (Member) commented:

Performance generally scales with the number of workers (multiplied by the number of threads per worker)

I'm assuming this applies to CPU-only operations, or are there CUDA kernels executed as part of this as well?

@rjzamora (Member, Author) replied:

I'm assuming this applies to CPU-only operations, or are there CUDA kernels executed as part of this as well?

This benchmark is entirely IO/CPU bound. There is effectively no CUDA compute - we are just transferring remote data into host memory and moving it into device memory (when the default --type gpu is used). Therefore, increasing threads_per_worker * n_workers typically improves performance (because we have more threads making connections and sending requests to S3).
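To make that concrete, a minimal sketch (the device list and thread count below are arbitrary, purely for illustration):

```python
# Sketch: for an IO-bound read, more worker threads means more concurrent
# connections/requests to S3. threads_per_worker is a LocalCUDACluster
# argument; the specific numbers here are illustrative only.
from dask.distributed import Client
from dask_cuda import LocalCUDACluster

cluster = LocalCUDACluster(
    CUDA_VISIBLE_DEVICES="0,1,2,3,4,5,6,7",  # 8 workers, one per GPU
    threads_per_worker=4,                     # 8 * 4 = 32 threads issuing reads
)
client = Client(cluster)
```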

@rjzamora changed the title from "[WIP][Benchmark] Add new remote parquet benchmark" to "[Benchmark] Add new remote parquet benchmark" on Aug 29, 2024
@rjzamora changed the title from "[Benchmark] Add new remote parquet benchmark" to "[Benchmark] Add parquet read benchmark" on Aug 29, 2024
@rjzamora added the 3 - Ready for Review label and removed the 2 - In Progress label on Aug 29, 2024
@rjzamora rjzamora marked this pull request as ready for review August 29, 2024 16:02
@rjzamora rjzamora requested a review from a team as a code owner August 29, 2024 16:02
@rjzamora (Member, Author) commented:

Update: I've generalized this benchmark. It's easy to use with S3 storage, but it is also useful for benchmarking local-storage performance.
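For example (a sketch with placeholder paths; S3 credentials go through storage_options when reading via fsspec/s3fs):

```python
# Sketch: the same dask.dataframe read pattern covers both remote (S3) and
# local storage. Paths and credentials below are placeholders.
import dask.dataframe as dd

# Remote read via fsspec/s3fs, passing S3 credentials explicitly:
remote = dd.read_parquet(
    "s3://my-bucket/my-dataset",
    storage_options={"key": "<AWS_KEY>", "secret": "<AWS_SECRET>"},
)

# Local read of a dataset with the same layout:
local = dd.read_parquet("/raid/my-dataset")
```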

@pentschev (Member) left a review:


Thanks @rjzamora, I've left some comments.

Review threads on dask_cuda/benchmarks/local_read_parquet.py (4 comments, resolved)
@madsbk (Member) left a review:


Nice @rjzamora, looks good. I only have a minor suggestion.

Review thread on dask_cuda/benchmarks/utils.py (1 comment, resolved)
@pentschev (Member) left a review:


+1 to Mads' suggestion, otherwise LGTM. Thanks @rjzamora!

Co-authored-by: Mads R. B. Kristensen <[email protected]>
@rjzamora added the 5 - Ready to Merge label and removed the 3 - Ready for Review label on Aug 30, 2024
@rjzamora (Member, Author) commented:

/merge

@rapids-bot merged commit 1cc4d0b into rapidsai:branch-24.10 on Aug 30, 2024 (23 checks passed)
@rjzamora deleted the remote-io-bench branch on August 30, 2024 at 14:45
Labels: 5 - Ready to Merge, feature request, non-breaking, python
Project status: Done
4 participants