Worse performance after parsing 2 large values #364
I think there may be an accidentally quadratic behavior here: when you run it using an actual IO device the data comes in chunks, and every time we read a new chunk we go through `extend_buf_sync`, where we end up initializing the buffer again before passing it to `std::io::Read`.

Hypothesis

When reading the first value we incrementally increase the size of the buffer, extending it by 8 * 1024 bytes every time it needs to grow as we read a new chunk of data. But for any values following that, the buffer is already huge! While we don't need to grow the buffer again, we still end up initializing the entire uninitialized part of the buffer (it is uninitialized as far as the code is aware, but it is actually initialized from the previous value). So when we again read the data coming in chunks, we get quadratic behaviour from repeatedly initializing the huge buffer.

This doesn't show when reading straight from a byte buffer because the data is available in full, so when we get to the second value we only do a single call to `extend_buf_sync`/`read` and that fills the buffer with the entire value all at once.

Fixes

I can think of three off the top of my head.

- Cap how much we initialize (and thus are able to read) in each call to `extend_buf_sync`/`read`
  - Very easy, but will still have overhead
- Track how much of the buffer is initialized from previous calls and avoid doing the initialization on it again
  - Ideal, more or less, given the APIs std gives us at this time. But it does involve more unsafe, since we must give `std::io::Read::read` an initialized buffer.
- Wait for https://doc.rust-lang.org/std/io/trait.Read.html#method.read_buf to stabilize
  - This basically mirrors the API tokio (and consequently redis-rs and combine's async versions) uses. This API allows the initialization step to be omitted entirely without unsafe (at this level; at the lower levels there is still unsafe)

I'd probably prefer point 2, but I may need a little while before I have time for it. If you want to try something easy, 1 would be easy to patch in to test the hypothesis. If you are able to produce a minimized example I can add to combine's tests or benchmarks, that would be appreciated; the partial-io crate used in combine can be used to simulate reading data in chunks while still using in-memory data.
I don't know if it helps, but the initial report was about async redis-rs.
Would a flamegraph from an async version help?
I would be happy to try and make a fix, but I'm not sure how to simulate/recreate this behavior in a test. Do you have any suggestions?
Didn't see that, the async version wouldn't call
You could try this branch to see if it helps (it implements point 1): https://github.com/Marwes/combine/tree/perf_extend_buf_sync
No, you're right - this doesn't happen in the async non-TLS runs, only in async TLS runs. Thanks for correcting me!

edit: or, at least, I can't repro it in async non-TLS. Here redis-rs/redis-rs#1128 (comment) the original reporter claims that he sees the same.
It helps! Thanks for the quick implementation.
@Marwes Can I create a PR with your change (point 1) with additional testing using
@artemrakov Go ahead
It will likely be a performance regression if the reader could actually fill the entire buffer that was initialized, but that would just be a constant factor, so especially for a temporary workaround that is fine.
See this discussion in the redis-rs repo: redis-rs/redis-rs#1128 (comment)

The redis-rs parser receives a large value (the non-TLS example uses 500,000,000 bytes) from a TCP stream and parses it in a set time. After this first call, consecutive calls with exactly the same value will take ~twice as much time to parse the value.

flamegraph analysis put the blame on `combine::stream::buf_reader::extend_buf_sync`. This behavior isn't reproduced when, instead of a TCP stream, I use `Vec<u8>` or `Bytes` objects with the value. I have very little experience with using `combine`, and I know nothing about its internals. I'd appreciate any help with understanding whether the issue is with our usage of `combine`, or an actual issue with the library. Here is the redis-rs parser code.