-
-
Notifications
You must be signed in to change notification settings - Fork 350
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make trio.lowlevel.checkpoint much faster #1613
Merged
Merged
+13
−2
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This is a bit of a kluge, and will hopefully be cleaned up in the future when we overhaul KeyboardInterrupt (python-triogh-733) and/or the cancellation checking APIs (python-triogh-961). But 'checkpoint()' is a common operation, and this speeds it up by ~7x, so... not bad for a ~4 line change. Before this change: - `await checkpoint()`: ~24k/second - `await cancel_shielded_checkpoint()`: ~180k/second After this change: - `await checkpoint()`: ~170k/second - `await cancel_shielded_checkpoint()`: ~180k/second Benchmark script: ```python import time import trio LOOP_SIZE = 1_000_000 async def main(): start = time.monotonic() for _ in range(LOOP_SIZE): await trio.lowlevel.checkpoint() #await trio.lowlevel.cancel_shielded_checkpoint() end = time.monotonic() print(f"{LOOP_SIZE / (end - start):.2f} schedules/second") trio.run(main) ```
Codecov Report
@@ Coverage Diff @@
## master #1613 +/- ##
=======================================
Coverage 99.69% 99.69%
=======================================
Files 110 110
Lines 13840 13840
Branches 1058 1059 +1
=======================================
Hits 13798 13798
Misses 27 27
Partials 15 15
|
njsmith
added a commit
to njsmith/trio
that referenced
this pull request
Jun 15, 2020
Profiling a user-written DNS microbenchmark in python-triogh-1595 showed that UDP sendto operations were spending a substantial amount of time in _resolve_address, which is responsible for resolving any hostname given in the sendto call. This is weird, since the benchmark wasn't using hostnames, just a raw IP address. Surprisingly, it turns out that calling getaddrinfo at all is also quite expensive, even you give it an already-resolved plain IP address so there's no work for it to do: In [1]: import socket In [2]: %timeit socket.getaddrinfo("127.0.0.1", 80) 5.84 µs ± 53.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [3]: %timeit socket.inet_aton("127.0.0.1") 187 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) I thought getaddrinfo would be effectively free in this case, but apparently I was wrong. Also, this doesn't really matter for TCP connections, since they only pass through this path once per connection, but UDP goes through here on every packet, so the overhead adds up quickly. Solution: add a fast-path to _resolve_address when user's address is already resolved, so we skip all the work. On the benchmark in python-triogh-1595 on my laptop, this PR takes us from ~7000 queries/second to ~9000 queries/second, so a ~30% speedup. The patch is a bit more complicated than I expected. There are three parts: - The fast path itself - Skipping unnecessary checkpoints in _resolve_address, including when we're on the fastpath: this is an internal function and it turns out that almost all our callers are already doing checkpoints, so there's no need to do another checkpoint inside _resolve_address. Even with the fast checkpoints from python-triogh-1613, the fast-path+checkpoint is still only ~8000 queries/second on the DNS microbenchmark, versus ~9000 queries/second without the checkpoint. - _resolve_address used to always normalize IPv6 addresses into 4-tuples, as a side-effect of how getaddrinfo works. The fast path doesn't call getaddrinfo, so the tests needed adjusting so they don't expect this normalization, and to make sure that our tests for the getaddrinfo normalization code don't hit the fast path.
njsmith
added a commit
to njsmith/trio
that referenced
this pull request
Jun 15, 2020
Profiling a user-written DNS microbenchmark in python-triogh-1595 showed that UDP sendto operations were spending a substantial amount of time in _resolve_address, which is responsible for resolving any hostname given in the sendto call. This is weird, since the benchmark wasn't using hostnames, just a raw IP address. Surprisingly, it turns out that calling getaddrinfo at all is also quite expensive, even you give it an already-resolved plain IP address so there's no work for it to do: In [1]: import socket In [2]: %timeit socket.getaddrinfo("127.0.0.1", 80) 5.84 µs ± 53.6 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each) In [3]: %timeit socket.inet_aton("127.0.0.1") 187 ns ± 1.5 ns per loop (mean ± std. dev. of 7 runs, 10000000 loops each) I thought getaddrinfo would be effectively free in this case, but apparently I was wrong. Also, this doesn't really matter for TCP connections, since they only pass through this path once per connection, but UDP goes through here on every packet, so the overhead adds up quickly. Solution: add a fast-path to _resolve_address when user's address is already resolved, so we skip all the work. On the benchmark in python-triogh-1595 on my laptop, this PR takes us from ~7000 queries/second to ~9000 queries/second, so a ~30% speedup. The patch is a bit more complicated than I expected. There are three parts: - The fast path itself - Skipping unnecessary checkpoints in _resolve_address, including when we're on the fastpath: this is an internal function and it turns out that almost all our callers are already doing checkpoints, so there's no need to do another checkpoint inside _resolve_address. Even with the fast checkpoints from python-triogh-1613, the fast-path+checkpoint is still only ~8000 queries/second on the DNS microbenchmark, versus ~9000 queries/second without the checkpoint. - _resolve_address used to always normalize IPv6 addresses into 4-tuples, as a side-effect of how getaddrinfo works. The fast path doesn't call getaddrinfo, so the tests needed adjusting so they don't expect this normalization, and to make sure that our tests for the getaddrinfo normalization code don't hit the fast path.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This is a bit of a kluge, and will hopefully be cleaned up in the
future when we overhaul KeyboardInterrupt (gh-733) and/or the
cancellation checking APIs (gh-961). But 'checkpoint()' is a common
operation, and this speeds it up by ~7x, so... not bad for a ~4 line
change.
Before this change:
await checkpoint()
: ~24k/secondawait cancel_shielded_checkpoint()
: ~180k/secondAfter this change:
await checkpoint()
: ~170k/secondawait cancel_shielded_checkpoint()
: ~180k/secondBenchmark script: