optimize ReadBatch by moving memory allocation outside the loop rang… #81
+15
−2
My use case is reading and processing a lot of UDP multicast data: roughly 3.5 Gbps,
over 2M packets per second.
This being UDP, the kernel will deliver one packet per buffer.
(GRO helps, but not much)
Using ReadBatch from golang.org/x/net/ipv4 should reduce syscall overhead.
Benchmarking this, I found that it is slower than just using Read().
The current implementation of mmsghdr.pack() allocates slices for every
buffer in the inner loop, which, at a rate of 3M pps, is a major performance bottleneck.
(Side note: even with this patch, ReadBatch is only a win if there is enough data to read in each call. In a tight loop where only a few buffers are returned per call, the per-call overhead makes it slower than just using Read().)
Changes:
Moving the allocation out of the innermost loop significantly reduces the number of allocations when len(Messages) is large.
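To illustrate the idea, here is a minimal, stdlib-only sketch. The types iovec, msghdr and message, the constant sockaddrSize, and the functions packPerMessage and packHoisted are hypothetical stand-ins invented for this example; they are not the actual golang.org/x/net internals or the exact code in this patch. It assumes len(hs) == len(ms).

```go
package main

import (
	"fmt"
	"testing"
)

// Hypothetical, simplified stand-ins for the platform-specific structures.
type iovec struct{ base []byte }

type msghdr struct {
	iov  []iovec
	name []byte
}

type message struct{ Buffers [][]byte }

// Assumed size of the socket address buffer, for illustration only.
const sockaddrSize = 28

// packPerMessage allocates an iovec slice and an address buffer
// for every message inside the loop.
func packPerMessage(hs []msghdr, ms []message) {
	for i := range hs {
		vs := make([]iovec, len(ms[i].Buffers))
		sa := make([]byte, sockaddrSize)
		for j, b := range ms[i].Buffers {
			vs[j].base = b
		}
		hs[i].iov, hs[i].name = vs, sa
	}
}

// packHoisted allocates both backing arrays once, before the loop,
// and hands each message a capacity-limited sub-slice.
func packHoisted(hs []msghdr, ms []message) {
	total := 0
	for i := range ms {
		total += len(ms[i].Buffers)
	}
	vsAll := make([]iovec, total)
	saAll := make([]byte, len(hs)*sockaddrSize)
	for i := range hs {
		n := len(ms[i].Buffers)
		vs := vsAll[:n:n]
		vsAll = vsAll[n:]
		sa := saAll[:sockaddrSize:sockaddrSize]
		saAll = saAll[sockaddrSize:]
		for j, b := range ms[i].Buffers {
			vs[j].base = b
		}
		hs[i].iov, hs[i].name = vs, sa
	}
}

func main() {
	const batch = 64
	ms := make([]message, batch)
	for i := range ms {
		ms[i].Buffers = [][]byte{make([]byte, 1500)}
	}
	hs := make([]msghdr, batch)

	// testing.AllocsPerRun reports the average number of heap allocations per call.
	perMsg := testing.AllocsPerRun(100, func() { packPerMessage(hs, ms) })
	hoisted := testing.AllocsPerRun(100, func() { packHoisted(hs, ms) })
	fmt.Printf("per-message: %.0f allocs/call, hoisted: %.0f allocs/call\n", perMsg, hoisted)
}
```

The three-index slice expressions cap each sub-slice at its own length, so an append on one message's slice cannot overwrite its neighbour's entries.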
Downsides:
Slightly less efficient with very small BatchLengths. But in that case ReadBatch is already quite inefficient and should not be used.
Further work:
There are still allocations on every call to ReadBatch, which are unnecessary. Removing them will require interface changes, because system-specific data (iovec etc.) needs to be stored somewhere between calls.
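One possible shape for that storage, as a rough sketch only: a caller-owned scratch object that keeps the system-specific slices alive between calls. The packer type, its pack method, and the simplified iovec, msghdr and message types below are all hypothetical and exist only for this illustration; this is not an existing or proposed golang.org/x/net API.

```go
package main

import (
	"fmt"
	"testing"
)

// Hypothetical, simplified stand-ins for the platform-specific structures.
type iovec struct{ base []byte }

type msghdr struct {
	iov  []iovec
	name []byte
}

type message struct{ Buffers [][]byte }

// Assumed size of the socket address buffer, for illustration only.
const sockaddrSize = 28

// packer is a hypothetical holder for scratch memory reused across calls.
type packer struct {
	hs  []msghdr
	vs  []iovec
	sas []byte
}

// pack fills the reusable headers for ms. The returned slice aliases the
// packer's scratch memory and is only valid until the next call to pack.
func (p *packer) pack(ms []message) []msghdr {
	total := 0
	for i := range ms {
		total += len(ms[i].Buffers)
	}
	// Grow the scratch slices only when the current capacity is too small.
	if cap(p.hs) < len(ms) {
		p.hs = make([]msghdr, len(ms))
	}
	if cap(p.vs) < total {
		p.vs = make([]iovec, total)
	}
	if need := len(ms) * sockaddrSize; cap(p.sas) < need {
		p.sas = make([]byte, need)
	}
	hs := p.hs[:len(ms)]
	vs := p.vs[:total]
	sas := p.sas[:len(ms)*sockaddrSize]
	for i := range hs {
		n := len(ms[i].Buffers)
		for j, b := range ms[i].Buffers {
			vs[j].base = b
		}
		hs[i].iov = vs[:n:n]
		hs[i].name = sas[:sockaddrSize:sockaddrSize]
		vs, sas = vs[n:], sas[sockaddrSize:]
	}
	return hs
}

func main() {
	ms := make([]message, 64)
	for i := range ms {
		ms[i].Buffers = [][]byte{make([]byte, 1500)}
	}
	p := &packer{}
	p.pack(ms) // the first call sizes the scratch slices

	allocs := testing.AllocsPerRun(100, func() { p.pack(ms) })
	fmt.Printf("steady state: %.0f allocs/call\n", allocs)
}
```

The trade-off is that the caller has to own and pass around this state, which is why it cannot be added without touching the interface.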