Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid possible memory leak in compress middleware #919

Merged
merged 1 commit into from
Jun 7, 2024

Conversation

Neurostep
Copy link
Contributor

Description

In one of the HTTP services I am working on, we use the compress middleware provided by chi. We use standard gzip, deflate compression alongside with brotli compression.

The important aspect of the HTTP endpoint that uses the compression middleware is that the content may or may not be of a compressible type, serving json and HTML alongside png and jpg.

For the brotli compression we use the official CGO wrapper. The following code snippet is used to set the encoder:

compressor := middleware.NewCompressor(flate.DefaultCompression)

compressor.SetEncoder("br", func(w io.Writer, _ int) io.Writer {
	return cbrotli.NewWriter(w, cbrotli.WriterOptions{Quality: 5})
})

After enabling the brotli compression in the production environment, we noticed a memory leak in the service although there was no any issues with gzip compression.

Screenshot 2024-06-07 at 14 36 48

After the investigation, we noticed no memory leak indicated by the Go runtime profiles. After this we started looking into the brotli library implementation.

Instrumented with the bcc tools (BPF Compiler Collection) we were able to find the leak:

[09:29:50] Top 10 stacks with outstanding allocations:
        655744 bytes in 94 allocations from stack
                0x00007a71453ecb3a      BrotliEncoderCreateInstance+0x30a [libbrotlienc.so.1.1.0]
                0x0000000001072263      _cgo_eb62034131f6_Cfunc_BrotliEncoderCreateInstance+0x23 [server]
                0x000000000040a755      runtime.cgocall+0x75 [server]
                0x000000000106956c      github.com/google/brotli/go/cbrotli._Cfunc_BrotliEncoderCreateInstance.abi0+0x4c [server]
                0x0000000001069919      github.com/google/brotli/go/cbrotli.NewWriter+0x39 [server]

It is clear from the report we are leaking the memory instantiated by the brotli writer. Looking at the source code of the compress middleware, we should call the underlying Close of the CGO writer which in turn should free up the C-allocated memory. We checked if the BrotliEncoderDestroyInstance is being called using the funccount-bpfcc:

sudo funccount-bpfcc '/usr/lib/x86_64-linux-gnu/libbrotlienc.so.1.1.0:BrotliEncoder*'
Tracing 13 functions for "b'/usr/lib/x86_64-linux-gnu/libbrotlienc.so.1.1.0:BrotliEncoder*'"... Hit Ctrl-C to end.
^C
FUNC                                    COUNT
BrotliEncoderDestroyInstance             4311
BrotliEncoderCreateInstance              5760
BrotliEncoderSetParameter                5760
BrotliEncoderCompressStream              8632
BrotliEncoderHasMoreOutput               8632
BrotliEncoderTakeOutput                  8632

Turned out, the BrotliEncoderDestroyInstance function is not being called at the same rate as BrotliEncoderCreateInstance which explains the leak.

After carefully looking into the go-chi middleware I realized we are calling the Close function returned by the writer() function, and writer() function returns instantiated writer only if the content is compressible.

In this PR we are trying to close the underlying Writer if it satisfies the WriteCloser interface, no matter if the content being served is compressible or not.

Related links

@Neurostep Neurostep marked this pull request as ready for review June 7, 2024 12:56
Copy link
Contributor

@VojtechVitek VojtechVitek left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Thanks for the detailed report and additional test cases. Really nice work!

@pkieltyka pkieltyka merged commit f10dc4a into go-chi:master Jun 7, 2024
16 checks passed
@Neurostep Neurostep deleted the Neurostep/fix-memory-leak branch June 8, 2024 07:38
@Neurostep
Copy link
Contributor Author

@VojtechVitek @pkieltyka Thanks for the review and merging the PR 🙏

Do we have a rough estimation of when the new release containing these changes will be created? We would love to pull the fix as soon as possible. Thinking now if we should point to the commit hash from the master branch. Otherwise waiting for 1-2 weeks is OK.

@VojtechVitek
Copy link
Contributor

@Neurostep can you go get github.com/go-chi/chi/v5@latest and report back if the issues is gone in master branch, please? We'll release a new version right after you confirm it fixed the issue? :)

@Neurostep
Copy link
Contributor Author

@VojtechVitek That makes sense 👍 We will check changes from the master branch and will get back to you with the results.

@Neurostep
Copy link
Contributor Author

@VojtechVitek Just following up here. We have deployed the latest version of the go-chi/chi to our production version of the service and we are no longer seeing the memory leak. Hereby, confirming the fix working 🚀

Thanks again for the reviews and quick turnaround 🙏

@VojtechVitek
Copy link
Contributor

Hi @Neurostep, this is great! Thanks you for reporting back!

I'll release v5.0.13 shortly. 🎉

@adrian-bl
Copy link

I think something might be wrong with f10dc4a

Using the compression middleware, i get 'extra bytes' at the end of the document:

$ curl 'http://localhost:8004/' -H 'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,*/*;q=0.8' -H 'Accept-Language: en-US,en;q=0.5' -H 'Accept-Encoding: gzip, deflate, br, zstd' -H 'Connection: keep-alive' -H 'Upgrade-Insecure-Requests: 1' -H 'Sec-Fetch-Dest: document' -H 'Sec-Fetch-Mode: navigate' -H 'Sec-Fetch-Site: none' -H 'Sec-Fetch-User: ?1' -H 'Priority: u=1' -H 'Pragma: no-cache' -H 'Cache-Control: no-cache'|hexdump -C
....
00002fd0  28 29 20 3d 3e 20 74 68  69 73 2e 6c 61 75 6e 63  |() => this.launc|
00002fe0  68 53 65 61 72 63 68 28  27 27 29 2c 20 30 29 3b  |hSearch(''), 0);|
00002ff0  0a 09 7d 0a 20 20 20 20  7d 29 3b 0a 20 20 20 20  |..}.    });.    |
00003000  3c 2f 73 63 72 69 70 74  3e 0a 20 20 3c 2f 62 6f  |</script>.  </bo|
00003010  64 79 3e 0a 3c 2f 68 74  6d 6c 3e 0a 1f 8b 08 00  |dy>.</html>.....|
00003020  00 00 00 00 00 ff 01 00  00 ff ff 00 00 00 00 00  |................|
00003030  00 00 00                                          |...|

This does not happen if i rebuild the binary with 5.0.12 (or drop the compression middleware)

00002fe0  68 53 65 61 72 63 68 28  27 27 29 2c 20 30 29 3b  |hSearch(''), 0);|
00002ff0  0a 09 7d 0a 20 20 20 20  7d 29 3b 0a 20 20 20 20  |..}.    });.    |
00003000  3c 2f 73 63 72 69 70 74  3e 0a 20 20 3c 2f 62 6f  |</script>.  </bo|
00003010  64 79 3e 0a 3c 2f 68 74  6d 6c 3e 0a              |dy>.</html>.|
0000301c

@VojtechVitek
Copy link
Contributor

Reverting in #924 due to #923. I wonder if someone could figure out the reason and write additional test cases 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants