Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tls cert race/core 9178/v25.1.x #175

Open
wants to merge 6 commits into
base: v25.1.x
Choose a base branch
from

Conversation

michael-redpanda
Copy link

Fixes: CORE-9178

This addresses a possible race condition that some customers have witnessed where Seastar was not returning the correct certificate and this caused authorization failures on the broker. The fix is to use the correct OpenSSL API (SSL_get1_peer_certificate) rather than the cert held by the callback.

Note: We will backport this to v24.3 and v24.2 to address this bug in all OpenSSL implementations

@michael-redpanda michael-redpanda requested a review from a team February 21, 2025 20:37
@michael-redpanda michael-redpanda self-assigned this Feb 21, 2025
@michael-redpanda michael-redpanda requested review from pgellert and removed request for a team February 21, 2025 20:37
@michael-redpanda michael-redpanda force-pushed the tls-cert-race/core-9178/v25.1.x branch from 913465f to 96c1345 Compare February 21, 2025 20:44
@michael-redpanda
Copy link
Author

CI failures in two categories:

  • CPU profiler unit test flakiness (I believe Fixes for SG in cpu profile #174 is dealing with that)
  • Failure in building rpc_test.cc - code doesn't touch this so it's probably something that's already there

@StephanDollberg
Copy link
Member

Failure in building rpc_test.cc - code doesn't touch this so it's probably something that's already there

Should be fixed by #176

@michael-redpanda
Copy link
Author

By the end of our three PRs we may get one green run ;)

Created a new make_ossl_error function that takes in a pre-generated
list of OpenSSL error codes.  This is helpful if the error codes were
already pulled from the stack to be analyzed.

Created a helper function that drains the openssl error stack and
returns a list of errors.

Created a contains_ossl_error function that is used to analyze a list of
errors and determine if any match the library and error code.

Signed-off-by: Michael Boquard <[email protected]>
Sometimes the verification errors are not the first error in the OpenSSL
error stack.  This change will drain the error stack into a vector of
errors and then check if any of the errors match the certificate
verification error codes.  This will improve error messaging in
situations where the peer certificate fails to verify.

Signed-off-by: Michael Boquard <[email protected]>
This change should address a possible race condition when a using
application under heavy load attempts to view the DN of the peer's
(client's) certificate.

Once the handshake is successful, the function
SSL_get1_peer_certificate() should be used to obtain a copy of the
peer's certificate.  The verify_callback() mechanism is still used to
hold the last seen certificate that is being verified.  This is
necessary for error handling in order to inform a user application of
which certificate failed verification.

Customers have witnessed authorization failures because sometimes the
wrong certificate was returned purporting to be the client's
certificate.  This was traced to a possible race condition between
fetching the certificate and the callback function holding the wrong
certificate at the time.

Signed-off-by: Michael Boquard <[email protected]>
Without this change, our CI is testing against the GnuTLS
implementation.

Updated the install-dependencies script to include OpenSSL development
headers.

Signed-off-by: Michael Boquard <[email protected]>
More recent versions of OpenSSL are stricter about which extensions are
required.  Updated CA generation to use the `v3_ca` extensions found in
cert.cfg.in.

Signed-off-by: Michael Boquard <[email protected]>
@michael-redpanda michael-redpanda force-pushed the tls-cert-race/core-9178/v25.1.x branch from 3d00bdc to 96b4655 Compare February 22, 2025 00:29
@michael-redpanda
Copy link
Author

CI Failures:

  • Modules fails to build
  • cpu_profiler still flaky

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants