dt: disable core dumps in crash tracker tests #25117
base: dev
I chose to set the core_pattern as it is the simplest and most reliable option for our current setup. Note that in our current CDT setup, core dumps are not written to a file directly by the kernel; instead, the kernel pipes each core dump to `apport`. Because of this, the process's RLIMIT_CORE (set through ulimit) is not respected, so ulimit alone cannot disable core dumps. For reference, see the "Piping core dumps to a program" section of https://www.man7.org/linux/man-pages/man5/core.5.html
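To illustrate the distinction the commit message relies on, here is a minimal sketch that classifies a `kernel.core_pattern` value as piped or file-based, following the rule in core(5) that a leading `|` marks the rest of the value as a command line. The function names and the sample apport path are assumptions made for illustration; they are not part of this PR.

```python
from pathlib import Path

# Real procfs location of the setting on Linux.
CORE_PATTERN_PATH = "/proc/sys/kernel/core_pattern"


def parse_core_pattern(pattern: str) -> dict:
    """Classify a kernel.core_pattern value (hypothetical helper).

    Per core(5), a value whose first character is '|' is treated as a
    command line: the kernel pipes the core dump to that program
    instead of writing a core file.
    """
    value = pattern.rstrip("\n")
    if value.startswith("|"):
        # Piped mode: this is the case where, as noted above,
        # ulimit alone does not disable core dumps.
        return {"piped": True, "target": value[1:].strip()}
    # File mode: the value is a file name template for the core file.
    return {"piped": False, "target": value}


def read_core_pattern(path: str = CORE_PATTERN_PATH) -> dict:
    """Read and classify the current setting (Linux only)."""
    return parse_core_pattern(Path(path).read_text())


# Illustrative values only; the apport path is an assumed example.
piped = parse_core_pattern("|/usr/share/apport/apport %p\n")
plain = parse_core_pattern("core.%p\n")
```

Running `read_core_pattern()` on a CDT node would show whether the piping mechanism is in effect there before deciding between a `core_pattern` change and a ulimit change.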
CI test results: test results on build #62030
@@ -57,6 +57,9 @@ def __init__(self, test_context):
)
self.broker = self.redpanda.nodes[0]

# Disable core dumps as they take a long time (>1min). Core dumps are uninteresting for this test, since this test intentionally triggers crashes.
self.broker.account.ssh("sysctl -w kernel.core_pattern='|/dev/null'")
Doesn't this have to be reverted at the end of the test?
Note that `apport` is a giant piece of garbage, so there is a point for disabling/removing it in our ansible altogether (at which point ulimit should work again).
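One way to address the revert concern is to save the original value and restore it when the test finishes. The sketch below is a minimal illustration, not the PR's implementation: `temporary_core_pattern` and the `run` callable are hypothetical names, and `run` is assumed to execute a shell command on the node as root and return its stdout as a string.

```python
import shlex
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def temporary_core_pattern(run: Callable[[str], str], pattern: str) -> Iterator[None]:
    """Set kernel.core_pattern for the duration of a block, then restore it.

    `run` is an assumed command runner (hypothetical), e.g. a thin
    wrapper around the test harness's remote shell.
    """
    # `sysctl -n` prints only the value, without the key name.
    original = run("sysctl -n kernel.core_pattern").strip()
    run(f"sysctl -w kernel.core_pattern={shlex.quote(pattern)}")
    try:
        yield
    finally:
        # Restore even if the test body raises.
        run(f"sysctl -w kernel.core_pattern={shlex.quote(original)}")


# Usage with a fake runner that records commands instead of executing them.
log = []


def fake_run(cmd: str) -> str:
    log.append(cmd)
    if cmd.startswith("sysctl -n"):
        # Assumed example of a pre-existing piped pattern.
        return "|/usr/share/apport/apport %p\n"
    return ""


with temporary_core_pattern(fake_run, "|/dev/null"):
    pass  # the crash test would run here
```

This limits the change to the lifetime of the test, but the setting is still node-wide while the block runs, so it does not remove the concern about local machines or tests running concurrently on the same host.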
> Doesn't this have to be reverted at the end of the test?
yeh i am concerned about running this on my local machine
Yeah, that's fair. This PR is meant to be a way of disabling apport, but yeah, good point about this affecting local machines and other tests. Let me reach out to devprod to ask them to disable apport or move to a non-piping `core_pattern` in CDT.
cool. yeh i'm not opposed to this, but just as a drive-by review it seems like there might be some other ways to approach this like a one-off test outside the normal test harnesses. but not immediately sure the best thing to do
Putting this to draft mode while I discuss with devprod an alternative approach of making a CDT-only change to core_pattern / disabling apport.
Currently, the `CrashLoopChecksTest.test_crash_report_with_signal` ducktape tests consistently fail in CDT. This is because when a crash signal is sent to the redpanda process during the test, in CDT a core dump gets generated, which takes a long time (>1min) and causes the test to time out waiting for redpanda to stop (in 10s).

To fix this, this PR disables core dumps for `CrashLoopChecksTest` ducktape tests.

See the commit message for implementation details.
Fixes https://redpandadata.atlassian.net/browse/CORE-9044
Tested this on PR #25100
Backports Required
Release Notes