dt: disable core dumps in crash tracker tests #25117

pgellert · 2025-02-19T16:17:12Z

Currently, the CrashLoopChecksTest.test_crash_report_with_signal ducktape tests consistently fail in CDT. When a crash signal is sent to the redpanda process during the test, CDT generates a core dump, which takes a long time (>1min). As a result, the test times out while waiting for redpanda to stop (it only waits 10s).
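The failure mode can be illustrated with a small, self-contained Python sketch: send a crash signal to a child process and check whether it exits within a fixed budget. This is not the ducktape test itself; `crash_and_wait`, `spawn_sleeper`, and the sleeping child process are hypothetical stand-ins for redpanda and the test harness, and only the 10s budget is taken from the description above.

```python
import signal
import subprocess
import sys

# Wait budget taken from the PR description; everything else is illustrative.
STOP_TIMEOUT_SEC = 10


def spawn_sleeper():
    """Start a throwaway child process that stands in for the broker."""
    return subprocess.Popen(
        [sys.executable, "-c", "import time; time.sleep(60)"]
    )


def crash_and_wait(proc, sig=signal.SIGSEGV, timeout_sec=STOP_TIMEOUT_SEC):
    """Send a crash signal and report whether the process exited in time.

    Returns False when the process is still alive after timeout_sec, which
    is the situation a slow core dump would produce.
    """
    proc.send_signal(sig)
    try:
        proc.wait(timeout=timeout_sec)
    except subprocess.TimeoutExpired:
        return False
    return True
```

A tiny child like this dumps core almost instantly, so the call returns `True`; for a process with a large memory footprint and a slow core dump handler, the same wait could exceed the budget.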

To fix this, this PR disables core dumps for CrashLoopChecksTest ducktape tests.

See the commit message for implementation details.

Fixes https://redpandadata.atlassian.net/browse/CORE-9044
Tested this on PR #25100

Backports Required

Release Notes

none

I chose to set `core_pattern` because it is the simplest and most reliable option for our current setup. In our current CDT setup, the kernel does not write core dumps to a file directly; instead, it pipes them to `apport`. With a piped handler, the RLIMIT_CORE of the process (set through ulimit) is not respected, so ulimit alone cannot disable core dumps. For reference, see the "Piping core dumps to a program" section of https://www.man7.org/linux/man-pages/man5/core.5.html
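To make the piping distinction concrete, here is a minimal Python sketch of the rule described in core(5): a `core_pattern` whose first character is `|` is handed to a user-space program, and in that mode RLIMIT_CORE is not enforced. The helper names and the sample patterns in the comments are illustrative assumptions, not code from this PR.

```python
from pathlib import Path

# Standard procfs location of the kernel setting on Linux.
CORE_PATTERN_PATH = Path("/proc/sys/kernel/core_pattern")


def read_core_pattern(path=CORE_PATTERN_PATH):
    """Return the current core_pattern without its trailing newline."""
    return path.read_text().rstrip("\n")


def is_piped_core_pattern(pattern):
    """True if the kernel would pipe core dumps to a program.

    Per core(5), a leading '|' means the rest of the line is a command
    line for a user-space handler (for example apport).
    """
    return pattern.startswith("|")


def ulimit_can_disable_core_dumps(pattern):
    """True if lowering RLIMIT_CORE (ulimit -c 0) is enough on its own.

    RLIMIT_CORE is only enforced when the kernel writes the core file
    itself, so it cannot disable dumps for a piped pattern.
    """
    return not is_piped_core_pattern(pattern)
```

On a Linux node, `ulimit_can_disable_core_dumps(read_core_pattern())` would indicate whether `ulimit -c 0` alone is sufficient there.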

vbotbuildovich · 2025-02-19T19:55:04Z

CI test results

test results on build#62030

test_id	test_kind	job_url	test_status	passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery	ducktape	https://buildkite.com/redpanda/redpanda/builds/62030#01951f4a-0ff1-45d2-9d14-ccbd458b62db	FLAKY	1/4
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.delete.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3	ducktape	https://buildkite.com/redpanda/redpanda/builds/62030#01951f4a-0fee-46a6-9ea5-74ecd3c04977	FLAKY	1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=10	ducktape	https://buildkite.com/redpanda/redpanda/builds/62030#01951f4a-0fee-46a6-9ea5-74ecd3c04977	FLAKY	1/3
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3	ducktape	https://buildkite.com/redpanda/redpanda/builds/62030#01951f4a-0ff0-4db1-9a89-26d0b5f53b26	FLAKY	1/3
rptest.tests.partition_balancer_test.PartitionBalancerTest.test_unavailable_nodes	ducktape	https://buildkite.com/redpanda/redpanda/builds/62030#01951f4a-0fee-46a6-9ea5-74ecd3c04977	FLAKY	1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic	ducktape	https://buildkite.com/redpanda/redpanda/builds/62030#01951f4a-0ff0-4db1-9a89-26d0b5f53b26	FLAKY	1/2

StephanDollberg · 2025-02-19T20:14:51Z

tests/rptest/tests/crash_loop_checks_test.py

@@ -57,6 +57,9 @@ def __init__(self, test_context):
        )
        self.broker = self.redpanda.nodes[0]

+        # Disable core dumps as they take a long time (>1min). Core dumps are uninteresting for this test, since this test intentionally triggers crashes.
+        self.broker.account.ssh("sysctl -w kernel.core_pattern='|/dev/null'")


Doesn't this have to be reverted at the end of the test?

Note that apport is a giant piece of garbage so there is a point for disabling/removing it in our ansible altogether (at which point ulimit should work again).

dotnwat · 2025-02-19

> Doesn't this have to be reverted at the end of the test?

yeh i am concerned about running this on my local machine

pgellert · 2025-02-20

Yeah, that's fair. This PR is meant to be a way of disabling apport, but yeah, good point about this affecting local machines and other tests. Let me reach out to devprod to ask them to disable apport or move to a non-piping core_pattern in CDT.

dotnwat · 2025-02-20

cool. yeh i'm not opposed to this, but just as a drive-by review it seems like there might be some other ways to approach this like a one-off test outside the normal test harnesses. but not immediately sure the best thing to do
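One possible way to handle the revert concern raised in this thread, sketched under assumptions: save the node's current `core_pattern`, override it for the duration of the test, and restore it afterwards. This is not what the PR implements. `core_pattern_override`, `run`, and `run_output` are hypothetical names; the two callables stand in for whatever executes a shell command on the node and returns its output (in ducktape they could presumably be bound to the node's `account.ssh` and `account.ssh_output`).

```python
import shlex
from contextlib import contextmanager


@contextmanager
def core_pattern_override(run, run_output, pattern="|/dev/null"):
    """Temporarily replace kernel.core_pattern and restore it on exit.

    run(cmd) executes a shell command on the node; run_output(cmd)
    executes it and returns stdout as str or bytes. Both are assumed
    helpers supplied by the caller.
    """
    original = run_output("sysctl -n kernel.core_pattern")
    if isinstance(original, bytes):
        original = original.decode()
    original = original.strip()

    run(f"sysctl -w kernel.core_pattern={shlex.quote(pattern)}")
    try:
        yield original
    finally:
        # Restore the saved value even if the test body raises.
        run(f"sysctl -w kernel.core_pattern={shlex.quote(original)}")
```

The restore only runs if the test process reaches the `finally` block, so a killed test run would still leave the override in place. That limitation is one reason a CDT-only configuration change may be the safer route.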

pgellert · 2025-02-20T11:09:58Z

Putting this to draft mode while I discuss with devprod an alternative approach of making a CDT-only change to core_pattern / disabling apport.

pgellert requested review from a team February 19, 2025 16:17

pgellert self-assigned this Feb 19, 2025

pgellert requested review from rpdevmp, oleiman, a team and michael-redpanda and removed request for a team and oleiman February 19, 2025 16:17

StephanDollberg reviewed Feb 19, 2025

pgellert marked this pull request as draft February 20, 2025 11:07
