
[CORE-8669] storage: early abort compaction #25085

Draft
wants to merge 7 commits into base: dev

Conversation

@WillemKauf (Contributor) commented Feb 13, 2025

Converting to draft, no need to review. We will work on separating garbage collection and compaction into separate loops first, and then worry about the ideas here.

Compaction can be a long-running process, during which prefix truncation and regularly scheduled garbage collection are blocked.

Most notably, this PR adds compaction_config::maybe_abort_compaction(), which checks the abort source (as before) and now also considers two signals for terminating the compaction process early:

  1. Degraded or low disk space alerts
  2. Resource management triggered garbage collection

In either of these cases, compaction is aborted early.
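
To make the mechanism concrete, here is a minimal, self-contained sketch of the new check (not the actual diff; the names gc_required_exception, disk_space_alert, has_cloud_gc_offset() and maybe_abort_compaction() come from this PR, while the enum values, the log_handle stand-in and the member layout are assumptions):

    #include <stdexcept>
    #include <string>

    // Minimal model of the two new early-abort signals; not Redpanda code.
    enum class disk_space_alert { ok, low_space, degraded };

    struct gc_required_exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    // Stand-in for the bits of disk_log_impl that the check needs.
    struct log_handle {
        bool cloud_gc_offset_set{false};
        bool has_cloud_gc_offset() const { return cloud_gc_offset_set; }
    };

    struct compaction_config {
        disk_space_alert space_alert{disk_space_alert::ok};
        const log_handle* log{nullptr};

        void maybe_abort_compaction() const {
            // The real method still checks the abort source first, as before.
            // Signal 1: degraded or low disk space.
            if (space_alert != disk_space_alert::ok) {
                throw gc_required_exception("disk space alert raised");
            }
            // Signal 2: resource management requested garbage collection by
            // setting a cloud_gc_offset on the log.
            if (log != nullptr && log->has_cloud_gc_offset()) {
                throw gc_required_exception(
                  "cloud_gc_offset set by space management");
            }
        }
    };

The idea is that the compaction code keeps calling maybe_abort_compaction() at its existing checkpoints and lets the exception propagate, so garbage collection and prefix truncation can run promptly.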

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Allow compaction to be aborted early in cases of disk pressure or resource-management-triggered garbage collection, preventing long-running compactions from blocking emergency retention measures

This exception is intended to be thrown when compaction is aborted
early.
Signals that garbage collection is required ahead of or in the middle
of compaction. This returns `true` in low or degraded disk space scenarios,
as well as when regularly scheduled garbage collection has been triggered.
Also add member variables for `disk_space_alert` and `disk_log_impl`
for use in determining whether compaction should be aborted.
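
For illustration only, the boolean signal described in this commit message might look something like the sketch below (the function name gc_required() and its parameters are hypothetical; only the conditions under which it returns `true` come from the message above):

    enum class disk_space_alert { ok, low_space, degraded };

    // Hypothetical predicate: true when compaction should yield to garbage
    // collection, per the commit message above.
    bool gc_required(disk_space_alert alert, bool regular_gc_triggered) {
        return alert == disk_space_alert::low_space
               || alert == disk_space_alert::degraded
               || regular_gc_triggered;
    }
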
@WillemKauf WillemKauf changed the title storage: early abort compaction [CORE-8669] storage: early abort compaction Feb 13, 2025
@vbotbuildovich (Collaborator) commented Feb 14, 2025

CI test results

test results on build#61850
test_id test_kind job_url test_status passed
kafka_server_rpfixture.kafka_server_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61850#01950115-5ef3-4b31-bb8d-1fe7282fddef FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015e-543e-4801-a743-7fc4e6818184 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.delete.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=10 ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015e-543e-4801-a743-7fc4e6818184 FLAKY 1/2
rptest.tests.partition_force_reconfiguration_test.NodeWiseRecoveryTest.test_node_wise_recovery.dead_node_count=1 ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015a-2370-45b5-a920-0a174cea0d6b FLAKY 1/2
rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_forced_cancellation ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015a-2370-4e70-9459-a943a5ae100a FLAKY 1/2
rptest.tests.write_caching_fi_e2e_test.WriteCachingFailureInjectionE2ETest.test_crash_all_with_consumer_group ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015a-2370-45b5-a920-0a174cea0d6b FLAKY 1/2
test results on build#61886
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61886#01950820-a6a9-4b1b-9ae7-8dd9907650c7 FLAKY 1/5
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61886#01950820-a6aa-4dc9-9294-b655c5cbbe23 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.delete.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3 ducktape https://buildkite.com/redpanda/redpanda/builds/61886#0195083b-4038-4ee7-88b5-088f069e1136 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3 ducktape https://buildkite.com/redpanda/redpanda/builds/61886#0195083b-4039-4f96-add1-0040eb874393 FLAKY 1/2

@Lazin (Contributor) left a comment

how does it interact with space management?

@WillemKauf (Contributor, Author) commented Feb 14, 2025

how does it interact with space management?

Thanks for asking.

For a tiered-storage (archival) enabled topic, disk_space_manager::manage_data_disk uses the current disk usage and retention configuration to install an eviction schedule, the severity of which depends on how much we are able to reclaim at each level (i.e. up to the configured local retention limits, the low-space level, and the active segment).

This ultimately has the effect of setting cloud_gc_offset in the log to an offset per the eviction schedule, to be used and reset later in disk_log_impl::do_gc() as an override for the eviction offset (which then feeds into the log_eviction_stm to perform the prefix truncation up to that point).

With this PR, we now repeatedly check throughout the compaction process whether the log being compacted has a cloud_gc_offset, and abort compaction early if it does, since that indicates resource management has requested eviction to free up disk space and we should bail out of compaction so the prefix truncation can occur.
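
Schematically, the intended sequencing is something like the sketch below (a simplified, self-contained model of the flow above, not the actual Redpanda code; compact_log(), gc_log() and do_housekeeping() are hypothetical stand-ins):

    #include <iostream>
    #include <stdexcept>

    // Stand-in for the gc_required_exception added by this PR.
    struct gc_required_exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    // Hypothetical stand-in for one compaction pass: the real code calls
    // maybe_abort_compaction() periodically; here we pretend space management
    // set a cloud_gc_offset mid-compaction.
    void compact_log() {
        throw gc_required_exception("cloud_gc_offset set by space management");
    }

    // Hypothetical stand-in for garbage collection: in the real code, do_gc()
    // consumes the cloud_gc_offset override and the log_eviction_stm
    // prefix-truncates up to it.
    void gc_log() { std::cout << "running gc / prefix truncation\n"; }

    void do_housekeeping() {
        try {
            compact_log();
        } catch (const gc_required_exception& e) {
            // Compaction yields so that emergency retention can make progress.
            std::cout << "compaction aborted early: " << e.what() << "\n";
        }
        gc_log();
    }

    int main() { do_housekeeping(); }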

Let me know if this clears it up.

All of the current calls to `maybe_abort_compaction()` happen after
some work has already been done in the compaction process.

Add an extra check near the start of compaction so that we return early
as soon as possible in the presence of disk pressure or space
management.
@WillemKauf (Contributor, Author)

Push to:

  • Add an extra check near the start of compaction so that we return early as soon as possible in the presence of disk pressure or space management.

Comment on lines +270 to +276
// If resource management has set a cloud_gc_offset, bail out of
// compaction early to allow prefix truncation to quickly reclaim data.
if (log.has_cloud_gc_offset()) {
    throw gc_required_exception(
      fmt::format("Bailing out of compaction due to resource "
                  "management setting cloud_gc_offset"));
}
Contributor

Hmm isn't the cloud GC offset almost always set when there's a stream of incoming data in the steady state? I thought typically space management lets local storage grow, but at around 70-80% it tries to reclaim from all partitions every 30 seconds?

Or is the idea that compaction always takes under 30 seconds, so there's no risk of starving compactions?

@WillemKauf (Contributor, Author) commented Feb 19, 2025

Hmm isn't the cloud GC offset almost always set when there's a stream of incoming data in the steady state? I thought typically space management lets local storage grow, but at around 70-80% it tries to reclaim from all partitions every 30 seconds?

@dotnwat and I were discussing this potential, highly undesirable state in which compaction is constantly bailed out because the cloud GC offset is set, but space management is, for whatever reason, not reclaiming data, or not reclaiming enough data to bring us back within the retention limit. All of this would lead to zero retention progress being made.

I'm not sure there's a great solution here yet, I will have to think about it more.

@WillemKauf (Contributor, Author)

As a note here: the storage team has been brainstorming ways to run garbage collection and compaction concurrently in separate fibres, which would render this work mostly unnecessary.

@WillemKauf WillemKauf marked this pull request as draft February 20, 2025 21:33