
[CORE-8669] storage: early abort compaction #25085

Draft
wants to merge 7 commits into base: dev

Conversation

@WillemKauf (Contributor) commented Feb 13, 2025

Converting to draft, no need to review. We will work on separating garbage collection and compaction into separate loops first, and then worry about the ideas here.

Compaction can be a long-running process, during which prefix truncation and regularly scheduled garbage collection are blocked.

Most notably, this PR adds compaction_config::maybe_abort_compaction(), which checks the abort source (as before) and now also considers two signals for terminating the compaction process early:

  1. Degraded or low disk space alerts
  2. Resource management triggered garbage collection

In either of these cases, compaction is aborted early.
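
To make the mechanism concrete, here is a minimal, self-contained sketch of the new check (not the actual diff; the names gc_required_exception, disk_space_alert, has_cloud_gc_offset() and maybe_abort_compaction() come from this PR, while the enum values, the log_handle stand-in and the member layout are assumptions):

    #include <stdexcept>
    #include <string>

    // Minimal model of the two new early-abort signals; not Redpanda code.
    enum class disk_space_alert { ok, low_space, degraded };

    struct gc_required_exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    // Stand-in for the bits of disk_log_impl that the check needs.
    struct log_handle {
        bool cloud_gc_offset_set{false};
        bool has_cloud_gc_offset() const { return cloud_gc_offset_set; }
    };

    struct compaction_config {
        disk_space_alert space_alert{disk_space_alert::ok};
        const log_handle* log{nullptr};

        void maybe_abort_compaction() const {
            // The real method still checks the abort source first, as before.
            // Signal 1: degraded or low disk space.
            if (space_alert != disk_space_alert::ok) {
                throw gc_required_exception("disk space alert raised");
            }
            // Signal 2: resource management requested garbage collection by
            // setting a cloud_gc_offset on the log.
            if (log != nullptr && log->has_cloud_gc_offset()) {
                throw gc_required_exception(
                  "cloud_gc_offset set by space management");
            }
        }
    };

The idea is that the compaction code keeps calling maybe_abort_compaction() at its existing checkpoints and lets the exception propagate, so garbage collection and prefix truncation can run promptly.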

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Allow compaction to be aborted early in cases of disk pressure or resource-management-triggered garbage collection, preventing long-running compactions from blocking emergency retention measures

This exception is intended to be thrown when compaction is aborted
early.
Signals that garbage collection is required ahead of or in the middle
of compaction. This returns `true` in low or degraded disk space scenarios,
as well as when regularly scheduled garbage collection has been triggered.
Also add member variables for `disk_space_alert` and `disk_log_impl`
for use in determining whether compaction should be aborted.
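
For illustration only, the boolean signal described in this commit message might look something like the sketch below (the function name gc_required() and its parameters are hypothetical; only the conditions under which it returns `true` come from the message above):

    enum class disk_space_alert { ok, low_space, degraded };

    // Hypothetical predicate: true when compaction should yield to garbage
    // collection, per the commit message above.
    bool gc_required(disk_space_alert alert, bool regular_gc_triggered) {
        return alert == disk_space_alert::low_space
               || alert == disk_space_alert::degraded
               || regular_gc_triggered;
    }
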
@WillemKauf WillemKauf changed the title storage: early abort compaction [CORE-8669] storage: early abort compaction Feb 13, 2025
@vbotbuildovich (Collaborator) commented Feb 14, 2025

CI test results

test results on build#61850
test_id test_kind job_url test_status passed
kafka_server_rpfixture.kafka_server_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/61850#01950115-5ef3-4b31-bb8d-1fe7282fddef FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015e-543e-4801-a743-7fc4e6818184 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.delete.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=10 ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015e-543e-4801-a743-7fc4e6818184 FLAKY 1/2
rptest.tests.partition_force_reconfiguration_test.NodeWiseRecoveryTest.test_node_wise_recovery.dead_node_count=1 ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015a-2370-45b5-a920-0a174cea0d6b FLAKY 1/2
rptest.tests.partition_move_interruption_test.PartitionMoveInterruption.test_forced_cancellation ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015a-2370-4e70-9459-a943a5ae100a FLAKY 1/2
rptest.tests.write_caching_fi_e2e_test.WriteCachingFailureInjectionE2ETest.test_crash_all_with_consumer_group ducktape https://buildkite.com/redpanda/redpanda/builds/61850#0195015a-2370-45b5-a920-0a174cea0d6b FLAKY 1/2
test results on build#61886
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/61886#01950820-a6a9-4b1b-9ae7-8dd9907650c7 FLAKY 1/5
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/61886#01950820-a6aa-4dc9-9294-b655c5cbbe23 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.delete.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3 ducktape https://buildkite.com/redpanda/redpanda/builds/61886#0195083b-4038-4ee7-88b5-088f069e1136 FLAKY 1/2
rptest.tests.log_compaction_test.LogCompactionTest.compaction_stress_test.cleanup_policy=compact.key_set_cardinality=1000.storage_compaction_key_map_memory_kb=3 ducktape https://buildkite.com/redpanda/redpanda/builds/61886#0195083b-4039-4f96-add1-0040eb874393 FLAKY 1/2

@Lazin (Contributor) left a comment

how does it interact with space management?

@WillemKauf (Contributor, Author) commented Feb 14, 2025

how does it interact with space management?

Thanks for asking.

For a tiered-storage (archival) enabled topic, disk_space_manager::manage_data_disk uses the current disk usage and retention configuration to install an eviction schedule, the severity of which depends on how much we are able to reclaim at each level (i.e. up to the configured local retention limits, the low-space level, and the active segment).

This ultimately has the effect of setting cloud_gc_offset in the log to an offset per the eviction schedule, to be used and reset later in disk_log_impl::do_gc() as an override for the eviction offset (which then feeds into the log_eviction_stm to perform the prefix truncation up to that point).

With this PR, we now repeatedly check throughout the compaction process whether the log being compacted has a cloud_gc_offset, and abort compaction early if it does, since that indicates resource management has requested eviction to free up disk space and we should bail out of compaction so the prefix truncation can occur.
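
Schematically, the intended sequencing is something like the sketch below (a simplified, self-contained model of the flow above, not the actual Redpanda code; compact_log(), gc_log() and do_housekeeping() are hypothetical stand-ins):

    #include <iostream>
    #include <stdexcept>

    // Stand-in for the gc_required_exception added by this PR.
    struct gc_required_exception : std::runtime_error {
        using std::runtime_error::runtime_error;
    };

    // Hypothetical stand-in for one compaction pass: the real code calls
    // maybe_abort_compaction() periodically; here we pretend space management
    // set a cloud_gc_offset mid-compaction.
    void compact_log() {
        throw gc_required_exception("cloud_gc_offset set by space management");
    }

    // Hypothetical stand-in for garbage collection: in the real code, do_gc()
    // consumes the cloud_gc_offset override and the log_eviction_stm
    // prefix-truncates up to it.
    void gc_log() { std::cout << "running gc / prefix truncation\n"; }

    void do_housekeeping() {
        try {
            compact_log();
        } catch (const gc_required_exception& e) {
            // Compaction yields so that emergency retention can make progress.
            std::cout << "compaction aborted early: " << e.what() << "\n";
        }
        gc_log();
    }

    int main() { do_housekeeping(); }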

Let me know if this clears it up.

All of the current calls to `maybe_abort_compaction()` happen after
some work has already been done in the compaction process.

Add an extra check near the start of compaction so that we return early
as soon as possible in the presence of disk pressure or space
management.
@WillemKauf (Contributor, Author)

Push to:

  • Add an extra check near the start of compaction so that we return early as soon as possible in the presence of disk pressure or space management.

Comment on lines +270 to +276
// If resource management has set a cloud_gc_offset, bail out of
// compaction early to allow prefix truncation to quickly reclaim data.
if (log.has_cloud_gc_offset()) {
    throw gc_required_exception(
      fmt::format("Bailing out of compaction due to resource "
                  "management setting cloud_gc_offset"));
}
Contributor

Hmm isn't the cloud GC offset almost always set when there's a stream of incoming data in the steady state? I thought typically space management lets local storage grow, but at around 70-80% it tries to reclaim from all partitions every 30 seconds?

Or is the idea that compaction always takes under 30 seconds, so there's no risk of starving compactions?

@WillemKauf (Contributor, Author) commented Feb 19, 2025

Hmm isn't the cloud GC offset almost always set when there's a stream of incoming data in the steady state? I thought typically space management lets local storage grow, but at around 70-80% it tries to reclaim from all partitions every 30 seconds?

@dotnwat and I were discussing this potential, highly undesirable state in which compaction is constantly bailed out because the cloud GC offset is set, but space management is, for whatever reason, not reclaiming data, or not reclaiming enough data to bring us back within the retention limit. All of this would lead to zero retention progress being made.

I'm not sure there's a great solution here yet, I will have to think about it more.

@WillemKauf (Contributor, Author)

As a note here: the storage team has been brainstorming ways to run garbage collection and compaction concurrently in separate fibres, which would render this work mostly unnecessary.

@WillemKauf WillemKauf marked this pull request as draft February 20, 2025 21:33