[CORE-8669] storage: early abort compaction #25085
base: dev
This exception is intended to be thrown when compaction is aborted early.
Signals that garbage collection is required ahead of or in the middle of compaction. This returns `true` in low or degraded disk space scenarios, as well as when regularly scheduled garbage collection is triggered.
Also add member variables for `disk_space_alert` and `disk_log_impl` for use in determining whether compaction should be aborted.
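As a rough illustration of the two commits above, the following sketch pairs an exception type with a predicate that reports when garbage collection should take priority. The `gc_signals` struct, the `scheduled_gc_due` flag, and the enumerator names are assumptions made for this example; they are not taken from the PR's diff.

```cpp
#include <stdexcept>
#include <string>

// Assumed shape of the disk space alert level; the real enum may differ.
enum class disk_space_alert { ok, low_space, degraded };

// Thrown to unwind out of compaction so that garbage collection can run.
class gc_required_exception : public std::runtime_error {
public:
    explicit gc_required_exception(const std::string& msg)
      : std::runtime_error(msg) {}
};

// Hypothetical snapshot of the inputs consulted by the predicate.
struct gc_signals {
    disk_space_alert alert; // current disk space state
    bool scheduled_gc_due;  // regularly scheduled GC has been triggered
};

// Returns true when disk space is low or degraded, or when scheduled
// garbage collection is due.
inline bool gc_required(const gc_signals& s) {
    return s.alert != disk_space_alert::ok || s.scheduled_gc_due;
}
```

A caller would typically test `gc_required(...)` and throw `gc_required_exception` when it returns `true`.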
CI test results: build#61850, build#61886
how does it interact with space management?
Thanks for asking. For a tiered-storage (archival) enabled topic, in This ultimately has the effect of setting With this PR, we now (repeatedly) check if the log being compacted has a Let me know if this clears it up.
All of the current calls to `maybe_abort_compaction()` happen after some work has already been done in the compaction process. Add an extra check near the start of compaction so that it returns as early as possible in the presence of disk pressure or space management.
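The effect of the extra check can be sketched with a toy compaction driver. Everything here (`run_compaction`, the step list, the `gc_required` callback) is hypothetical scaffolding for illustration, not the PR's code; the point is only that a check placed before the first step prevents any work from being done when garbage collection is already required.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

struct gc_required_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical driver: runs each compaction step in order and returns the
// number of steps completed. Throws gc_required_exception when the
// gc_required callback reports that garbage collection should run instead.
inline int run_compaction(
  const std::vector<std::function<void()>>& steps,
  const std::function<bool()>& gc_required) {
    auto maybe_abort = [&gc_required] {
        if (gc_required()) {
            throw gc_required_exception("garbage collection required");
        }
    };

    maybe_abort(); // early check, before any work has been done

    int completed = 0;
    for (const auto& step : steps) {
        step();
        ++completed;
        maybe_abort(); // pre-existing style of check, after some work
    }
    return completed;
}
```

Without the first `maybe_abort()` call, at least one step would always run before the abort is noticed.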
// If resource management has set a cloud_gc_offset, bail out of
// compaction early to allow prefix truncation to quickly reclaim data.
if (log.has_cloud_gc_offset()) {
    throw gc_required_exception(
      fmt::format("Bailing out of compaction due to resource "
                  "management setting cloud_gc_offset"));
}
Hmm isn't the cloud GC offset almost always set when there's a stream of incoming data in the steady state? I thought typically space management lets local storage grow, but at around 70-80% it tries to reclaim from all partitions every 30 seconds?
Or is the idea that compaction always takes under 30 seconds, so there's no risk of starving compactions?
> Hmm isn't the cloud GC offset almost always set when there's a stream of incoming data in the steady state? I thought typically space management lets local storage grow, but at around 70-80% it tries to reclaim from all partitions every 30 seconds?

@dotnwat and I were discussing this potential, highly undesirable state, in which compaction is constantly bailed out of because the cloud GC offset is set, but space management is either not reclaiming data for whatever reason or not reclaiming enough data to get us under the retention limit. All of this would lead to zero retention progress being made.
I'm not sure there's a great solution here yet; I will have to think about it more.
As a note here: the storage team has been brainstorming ways to run garbage collection and compaction concurrently in separate fibres, which would render this work mostly unnecessary.
Converting to draft, no need to review. We will work on separating garbage collection and compaction in separate loops first, and then worry about the ideas here.
Compaction can be a long-running process, during which prefix truncation and regularly scheduled garbage collection are blocked.
This PR, most notably, adds `compaction_config::maybe_abort_compaction()`, which checks the abort source (as was done before) but also now considers two signals for terminating the compaction process early. In either of these cases, compaction is aborted early.
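A minimal sketch of how such a method might be laid out is given below. It assumes that the two signals are a raised disk space alert and a set `cloud_gc_offset`, which is inferred from the commit messages and diff snippet in this conversation rather than stated in the description. The `std::atomic<bool>` flag is a stand-in for the real abort source, and `compaction_aborted_exception` plus all member names are hypothetical.

```cpp
#include <atomic>
#include <stdexcept>

// Assumed shape of the disk space alert level; the real enum may differ.
enum class disk_space_alert { ok, low_space, degraded };

struct compaction_aborted_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

struct gc_required_exception : std::runtime_error {
    using std::runtime_error::runtime_error;
};

// Hypothetical, simplified stand-in for compaction_config.
struct compaction_config {
    const std::atomic<bool>* abort_requested; // stand-in for the abort source
    disk_space_alert alert;                   // assumed signal 1
    bool has_cloud_gc_offset;                 // assumed signal 2

    // Throws if compaction should stop: first the abort source (as before),
    // then the two assumed early-termination signals.
    void maybe_abort_compaction() const {
        if (abort_requested != nullptr && abort_requested->load()) {
            throw compaction_aborted_exception("abort requested");
        }
        if (alert != disk_space_alert::ok) {
            throw gc_required_exception("disk space alert raised");
        }
        if (has_cloud_gc_offset) {
            throw gc_required_exception("cloud_gc_offset is set");
        }
    }
};
```

Using separate exception types lets a caller tell a shutdown-style abort apart from a request to yield to garbage collection; whether the PR makes that distinction is not stated here.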
Backports Required
Release Notes
Improvements