storage_min_free_bytes config option for topic-level #25074

gilad-aperio · 2025-02-11T08:39:07Z

Who is this for and what problem do they have today?

There are cases where some topics should have minimal message latency between production and consumption, while others allow higher latency.

For example, live-streaming vs download. Let's call them topic S and topic D.

When these topics share a single broker and disk, and disk pressure builds up because of many messages in topic D, producers to all topics are rejected based on the value of storage_min_free_bytes. In other words, topic D filling the disk degrades topic S.

My request is a topic-level configuration option that rejects producers to that topic based on the minimum free bytes on disk (its value would be higher than storage_min_free_bytes).
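To make the request concrete, here is a minimal Python sketch of the admission check I have in mind. It is a model of the idea, not Redpanda code: the `DiskGuard` class, the `topic_min_free_bytes` mapping, and the exact comparison (reject when free space is strictly below the threshold) are all assumptions made for illustration. Only storage_min_free_bytes is an existing setting.

```python
from dataclasses import dataclass, field

GIB = 1024 ** 3


@dataclass
class DiskGuard:
    # Broker-wide floor, modeled on the existing storage_min_free_bytes setting.
    storage_min_free_bytes: int
    # Hypothetical per-topic floors (the option requested here; it does not exist today).
    topic_min_free_bytes: dict[str, int] = field(default_factory=dict)

    def accepts_produce(self, topic: str, free_bytes: int) -> bool:
        # The broker-wide safeguard always applies, whatever the topic.
        if free_bytes < self.storage_min_free_bytes:
            return False
        # A topic with its own, higher floor is cut off earlier than the rest.
        topic_floor = self.topic_min_free_bytes.get(topic)
        if topic_floor is not None and free_bytes < topic_floor:
            return False
        return True


# Topic D stops accepting writes at 20 GiB free; topic S continues until 5 GiB free.
guard = DiskGuard(storage_min_free_bytes=5 * GIB, topic_min_free_bytes={"D": 20 * GIB})
d_at_10 = guard.accepts_produce("D", 10 * GIB)  # below D's floor
s_at_10 = guard.accepts_produce("S", 10 * GIB)  # above the broker-wide floor
s_at_4 = guard.accepts_produce("S", 4 * GIB)    # below the broker-wide floor
```

With these example numbers, topic D is rejected once free space drops under 20 GiB, which leaves 15 GiB of headroom for topic S before the broker-wide safeguard takes effect.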

What are the success criteria?

Production to other topics is unaffected when a higher disk pressure threshold configured for a specific topic is reached.

Why is solving this problem impactful?

Enables hosting low-latency topics on brokers that also handle high-production topics.

Additional notes

JIRA Link: CORE-9066

dotnwat · 2025-02-20T05:02:15Z

cc @mattschumpert

mattschumpert · 2025-02-20T18:43:28Z

Redpanda is certainly designed to handle mixed workloads (high/low latency) on the same broker, with many configuration knobs that affect their relative prioritization. That does not extend to the use of disk space in on-premises environments without any access to cloud storage, as there isn't a good way to do this.

In systems with a single storage tier (local storage only), the disk is a shared resource. When it is globally full, we need to protect the broker from disk fullness regardless of who is writing messages and to which topic, without an obvious way to make tradeoffs.

It's not a question of latency or performance but of keeping the system available overall. Since we can't force-delete a user's topic data, there isn't really room for prioritizing disk space without a force-delete that violates the retention.ms of a low-priority topic, as I see it.

In cloud environments using Tiered Storage, we do have such a mechanism for prioritizing low-latency topics with respect to their disk resources and automatically managing disk space accordingly ('Space Management'). This is accomplished by tuning the 'local retention target' and by offloading data to cloud storage automatically when under disk pressure, taking into account these hints on topics that want more local disk retention in order to support low end-to-end latency further back in the log.

These tools are described here: https://docs.redpanda.com/current/manage/cluster-maintenance/disk-utilization/#space-management

gilad-aperio · 2025-02-23T08:24:27Z

I understand the need to protect system availability. This is not a request to remove storage_min_free_bytes for certain topics, but rather to be extra conservative by limiting production to the noisy ones before even reaching that safeguard. As I see it, this actually improves availability while maintaining simplicity.

gilad-aperio added the kind/enhance New feature or request label Feb 11, 2025
