storage: concurrent housekeeping #25134

Open
wants to merge 5 commits into
base: dev
from


WillemKauf
Contributor

@WillemKauf WillemKauf commented Feb 21, 2025

WIP: don't look at this before #24991

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

Improvements

  • Add an extra lock to disk_log_impl

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 22, 2025

Retry command for Build#62113

please wait until all jobs are finished before running the slash command



/ci-repeat 1
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::TSWithAlreadyCompactedTopic.test_initial_upload
tests/rptest/tests/full_disk_test.py::FullDiskReclaimTest.test_full_disk_triggers_gc
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::ShadowIndexingCompactedTopicTest.test_upload@{"cloud_storage_type":1}
tests/rptest/tests/shadow_indexing_compacted_topic_test.py::ShadowIndexingCompactedTopicTest.test_upload@{"cloud_storage_type":2}

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 22, 2025

CI test results

test results on build#62113
test_id test_kind job_url test_status passed
cloud_storage_rpfixture.cloud_storage_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/62113#01952a68-d8da-4087-8512-1998e4248139 FAIL 0/2
cloud_storage_rpfixture.cloud_storage_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/62113#01952a68-d8db-43c3-bd1f-96bd9714f70f FAIL 0/2
kafka_server_rpfixture.kafka_server_rpfixture unit https://buildkite.com/redpanda/redpanda/builds/62113#01952a68-d8db-43c3-bd1f-96bd9714f70f FLAKY 1/2
rptest.tests.datalake.mount_unmount_test.MountUnmountIcebergTest.test_simple_unmount.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd51-43aa-a6b1-5e136fe9afe1 FLAKY 1/2
rptest.tests.full_disk_test.FullDiskReclaimTest.test_full_disk_triggers_gc ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd50-4e2e-a708-906d162c1889 FAIL 0/20
rptest.tests.scaling_up_test.ScalingUpTest.test_adding_nodes_to_cluster.partition_count=20 ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd50-4e4e-b2df-0c88a91695d9 FLAKY 1/2
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd51-43aa-a6b1-5e136fe9afe1 FAIL 0/20
rptest.tests.shadow_indexing_compacted_topic_test.ShadowIndexingCompactedTopicTest.test_upload.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd4f-4ecf-a1d3-8abe7b4bb3c2 FAIL 0/20
rptest.tests.shadow_indexing_compacted_topic_test.TSWithAlreadyCompactedTopic.test_initial_upload ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd50-4e2e-a708-906d162c1889 FAIL 0/20
rptest.tests.simple_e2e_test.SimpleEndToEndTest.test_consumer_interruption ducktape https://buildkite.com/redpanda/redpanda/builds/62113#01952ac4-bd50-4e2e-a708-906d162c1889 FLAKY 1/2
test results on build#62145
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62145#019530a1-2cf2-4df4-bf60-565c6991c9a1 FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62145#019530b5-0042-4df9-af9d-5cc1cf724d70 FLAKY 1/2
rptest.transactions.producers_api_test.ProducersAdminAPITest.test_producers_state_api_during_load ducktape https://buildkite.com/redpanda/redpanda/builds/62145#019530b5-0043-4bf7-aa1f-578aa6c836c9 FLAKY 1/2
rptest.transactions.stream_verifier_test.StreamVerifierTest.test_simple_produce_consume_txn_with_add_node ducktape https://buildkite.com/redpanda/redpanda/builds/62145#019530b5-0044-467f-a34e-873267346567 FLAKY 1/2

@WillemKauf WillemKauf force-pushed the concurrent_housekeeping branch from b60242c to e1d4d8b Compare February 22, 2025 23:55
This lock controls concurrency between `gc()` and `housekeeping()`,
two functions that have not previously run concurrently.

We are looking to invoke these concurrently in two separate fibres
from the `log_manager`. It is expected that `gc()` is a fast process,
while `housekeeping()` (which performs compaction) is not.
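A minimal sketch of the locking idea described above, under loud assumptions: `toy_log` is a hypothetical stand-in for `disk_log_impl`, `std::mutex` stands in for whatever lock type the PR actually uses, and `std::thread` stands in for Seastar fibres. It only illustrates that one shared lock keeps `gc()` and `housekeeping()` from overlapping when two workers call them.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Hypothetical stand-in for disk_log_impl; none of this is Redpanda code.
class toy_log {
public:
    // Expected to be fast: takes the lock, does its work, releases it.
    void gc() {
        std::lock_guard<std::mutex> guard(_housekeeping_lock);
        enter();
        ++_gc_runs;
        leave();
    }

    // Expected to be slow (compaction); it holds the same lock while it runs.
    void housekeeping() {
        std::lock_guard<std::mutex> guard(_housekeeping_lock);
        enter();
        ++_housekeeping_runs;
        leave();
    }

    int gc_runs() const { return _gc_runs; }
    int housekeeping_runs() const { return _housekeeping_runs; }
    bool overlapped() const { return _overlapped; }

private:
    // Called with the lock held; records whether another job was already inside.
    void enter() {
        if (_active != 0) { _overlapped = true; }
        ++_active;
    }
    void leave() { --_active; }

    std::mutex _housekeeping_lock; // assumed name, mirroring the PR's wording
    int _active = 0;
    int _gc_runs = 0;
    int _housekeeping_runs = 0;
    bool _overlapped = false;
};

// Runs both jobs from two threads, then reports whether they ever overlapped.
inline bool run_both(toy_log& log, int iterations) {
    std::thread gc_thread([&log, iterations] {
        for (int i = 0; i < iterations; ++i) { log.gc(); }
    });
    std::thread housekeeping_thread([&log, iterations] {
        for (int i = 0; i < iterations; ++i) { log.housekeeping(); }
    });
    gc_thread.join();
    housekeeping_thread.join();
    return log.overlapped();
}
```

Calling `run_both(log, 1000)` on a fresh `toy_log` should return `false`, since every entry into either job happens with the lock held.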
We are going to separate garbage collection from housekeeping
(garbage collection then compaction) into separate loops within
the `log_manager`.

Add `housekeeping_job_t` to specify which of the two jobs should be
run by a generic worker function in future commits.
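As a rough illustration of the commit message above, a job selector plus a generic worker step might look like the following. The enumerator names, `recording_log`, and `run_job` are assumptions made for this sketch; the PR text only names the type `housekeeping_job_t`.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Assumed enumerators; only the type name appears in the PR text.
enum class housekeeping_job_t { gc, housekeeping };

// Hypothetical log that records which operation was invoked.
struct recording_log {
    std::vector<std::string> calls;
    void gc() { calls.push_back("gc"); }
    void housekeeping() { calls.push_back("housekeeping"); }
};

// Hypothetical generic worker step: dispatches on the requested job.
inline void run_job(recording_log& log, housekeeping_job_t job) {
    switch (job) {
    case housekeeping_job_t::gc:
        log.gc();
        return;
    case housekeeping_job_t::housekeeping:
        log.housekeeping();
        return;
    }
}
```

Keeping the dispatch in one function means the two loops can share scheduling and error-handling code and differ only in the `housekeeping_job_t` value they pass.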
To make the future uses of the existing semaphore and jitter variables explicit,
rename them with a `_housekeeping` prefix (for example, `_housekeeping_jitter`).
Also, add a new `_gc_sem`.
Using the `housekeeping_job_t`, we now kick off two background
fibres to handle urgent garbage collection and housekeeping.

The underlying functions called are `log->gc()` and `log->housekeeping()`,
which, as mentioned, have their concurrency managed by
`disk_log_impl::housekeeping_lock`.
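The two-worker arrangement described above could be sketched as follows. This is not the `log_manager` implementation: `toy_log_manager`, `counting_log`, and the fixed iteration count are hypothetical, `std::thread` stands in for Seastar background fibres, and the real loops presumably wait on timers or semaphores rather than spinning.

```cpp
#include <cassert>
#include <mutex>
#include <thread>

// Assumed enumerators; only the type name appears in the PR text.
enum class housekeeping_job_t { gc, housekeeping };

// Hypothetical log whose two operations share one lock.
struct counting_log {
    std::mutex housekeeping_lock;
    int gc_runs = 0;
    int housekeeping_runs = 0;

    void gc() {
        std::lock_guard<std::mutex> guard(housekeeping_lock);
        ++gc_runs;
    }
    void housekeeping() {
        std::lock_guard<std::mutex> guard(housekeeping_lock);
        ++housekeeping_runs;
    }
};

// Hypothetical manager that starts one background worker per job type.
class toy_log_manager {
public:
    explicit toy_log_manager(counting_log& log) : _log(log) {}
    ~toy_log_manager() { stop(); }

    // Kick off both workers; each runs its job `iterations` times.
    void start(int iterations) {
        _gc_worker = std::thread(
          [this, iterations] { worker(housekeeping_job_t::gc, iterations); });
        _housekeeping_worker = std::thread([this, iterations] {
            worker(housekeeping_job_t::housekeeping, iterations);
        });
    }

    // Wait for both workers to finish.
    void stop() {
        if (_gc_worker.joinable()) { _gc_worker.join(); }
        if (_housekeeping_worker.joinable()) { _housekeeping_worker.join(); }
    }

private:
    // Generic worker loop shared by both jobs.
    void worker(housekeeping_job_t job, int iterations) {
        for (int i = 0; i < iterations; ++i) {
            if (job == housekeeping_job_t::gc) {
                _log.gc();
            } else {
                _log.housekeeping();
            }
        }
    }

    counting_log& _log;
    std::thread _gc_worker;
    std::thread _housekeeping_worker;
};
```

After `start(100)` followed by `stop()`, both counters on the log should read 100, with the shared lock serializing the individual calls.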
To improve observability for long-running compaction processes,
which may be starving out urgent garbage collection.
@WillemKauf WillemKauf force-pushed the concurrent_housekeeping branch from e1d4d8b to 4d4c4a6 Compare February 23, 2025 01:07

2 participants