[CORE-8760] rptest: fix race-y checks in test_index_recovery_after_upgrade #25131

Open
wants to merge 2 commits into
base: dev

Conversation

WillemKauf
Contributor

@WillemKauf WillemKauf commented Feb 21, 2025

This test can race with segment rolls and with segments that have not yet been self compacted, because self compaction may not occur before the redpanda version changes and the nodes restart. This breaks the test's expectations around the compacted index mtime() stats.

To fix the race conditions, use a newly added compaction index footer reader from compute_storage.py to check whether a segment has been self compacted, and add the segment to the list of expected mtime values only if it has.
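
A minimal sketch of the conditional bookkeeping described above. The `SegmentInfo` record, its field names, and `expected_mtimes()` are hypothetical and chosen only for illustration; the test's real data structures may look different.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentInfo:
    # Hypothetical record; not taken from compaction_recovery_test.py.
    name: str
    index_mtime: float
    self_compacted: bool


def expected_mtimes(segments):
    """Return {segment name: index mtime} for self-compacted segments only.

    Segments that have not been self compacted are skipped, so no
    mtime expectation is recorded for them.
    """
    return {s.name: s.index_mtime for s in segments if s.self_compacted}


segments = [
    SegmentInfo("0-1-v1.log", 100.0, True),
    SegmentInfo("50-1-v1.log", 105.0, False),  # rolled, not yet self compacted
]
before_upgrade = expected_mtimes(segments)
```

Only the first segment ends up in `before_upgrade`, so a later mtime comparison would not be tripped by the segment that had not been self compacted when the snapshot was taken.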

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 21, 2025

CI test results

test results on build#62107
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62107#019529d5-d4b9-4c50-b297-c88be6119185 FLAKY 1/2
test results on build#62115
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62115#01952ab8-c161-472a-bcec-3c67a7e0e6dd FLAKY 1/2
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62115#01952acd-b310-4402-b4ce-6475c54cd11d FLAKY 1/2
rptest.tests.retention_policy_test.ShadowIndexingCloudRetentionTest.test_cloud_time_based_retention.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/62115#01952acd-b30f-4995-9089-dc1af7203903 FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/62115#01952ab8-c163-4118-b725-5008ec3e1087 FLAKY 1/2
storage_e2e_single_thread_rpunit.storage_e2e_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/62115#01952a71-6eda-462c-8080-0b4a053566d4 FLAKY 1/2
test results on build#62124
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62124#01952bb0-c90a-468d-b827-13559641f932 FLAKY 50/51
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62124#01952bb0-c90a-4bf7-8a90-f52e84cbab5b FLAKY 50/53
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62124#01952bb4-2fe3-4789-8f94-86b3428de412 FLAKY 50/54
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62124#01952bb4-2fe3-4e3d-9c76-92cd013c08d5 FLAKY 50/52
test results on build#62138
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62138#01952fa4-85a7-40d6-b5fb-aa0026a7ab84 FAIL 0/20
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62138#01952fb7-ad7b-4772-89d6-6826dc95188c FAIL 0/20
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/62138#01952fa4-85a9-484d-9c2c-85842acf3595 FLAKY 1/2
test results on build#62144
test_id test_kind job_url test_status passed
gtest_raft_rpunit.gtest_raft_rpunit unit https://buildkite.com/redpanda/redpanda/builds/62144#01953057-ffec-4306-97b7-31a722671e43 FLAKY 1/2
rptest.tests.archival_test.ArchivalTest.test_all_partitions_leadership_transfer.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/62144#019530b4-7567-40fb-a2a4-16a3401fa044 FLAKY 1/2
rptest.tests.datalake.datalake_e2e_test.DatalakeE2ETests.test_topic_lifecycle.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/62144#019530b4-7567-40fb-a2a4-16a3401fa044 FLAKY 1/2
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=False.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/62144#0195309f-e020-49c8-ae9c-99f0343ba655 FLAKY 1/2
rptest.tests.random_node_operations_test.RandomNodeOperationsTest.test_node_operations.enable_failures=True.mixed_versions=False.with_tiered_storage=True.with_iceberg=False.with_chunked_compaction=True.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/62144#0195309f-e020-4967-bac8-fdaf57e82fe9 FLAKY 1/2
storage_e2e_single_thread_rpunit.storage_e2e_single_thread_rpunit unit https://buildkite.com/redpanda/redpanda/builds/62144#01953057-ffec-4306-97b7-31a722671e43 FLAKY 1/2
test results on build#62152
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62152#01953181-e34d-4210-823a-cf270e1202d5 FLAKY 50/55
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62152#01953181-e34d-4c1c-accd-ca1ee5e3e9a5 FLAKY 50/55
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62152#0195318f-9dd4-4c96-8c6b-bbaf27608048 FLAKY 50/55
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62152#0195318f-9dd5-4336-88e2-4328e2f7d769 FLAKY 50/54
test results on build#62157
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62157#01953337-a5db-43cb-ad95-12f595b93803 FLAKY 1/2
rptest.tests.datalake.custom_partitioning_test.DatalakeCustomPartitioningTest.test_basic.cloud_storage_type=CloudStorageType.S3.catalog_type=CatalogType.REST_JDBC ducktape https://buildkite.com/redpanda/redpanda/builds/62157#01953324-87c9-42ac-b748-168c8b293a6e FLAKY 1/2
rptest.tests.scaling_up_test.ScalingUpTest.test_scaling_up_with_recovered_topic ducktape https://buildkite.com/redpanda/redpanda/builds/62157#01953337-a5dc-4615-88a1-ed74e9e2320a FLAKY 1/2
test results on build#62167
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b0-406e-877c-9dd7d2b9a102 FLAKY 20/24
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b1-46ce-9dcf-1ceab654743f FLAKY 20/22
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b1-479d-b485-226d96e77409 FLAKY 20/23
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b1-48a9-9d08-5007b25f58f2 FLAKY 20/23
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b1-49c7-bf39-fc9963f61ad7 FLAKY 20/24
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b7-441c-9791-19cf3e28ec10 FLAKY 20/23
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b7-4e3d-ba15-da3bb5c5d191 FLAKY 20/23
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b7-4ecf-bce1-e5837856cecf FLAKY 20/28
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b8-4391-97e4-69aaad76d7c1 FLAKY 20/24
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62167#0195353f-d8b8-4e99-b587-f347c3f4cdb3 FLAKY 20/22

@WillemKauf
Contributor Author

Still flakey. Hmm.

@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch 2 times, most recently from ef46591 to 854b56a Compare February 21, 2025 21:35
@WillemKauf
Contributor Author

Force push to:

  • Add a new condition to finished_compaction() ensuring that every segment in the partition data used to evaluate mtime() equality/inequality has finished self-compaction in the version epoch in which it was produced.
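
One way to picture the added condition, as a sketch only: a predicate that requires every segment to be self compacted, polled until it holds. The `all_self_compacted()` and `wait_until()` helpers and the `self_compacted` key are hypothetical stand-ins, not the actual `finished_compaction()` code.

```python
import time


def all_self_compacted(segments):
    """True when there is at least one segment and all are self compacted."""
    return bool(segments) and all(s["self_compacted"] for s in segments)


def wait_until(condition, timeout_sec, backoff_sec=0.5):
    """Poll `condition` until it returns True or `timeout_sec` elapses."""
    deadline = time.monotonic() + timeout_sec
    while True:
        if condition():
            return
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met in time")
        time.sleep(backoff_sec)


# Simulated successive views of the partition's segments.
polls = iter([
    [{"self_compacted": False}],
    [{"self_compacted": True}, {"self_compacted": False}],
    [{"self_compacted": True}, {"self_compacted": True}],
])
wait_until(lambda: all_self_compacted(next(polls)),
           timeout_sec=5, backoff_sec=0.01)
```

The empty-list case returns False on purpose, so a partition with no visible segments is never mistaken for one that has finished compacting.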

@WillemKauf
Contributor Author

/ci-repeat 2
skip-build
skip-units
dt-repeat=50
tests/rptest/tests/compaction_recovery_test.py

@WillemKauf WillemKauf enabled auto-merge February 22, 2025 05:14
@WillemKauf WillemKauf disabled auto-merge February 22, 2025 17:20
@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch from 854b56a to 3874470 Compare February 22, 2025 20:31
@vbotbuildovich
Collaborator

vbotbuildovich commented Feb 22, 2025

Retry command for Build#62138

please wait until all jobs are finished before running the slash command


/ci-repeat 1
tests/rptest/tests/compaction_recovery_test.py::CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade

@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch from 3874470 to 969ef98 Compare February 23, 2025 01:06
@WillemKauf
Contributor Author

/ci-repeat 2
skip-build
skip-units
dt-repeat=50
tests/rptest/tests/compaction_recovery_test.py

@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch from 969ef98 to a733f2f Compare February 23, 2025 12:44
Member

@dotnwat dotnwat left a comment

self compaction may not occur before redpanda version changes and node restarts occur.

Did you consider changing the test to delay the cluster changes until the desired pre-change state has been reached?

@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch 2 times, most recently from cfe1c1f to 6665122 Compare February 23, 2025 20:11
@WillemKauf
Contributor Author

/ci-repeat 5
skip-redpanda-build
skip-units
dt-repeat=20
tests/rptest/tests/compaction_recovery_test.py

@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch from 6665122 to decc991 Compare February 24, 2025 05:09
Parses the compaction footer from a compacted index for a segment.
This function implements logic similar to that found in
`compacted_index_chunk_reader::load_footer()` to properly read
compacted index footers for either V1 or V2/V3.

This test can race with segment rolls and un-self-compacted segments,
since self compaction may not occur before `redpanda` version changes
and node restarts occur. This breaks the expectations around the
compacted index `mtime()` stats.

To fix the race conditions, use the newly added compaction index footer
reader from `compute_storage.py` to check if a segment has been self compacted
or not.
@WillemKauf WillemKauf force-pushed the compaction_recovery_upgrade_test_fix branch from decc991 to 376fb6d Compare February 24, 2025 05:10
@WillemKauf
Contributor Author

Force push to:

  • Add a compaction index footer parser function, read_compaction_footer(), to remote_scripts/compute_storage.py
  • Use the data returned by this function to determine whether a segment has been self compacted or not in compaction_recovery_test.py, to properly account for expectations of the test.
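
To illustrate the general idea of reading a footer from the tail of an index file, here is a sketch under loud assumptions. The fixed layout below (little-endian `u32 size`, `u32 keys`, `u32 flags`, `u32 crc`, `i8 version`), the `SELF_COMPACTION` bit value, and the `read_footer()` name are invented for the example. They should not be read as Redpanda's actual on-disk format or as the code in `compute_storage.py`, and the V1 versus V2/V3 dispatch is omitted entirely.

```python
import io
import struct

# Hypothetical footer layout, for illustration only (not the real format):
# size, keys, flags, crc as unsigned 32-bit ints, then a signed 8-bit version.
FOOTER = struct.Struct("<IIIIb")
SELF_COMPACTION = 1 << 1  # assumed flag bit, not confirmed by the source


def read_footer(f):
    """Read the trailing footer from a seekable binary file object."""
    f.seek(0, io.SEEK_END)
    if f.tell() < FOOTER.size:
        raise ValueError("file too small to contain a footer")
    f.seek(-FOOTER.size, io.SEEK_END)
    size, keys, flags, crc, version = FOOTER.unpack(f.read(FOOTER.size))
    return {
        "size": size,
        "keys": keys,
        "flags": flags,
        "crc": crc,
        "version": version,
        "self_compacted": bool(flags & SELF_COMPACTION),
    }


# Build a fake index: some payload bytes followed by a footer.
blob = b"payload" + FOOTER.pack(7, 3, SELF_COMPACTION, 0xDEADBEEF, 1)
footer = read_footer(io.BytesIO(blob))
```

A caller could then use `footer["self_compacted"]` to decide whether the segment's index mtime belongs in the set of values the test compares.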
