Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[CORE-8933] Setup for translation scheduling port #25126

Open
wants to merge 10 commits into
base: dev
Choose a base branch
from

Conversation

oleiman
Copy link
Member

@oleiman oleiman commented Feb 21, 2025

Datalake API changes setting us up for porting the translation path to the new scheduler infrastructure. As written, should be functionally equivalent to current world, but some APIs have changes shape significantly.

Pulled form #25077

Backports Required

  • none - not a bug fix
  • none - this is a backport
  • none - issue does not exist in previous branches
  • none - papercut/not impactful enough to backport
  • v24.3.x
  • v24.2.x
  • v24.1.x

Release Notes

  • none

This method loops through the column writers to check if any of them are
flush worthy, computes the memory usage in the same loop. Useful for a
latter commit that avoids this loop again and needs stats right after
append.

Signed-off-by: Oren Leiman <[email protected]>
The interface implementations keep track of the current memory used by
the writer and related reservations.

Signed-off-by: Oren Leiman <[email protected]>
Adds the following

- flush() - flushes all the buffered bytes to the output stream
- methods to fetch buffered/flushed bytes

Signed-off-by: Oren Leiman <[email protected]>
@oleiman oleiman force-pushed the dlib/core-8933/translator-scheduling-port-foundation branch from b929834 to 2098d37 Compare February 21, 2025 00:11
@oleiman oleiman marked this pull request as ready for review February 21, 2025 00:26
.. instead of lazy_abort_source. To be used later, they are both
connected anyway.

Signed-off-by: Oren Leiman <[email protected]>
@oleiman oleiman force-pushed the dlib/core-8933/translator-scheduling-port-foundation branch from 2098d37 to f2b3275 Compare February 21, 2025 00:51
@oleiman
Copy link
Member Author

oleiman commented Feb 21, 2025

/ci-repeat 1

@vbotbuildovich
Copy link
Collaborator

vbotbuildovich commented Feb 21, 2025

CI test results

test results on build#62085
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryTest.test_index_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/62085#01952647-e283-4c18-906c-3c6f633be00a FLAKY 1/3
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62085#01952647-e280-4288-a8da-1db982070470 FLAKY 1/2
rptest.tests.consumer_group_recovery_test.ConsumerOffsetsRecoveryTest.test_consumer_offsets_partition_recovery ducktape https://buildkite.com/redpanda/redpanda/builds/62085#01952652-52c1-4643-b33a-6af7c6e0b332 FLAKY 1/2
rptest.tests.datalake.mount_unmount_test.MountUnmountIcebergTest.test_simple_unmount.cloud_storage_type=CloudStorageType.S3 ducktape https://buildkite.com/redpanda/redpanda/builds/62085#01952647-e282-4087-b4b4-d9a1880002a6 FLAKY 1/2
rptest.tests.e2e_shadow_indexing_test.ShadowIndexingWhileBusyTest.test_create_or_delete_topics_while_busy.short_retention=True.cloud_storage_type=CloudStorageType.ABS ducktape https://buildkite.com/redpanda/redpanda/builds/62085#01952647-e283-4c18-906c-3c6f633be00a FLAKY 1/2
rptest.transactions.consumer_offsets_test.VerifyConsumerOffsetsThruUpgrades.test_consumer_group_offsets.versions_to_upgrade=1 ducktape https://buildkite.com/redpanda/redpanda/builds/62085#01952652-52c2-4cc3-8e9e-0a766971b2f6 FLAKY 1/2
test results on build#62166
test_id test_kind job_url test_status passed
rptest.tests.compaction_recovery_test.CompactionRecoveryUpgradeTest.test_index_recovery_after_upgrade ducktape https://buildkite.com/redpanda/redpanda/builds/62166#0195352a-f192-4d4e-a367-861d49d7f92e FLAKY 1/2
rptest.tests.data_migrations_api_test.DataMigrationsApiTest.test_migrated_topic_data_integrity.transfer_leadership=True.params=.cancellation.None.use_alias.False ducktape https://buildkite.com/redpanda/redpanda/builds/62166#0195353f-12fa-4368-bee5-352243506ae8 FLAKY 1/2

@oleiman
Copy link
Member Author

oleiman commented Feb 21, 2025

CI Failure:

andrwng
andrwng previously approved these changes Feb 21, 2025
Copy link
Contributor

@andrwng andrwng left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! Thanks for pulling this out

@@ -0,0 +1,24 @@
/*
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Curious, what's the intuition for things that go in here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not 100% on details to give a one sentence description, but the bulk of it is here: 82bcd17

@@ -303,11 +303,6 @@ partition_translator::do_translation_for_range(
const auto& ntp = _partition->ntp();
auto remote_path_prefix = remote_path{
fmt::format("{}/{}/{}", iceberg_data_path_prefix, ntp.path(), _term)};
lazy_abort_source las{[this] {
return can_continue() ? std::nullopt
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious, this used to account for term changes. Is that still covered with _as?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the answer is that it'll be up to the scheduler/executor to abort during term changes via this abort source

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah i believe that's right

Copy link
Contributor

@rockwotj rockwotj left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, just one comment which could be a followup

Comment on lines +17 to +19
size_t buffered_bytes() const final;

size_t flushed_bytes() const final;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we just amend the interface of the parquet writer to be the same? So we don't have to duplicate the state in both layers?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah makes sense to me. will do in a follow up so I can boy scout it a little bit.

rockwotj
rockwotj previously approved these changes Feb 22, 2025
Currently multiplexer is a one shot class with pattern as follows

mux = create_mux();
co_await reader.consume(mux...)

With the new changes, we want multiplexer to multiplex across scheduling
iterations and release resouces inbetween. This commit makes changes to
the API support this port. The new pattern would look something like
this..

mux = create_mux();
mux.multiplex(reader1...)
mux.flush_writers(); // optional
mux.multiplex(reader2..)
mux.flush_writers(); // optional
...
...

result = co_await std::move(mux).finish();

The ability to temporarily flush all the intermediate state and
multiplex across multiple readers enables porting to the new scheduler
API.

Signed-off-by: Oren Leiman <[email protected]>
Make the task long running to support batching of data across multiple
iterations of scheduler

Signed-off-by: Oren Leiman <[email protected]>
@oleiman oleiman dismissed stale reviews from rockwotj and andrwng via ffdcfe0 February 23, 2025 22:14
@oleiman oleiman force-pushed the dlib/core-8933/translator-scheduling-port-foundation branch from f2b3275 to ffdcfe0 Compare February 23, 2025 22:14
@oleiman oleiman requested a review from andrwng February 24, 2025 05:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants