Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: track latest reported status and report on reconnect #30576

Conversation

aljoscha
Copy link
Contributor

@aljoscha aljoscha commented Nov 20, 2024

Before, we would purely pump upgrades from running dataflows over to the
controller. Which would make it so that when a controller re-connects it
doesn't get status messages for the latest state.

Now, we track the latest status update, per object, and also what has
been reported, per object. And on re-connect we clear out what has been
reported, which makes it so that we report the latest status when a
reconnect happens.

Motivation

Tips for reviewer

Checklist

  • This PR has adequate test coverage / QA involvement has been duly considered. (trigger-ci for additional test/nightly runs)
  • This PR has an associated up-to-date design doc, is a design doc (template), or is sufficiently small to not require a design.
  • If this PR evolves an existing $T ⇔ Proto$T mapping (possibly in a backwards-incompatible way), then it is tagged with a T-proto label.
  • If this PR will require changes to cloud orchestration or tests, there is a companion cloud PR to account for those changes that is tagged with the release-blocker label (example).
  • If this PR includes major user-facing behavior changes, I have pinged the relevant PM to schedule a changelog post.

@aljoscha aljoscha force-pushed the storage-report-latest-status-on-reconnect branch 2 times, most recently from 3947a06 to c2840f6 Compare November 20, 2024 18:29
@aljoscha aljoscha force-pushed the storage-report-latest-status-on-reconnect branch from c2840f6 to de6c0ab Compare February 5, 2025 16:43
Before, we would purely pump upgrades from running dataflows over to the
controller. Which would make it so that when a controller re-connects it
doesn't get status messages for the latest state.

Now, we track the latest status update, per object, and also what has
been reported, per object. And on re-connect we clear out what has been
reported, which makes it so that we report the latest status when a
reconnect happens.
@aljoscha aljoscha force-pushed the storage-report-latest-status-on-reconnect branch from de6c0ab to 6ae2711 Compare February 6, 2025 15:17
@aljoscha aljoscha marked this pull request as ready for review February 6, 2025 15:19
@aljoscha aljoscha requested a review from a team as a code owner February 6, 2025 15:19
@@ -1288,6 +1340,8 @@ impl StorageState {
self.ingestions.remove(&id);
self.exports.remove(&id);

let _ = self.reported_status_updates.remove(&id);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't we need to remove the latest_status_updates entry too?

}

if to_report.len() > 0 {
self.send_storage_response(response_tx, StorageResponse::StatusUpdates(to_report));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A place that's going to conflict with #31261 (fyi @petrosagg)

Comment on lines 323 to 326
/// The latest status update that has been _reported_ back to the
/// controller. This will be reset when a new client connects, so that we
/// can determine what updates we have to report again.
pub reported_status_updates: BTreeMap<GlobalId, StatusUpdate>,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The existence of this map surprises me! Naively I'd have thought that we can just unconditionally send the contents of latest_status_updates on reconnect. Or, if that's somehow hard to set up, wouldn't an initial_status_reported flag suffice?

@aljoscha aljoscha added the release-blocker Critical issue that should block *any* release if not fixed label Feb 7, 2025
@aljoscha
Copy link
Contributor Author

aljoscha commented Feb 7, 2025

@teskje I updated with your suggestions, ptal. 🙏

@aljoscha aljoscha requested a review from teskje February 7, 2025 13:59
Copy link
Contributor

@teskje teskje left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!

@aljoscha aljoscha merged commit 702e062 into MaterializeInc:main Feb 10, 2025
78 checks passed
@aljoscha aljoscha deleted the storage-report-latest-status-on-reconnect branch February 10, 2025 10:38
@aljoscha
Copy link
Contributor Author

Thanks for the speedy review/iteration on this one! 🙇‍♂️

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
release-blocker Critical issue that should block *any* release if not fixed
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants