-
Notifications
You must be signed in to change notification settings - Fork 175
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Retry storm on fetchOlder in message list #945
Labels
Milestone
Comments
gnprice
added
a-msglist
The message-list screen, except what's label:a-content
a-api
Implementing specific parts of the Zulip server API
labels
Sep 17, 2024
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate helper is to wait for the BackoffMachine asynchronously without blocking fetchOlder. This is needed because we have to reset the `_skipFetchingOlder` flag once the network backoff finishes. This approach is different from how BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate helper is to wait for the BackoffMachine asynchronously without blocking fetchOlder (so that we can reach the finally block and return). This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate helper is to wait for the BackoffMachine asynchronously without blocking fetchOlder (so that we can reach the finally block and return). This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate helper is to wait for the BackoffMachine without blocking fetchOlder so that we can reach the finally block and return. This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate helper is to wait for the BackoffMachine without blocking fetchOlder so that we can reach the finally block and return. This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate class is to wait for the BackoffMachine without blocking fetchOlder so that we can reach the finally block and return. This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate class is to wait for the BackoffMachine without blocking fetchOlder so that we can reach the finally block and return. This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 6, 2024
The main purpose of having a separate class is to wait for the BackoffMachine without blocking fetchOlder so that we can reach the finally block and return. This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 7, 2024
The main purpose of having a separate class is to wait for the BackoffMachine without blocking fetchOlder so that we can reach the finally block and return. This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
PIG208
added a commit
to PIG208/zulip-flutter
that referenced
this issue
Nov 7, 2024
This approach is different from how a BackoffMachine is typically used, because the message list doesn't send and retry requests in a loop; its caller retries rapidly on scroll changes, and we want to ignore the excessive requests. The test drops irrelevant requests with `connection.takeRequests` without checking, as we are only interested in verifying that no request was sent. Fixes: zulip#945 Signed-off-by: Zixuan James Li <[email protected]>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
@alexmv reported the app on his device was sending a rapid series of getMessages requests to the server — around 6 such requests a second.
Because the server rate-limits requests (by default to 200 per minute for each user, total across all types of requests), the user-facing effect was that the requests all failed, with the symptom that scrolling up further in the message list didn't work. The requests also put load on the server, though mitigated by the rate-limiting.
We should fix this.
Diagnosis
From looking at the code, what's happening is:
There's no overt retry logic for that fetch.
But: the thing that triggers that fetch is a listener for the scroll state. It can get called frequently by Flutter's scrolling logic; it's supposed to be lightweight to call repeatedly.
So naturally if we already have one of these fetches in flight, that listener promptly returns. Otherwise if you're near the top of the data we have, and we haven't heard from the server that that's the beginning of history, we make a new /api/get-messages request to fetch more.
And when the fetches are already failing, that amounts to a retry with no backoff.
Implementation
To fix the issue, we'll add backoff to
MessageListView.fetchOlder
, which is the method that invokes the API request.When
fetchOlder
gets an error from the server, it should start a backoff timer (using ourBackoffMachine
so the intervals escalate) and ignore any further calls until the backoff expires, just like it ignores them whenfetchingOlder
is true.We won't add any new retry logic. The
fetchOlder
call site, in_MessageListState._handleScrollMetrics
, already provides plenty of retrying — all the user has to do is touch the screen to scroll slightly up or down, andfetchOlder
will get called again. As Alex saw, it can also get called even with no interaction when nothing else seems to be happening.Out of scope
A further bonus refinement would be for us to understand 429 responses, and wait for the time that that response says:
But to really do that right probably belongs at a different layer, in
ApiConnection
, so that the information is shared across all the different requests we make on behalf of a given account.And conversely I think if we fix the rapid retries, that will largely solve the problem. I suspect the reason we ended up with 429s here in the first place was likely because of too-rapid retries after some other source of error. Even if not, exponential backoff should let us get out of the situation.
The text was updated successfully, but these errors were encountered: