Latency slowly creeps up when running same few SELECTs against a cluster #29526
Comments
One thing that came to mind is that @antiguru mentioned that the query history currently doesn't support running many queries against Materialize (because it eventually OoMs); maybe that's related? I'm not even able to load the query history anymore now.
Could we run this against a staging environment that has statement logging disabled? This way we could unblock the test, which is important to have on its own.
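(For reference, a minimal sketch of how statement logging could be turned off, assuming `statement_logging_max_sample_rate` is the controlling system parameter and that it can be set over the internal `mz_system` port; the DSN is illustrative, not taken from this thread:)

```python
# Sketch: disable statement logging by setting the max sample rate to 0.
# Assumes `statement_logging_max_sample_rate` is the relevant parameter
# and that the internal mz_system port (6877 here) accepts ALTER SYSTEM;
# both are assumptions, not confirmed in this thread.
import psycopg

with psycopg.connect(
    "postgresql://mz_system@localhost:6877/materialize", autocommit=True
) as conn:
    conn.execute("ALTER SYSTEM SET statement_logging_max_sample_rate = 0")
```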
Yes, that will be my next step. I was hoping not to have to do this so early, because of cost and the inconvenience of having to recreate sources etc.
What environment is the test running against now? We could disable statement logging against that environment instead.
Oh, just saw the conversation on Slack. Never mind.
What version of Materialize are you using?
v0.116.0
What is the issue?
Seen in https://buildkite.com/materialize/qa-canary/builds/228#0191eb33-4b0c-406a-af5e-1e5bf13c4413 on my PR introducing that test: #29524
This is running a few simple `SELECT` queries against a cluster. Only the `SELECT 1` is open loop with 100 queries per second (not affected), while the rest are closed loop (and strict serializable) and are gradually getting slower over time:

[latency graph]

The workload is running against the Materialize Production Sandbox (maybe I should move it to a dedicated staging environment to be more isolated from other noise?), and since the first attempt was only 10 minutes, I'm now retrying with 1 hour: https://buildkite.com/materialize/qa-canary/builds/229
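For context, a rough sketch of the two load patterns (a simplified illustration, not the actual qa-canary test code; the DSN and the closed-loop query are made up):

```python
import threading
import time

import psycopg

DSN = "postgresql://materialize@localhost:6875/materialize"  # illustrative

def open_loop(query: str, qps: float, duration_s: float) -> None:
    """Fire queries on a fixed schedule. This single-connection version
    only approximates an open loop while each query finishes within the
    interval, which holds here since SELECT 1 stays fast."""
    interval = 1.0 / qps
    deadline = time.monotonic() + duration_s
    with psycopg.connect(DSN, autocommit=True) as conn:
        next_fire = time.monotonic()
        while time.monotonic() < deadline:
            conn.execute(query)
            next_fire += interval
            time.sleep(max(0.0, next_fire - time.monotonic()))

def closed_loop(query: str, duration_s: float) -> None:
    """Issue the next query only after the previous one returns, so any
    latency creep directly stretches the gap between measurements."""
    with psycopg.connect(DSN, autocommit=True) as conn:
        # Strict serializable isolation, as in the affected queries.
        conn.execute("SET transaction_isolation = 'strict serializable'")
        deadline = time.monotonic() + duration_s
        while time.monotonic() < deadline:
            start = time.monotonic()
            conn.execute(query)
            print(f"latency: {time.monotonic() - start:.3f}s")

threads = [
    threading.Thread(target=open_loop, args=("SELECT 1", 100.0, 3600.0)),
    threading.Thread(target=closed_loop, args=("SELECT count(*) FROM t1", 3600.0)),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

The key difference: the open-loop rate is pinned at 100 qps regardless of response times, while closed-loop throughput degrades one-for-one as per-query latency grows, which is why the creep shows up only in the closed-loop queries.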
The cluster itself (200cc, https://console.materialize.com/regions/aws-us-east-1/clusters/u3/qa_canary_environment_compute?timePeriod=180) always stayed at <=50% CPU usage. Since it's not overloaded, I expected the queries' performance to stay consistent over time.