akka.http.impl.engine.client.OutgoingConnectionBlueprint$UnexpectedConnectionClosureException: The http server closed the connection unexpectedly before delivering responses for 1 outstanding requests #3481
Thanks for the report, @MartinMajor.
According to this line:
it failed only after 166ms, so that would allow a round-trip to the server, I guess. Where do you get that 1ms measurement from? The most likely explanation would be that the server closed the connection without setting `Connection: close` on the previous response.
The 0-1ms measurement was the most usual time when we had the debug level disabled; it was the time between issuing the HTTP request and receiving the exception. We are seeing this behavior both when calling external services and when we call other internal services that use akka-http as a server (routed through an nginx proxy). Your explanation makes sense, but the weird thing is that we didn't see this before upgrading to akka-http 10.2.0.
Are POST requests somehow special? Because we see the same problems with GETs as well. Is there some way I can confirm that this is the case? According to the documentation, client-side pipelining is not currently supported. Can I force using a new connection for each request?
Logging is quite extensive, so it might change timing a bit, but it's not completely clear how it would mess things up to this degree. Could you try capturing a trace with tcpdump / Wireshark for this issue? With that we could pinpoint the issue pretty clearly between server and client.
If you have access to the proxy, maybe you could check if there are any settings that would limit the lifetime of persistent connections to the backend. Common options might be a maximum number of requests to run per connection (though, in that case, a `Connection: close` header on the last response would be expected).
It could be a timing issue or a coincidence or a bug. But it's hard to say without more information.
POST requests cannot be easily retried. GET requests will just be retried by default if they fail (because it's such a common scenario that a server closes a connection after a while).
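For reference, the pool's automatic retry behavior can be tuned through `ConnectionPoolSettings`; here is a minimal sketch (the setting and API calls are from akka-http, the surrounding names and URI are illustrative):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.HttpRequest
import akka.http.scaladsl.settings.ConnectionPoolSettings

implicit val system: ActorSystem = ActorSystem("client")

// max-retries only applies to requests the pool considers safe to retry
// (idempotent ones such as GET); setting it to 0 disables even those retries.
val poolSettings = ConnectionPoolSettings(system).withMaxRetries(0)

val response = Http().singleRequest(
  HttpRequest(uri = "https://example.com/resource"),
  settings = poolSettings
)
```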
It's not about pipelining but about persistent connections, which are really important for throughput. Using a new connection for each request means giving up those throughput benefits. https://doc.akka.io/docs/akka-http/current/client-side/connection-level.html#opening-http-connections shows how to run requests through a connection manually.
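Following that documentation, a minimal sketch of forcing a fresh connection per request could look like this (host, port, and the helper name are illustrative):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import akka.stream.scaladsl.{Sink, Source}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("client")

// Each call materializes the connection flow anew, i.e. opens a fresh
// TCP connection, sends one request, and reads one response.
def runOnFreshConnection(request: HttpRequest): Future[HttpResponse] =
  Source.single(request)
    .via(Http().outgoingConnection("example.com", 80))
    .runWith(Sink.head)
```

Note that this gives up connection reuse entirely, so throughput will suffer under load.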
Hi, thank you for the response! These are two examples of requests that failed much faster:
I tried to run the application from localhost with Wireshark, but the issue doesn't happen that often, so I didn't catch one. I might try to deploy Wireshark to production for a while, but it will take some time to test that properly because I don't want to break production.
I haven't found anything suspicious (which proves nothing - I might have just overlooked it). I will test it some more.
Isn't it possible to retry even a non-GET request when it is detected that the connection was broken before the request was sent?
Hi, I was able to reproduce the issue while recording the communication, and I found a segment that I think is weird. Note: this was the first time I used Wireshark and I'm a noob at TCP, so I might be completely wrong.
Thanks for sharing that @MartinMajor. In this case, it looks like the server could have known about the connection closure, but it is a race condition in any case. One problem is that there are several layers involved in transporting a connection-close signal to the pool. E.g. the ACKs are sent directly by the kernel without the application seeing anything of it. The application only sees that the peer has closed the connection the next time it reads from the connection. But even then the information needs to trickle up to the pool before the pool knows the connection has been closed. For that reason, HTTP defines that a server should set a `Connection: close` header on the last response it processes before closing a persistent connection.
I just checked that, at least in the latest version, things work as expected. So, with a response that carries `Connection: close`, the pool will not reuse that connection for further requests.
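For illustration, on the akka-http server side such a final response can be constructed explicitly via the header model (a sketch; the status and body are placeholders):

```scala
import akka.http.scaladsl.model.{HttpEntity, HttpResponse, StatusCodes}
import akka.http.scaladsl.model.headers.Connection

// A response announcing that the connection will be closed after it,
// so a client pool knows not to reuse the connection.
val lastResponse = HttpResponse(
  status = StatusCodes.OK,
  headers = List(Connection("close")),
  entity = HttpEntity("bye")
)
```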
Thanks for the explanation! I've checked the Wireshark logs and the server indeed sent `Connection: keep-alive` on the last response before closing the connection. Do you have any idea why this started to happen after the upgrade from akka-http 10.1.11 to 10.2.0? I agree that the server's behavior is wrong, but it would be nice if akka-http could handle that anyway 😉
Might be just a timing issue.
There's no way to handle all cases, because it's a timing issue and we cannot guess what the server is going to do.
I've dug a little bit and I've been able to trace the problem to its roots: we are using nginx as a load-balancer in front of our backend services. When some service starts or stops, it is registered in Consul, and then, using Consul-template, we generate a new nginx configuration and gracefully reload nginx. When this happens, nginx keeps serving the requests already in flight but uses the new configuration for new requests. Unfortunately there is not much it can do with keep-alive connections that aren't currently handling any request, so it closes them. As a result, the last HTTP response on such a connection still advertises `Connection: keep-alive`, yet the connection is closed right afterwards.

Kubernetes dealt with a similar problem and solved it by using dynamic nginx configuration (via Lua) that doesn't require a reload. We'll probably use that solution too.

For me, akka-http behaves correctly according to the RFC. But since the RFC allows terminating a TCP connection while an HTTP request is being established, akka-http should throw a specific, well-named exception in this case, allowing the user to handle it. Currently it is a generic exception that is, moreover, in a private class, so it cannot be handled.
Thanks for digging into it, @MartinMajor. That sounds like a very reasonable description.
Indeed. HTTP/2 has a slightly improved mechanism where you can at least give a hint about which request was the last one that was fully processed.
Yes, it would be good to move the exception so it can be handled more easily. I'm not sure, though, what more concrete information you see missing?
This is being tracked in #768, but we couldn't yet reach a consensus about what the public exception hierarchy should look like.
When I think about it, the exception is fine. I would just add a scaladoc that explains the TCP & HTTP details more thoroughly.
I understand that it might not be an easy decision, but it should be done somehow, because according to the RFC the client code is responsible for handling these situations and therefore needs to be able to handle that exception. Thanks a lot @jrudolph for all the information!
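Until the exception type is public, one workaround is to match on the failure generically; a hedged sketch (the message check and single-retry policy are illustrative, not an official API):

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("client")
import system.dispatcher

// Retries once when the pool reports an unexpected connection closure.
// Matching on the message text is brittle, but the exception class is
// private. Only do this for requests that are safe to retry (idempotent).
def requestWithRetry(req: HttpRequest): Future[HttpResponse] =
  Http().singleRequest(req).recoverWith {
    case e if Option(e.getMessage).exists(_.contains("closed the connection unexpectedly")) =>
      Http().singleRequest(req)
  }
```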
We added a new pool setting for this. We haven't yet improved the error message, though. One problem is that there are multiple ways the error could play out depending on the exact timing. I'm leaving this ticket open for improvements in this regard.
I also filed a new ticket for a potential alternative solution: #3834.
Was this problem introduced with version 10.1.12, where there were changes around HTTP/2 handling? Can you please verify this?
We are using HTTP/1.1, so if those changes were for HTTP/2 only, it shouldn't be related. We started to notice this problem after the upgrade, but it might just be a coincidence. I can't easily test it now with the old version, sorry :(
@MartinMajor what was the fix for this issue?
Hello Hakkers,
Our stack:
After upgrading from akka-http 10.1.11 we started to randomly get
akka.http.impl.engine.client.OutgoingConnectionBlueprint$UnexpectedConnectionClosureException: The http server closed the connection unexpectedly before delivering responses for 1 outstanding requests
exceptions. It happens on average once per 5000 requests. It is unlikely that it has anything to do with the target server, because the exception is usually returned 1ms after the HTTP request is made, so there is no time to even ping the target server. According to the logs it usually happens when some request ends and another follows directly after the previous one.
I've read this issue, so I tried to enable DEBUG for the connection pool, and this is one occurrence:
The code that does the call can be simplified to this:
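(The original snippet did not survive; purely as a hypothetical reconstruction, a typical simplified call of this kind looks something like the following — the URI and function name are placeholders:)

```scala
import akka.actor.ActorSystem
import akka.http.scaladsl.Http
import akka.http.scaladsl.model.{HttpRequest, HttpResponse}
import scala.concurrent.Future

implicit val system: ActorSystem = ActorSystem("client")

// Hypothetical stand-in for the reporter's simplified call:
// a single request through the default host connection pool.
def call(): Future[HttpResponse] =
  Http().singleRequest(HttpRequest(uri = "http://internal-service/endpoint"))
```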
and the settings are:
Thank you for any help!