-
Notifications
You must be signed in to change notification settings - Fork 1.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
network: add support for tcp keepalive probes #3096
Conversation
Configuration example:
|
Debug log:
|
Valgrind report:
|
Sample extracted using wireshark (configuration was 15 seconds during this test):
|
src/flb_io.c
Outdated
@@ -102,6 +102,13 @@ int flb_io_net_connect(struct flb_upstream_conn *u_conn, | |||
u_conn->fd, u->tcp_host, u->tcp_port); | |||
} | |||
|
|||
/* set TCP keepalive if configured */ | |||
ret = flb_net_socket_tcp_keepalive(fd, &u->net); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this line is changing the default system values, I think this call must be optional based on if the user set some parameters. I mean: all the internal tcp_something fields should have a non-set value like -1
and IF in the invoked call that value is >= 0, then perform the syscall to change the behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
changed. Added a boolean: net.tcp_keepalive
, that allows enabling keepalives while keeping the system-wide default values
src/flb_upstream.c
Outdated
@@ -65,6 +65,25 @@ struct flb_config_map upstream_net[] = { | |||
"before it is retired." | |||
}, | |||
|
|||
{ | |||
FLB_CONFIG_MAP_INT, "net.tcp_keepalive_time", "0", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think default values should be avoided since every system can have an implicit default that might not match this 0
. I am thinking mostly on current users that my face a different internal setup due to this change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Replaced with -1, and as indicated in other comment, the system-wide default will be then used if the configuration does not specify a desired value
Signed-off-by: Abilio Marques <[email protected]>
f9ac8e4
to
58be375
Compare
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
I have some need for this particular feature, presuming it can work with Sidebar: though perhaps more suitable for 1.9 given the change level, for reasons like bug #4606, I think it's a bad idea to not have keepalive set when doing synchronous I/O unless some unusual things are done to time out blocking system calls. i.e., there should be a default. |
@abiliojr @edsiper Can we resurrect this PR? I am hoping it can fix this issue: aws/aws-for-fluent-bit#293 Do you have recommended values for the settings? |
This PR is stale because it has been open 45 days with no activity. Remove stale label or comment or this will be closed in 10 days. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
With those minimal changes this should be ready to be merged.
int intvl = net->tcp_keepalive_interval; | ||
int probes = net->tcp_keepalive_probes; | ||
|
||
printf("keepalive = %d, %d %d\n", time, intvl, probes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove this line.
/* | ||
* Enable TCP keepalive | ||
*/ | ||
int flb_net_socket_tcp_keepalive(flb_sockfd_t fd, struct flb_net_setup *net) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please replace the bitwise assignments of ret
with regular assignments and add ret == 0 &&
as a prefix to the conditional blocks so a failed setsockopt
call is not overwritten and thus missed by the conditional in line 221.
Please change the conditional in line 221 so it explicitly compares against zero (ie. if (ret != 0) {
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is important because apparently according to msdn, windows doesn't accept values larger than 255
for TCP_KEEPCNT
whereas other operating systems accept much larger values,.
@@ -102,6 +102,15 @@ int flb_io_net_connect(struct flb_upstream_conn *u_conn, | |||
u_conn->fd, u->tcp_host, u->tcp_port); | |||
} | |||
|
|||
/* set TCP keepalive and it's options */ | |||
if (u->net.tcp_keepalive != FLB_FALSE) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please either use if (u->net.tcp_keepalive == FLB_TRUE) {
or the preferred syntax according to the coding style if (u->net.tcp_keepalive) {
.
@@ -43,6 +43,18 @@ struct flb_net_setup { | |||
|
|||
/* maximum of times a keepalive connection can be used */ | |||
int keepalive_max_recycle; | |||
|
|||
/* enable/disable tcp keepalive */ | |||
char tcp_keepalive; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This property needs to be an int
@abiliojr if you are not able to make those changes please let me know and I'll make them for you. Thanks a lot! |
@leonardo-albertovich , I haven't used fluentbit for at least a couple of years. Please go ahead and make any needed changes. |
Closing this as changes are part of #9249 . |
Signed-off-by: Abilio Marques [email protected]
The term TCP keepalive is been widely used in networking to describe a mechanism totally different than just "not closing the connection after a request". This PR adds support for real TCP keepalive probing. A matching PR with suggestions on how to make these changes clearer in the documentation is also available.
TCP keepalives have 2 potentially useful applications:
All happens at the transport layer, without any intervention, and is transparent to upper (e.g., HTTP) layers.
For a more detailed description, please see: https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/overview.html
TCP-keepalive values could be configured system wide, but this is definitely not ideal, as it still needs to be enabled by the application. The other option is to do it per socket. For that, fluentbit needs to apply the values to the socket. This patch adds support for that. By default, it's disabled.
Enter
[N/A]
in the box, if an item is not applicable to your change.Testing
Before we can approve your change; please submit the following in a comment:
Documentation
Documentation provided here: fluent/fluent-bit-docs#471
Fluent Bit is licensed under Apache 2.0, by submitting this pull request I understand that this code will be released under the terms of that license.