Support k3s-agent deployment #128

Open
wants to merge 1 commit into base: master

Conversation

@jvassev commented May 13, 2024

No description provided.

@ctalledo (Member) left a comment

Hi @jvassev, thanks for the contribution, looks good to me. Just one minor request (see comments below).

Thanks!

@@ -1509,6 +1533,8 @@ function main() {
do_config_kubelet_rke2
elif kubelet_docker_systemd_deployment; then
do_config_kubelet_docker_systemd
elif kubelet_k3s_deploymet; then
@ctalledo (Member)

Please add a comment after line 1526 above.

@jvassev (Author) commented May 17, 2024

I discovered a few more missing pieces and added them too.
There is a strange issue when pods get rescheduled on CRI-O where I occasionally see:

level=error err="listen tcp :9100: bind: address already in use"

A simple pod recreation solves it. That's why I'm adding a sleep 20 before restarting k3s-agent.

Is there a smarter way to solve this?

@ctalledo (Member) left a comment

LGTM ...

@@ -1500,7 +1534,7 @@ function main() {
# * RKE2: Host-based kubelet managed by rke2-agent's systemd service (Rancher's RKE2 approach).
# * Systemd+Docker: Docker-based kubelet managed by a systemd service (Lokomotive's approach).
# * Systemd: Host-based kubelet managed by a systemd service (most common approach).
#
# * Systemd: k3s when run as an agent (k3s-agent.service), if k3s is run as controlplane + node (k3s.service) it will not work
@ctalledo (Member)

Thanks; to avoid having two "systemd" entries, I would re-word as: "k3s: when run as agent only (if k3s is run as control plane + node (i.e., k3s.service) it won't work)."

@ctalledo (Member) commented

I discovered a few more missing pieces and added them too. There is a strange issue when pods get rescheduled on CRI-O where I occasionally see:

level=error err="listen tcp :9100: bind: address already in use"

A simple pod recreation solves it. That's why I'm adding a sleep 20 before restarting k3s-agent.

Is there a smarter way to solve this?

I don't know, but it's certainly not ideal.

Is it because the k3s agent is not fully stopped after systemctl stop k3s-agent? If that's the case, then we could try looping until it is. Do you know what agent is using that tcp 9100 port?
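For what it's worth, a possible alternative to the fixed sleep 20, along the lines of the "looping until it is" idea above, would be a small wait loop. This is only a sketch: the port number 9100 and the 60-second cap are assumptions, not anything in the PR.

# Sketch: wait (up to ~60s) for TCP port 9100 to be released before restarting
# k3s-agent, instead of sleeping for a fixed 20 seconds.
systemctl stop k3s-agent
for i in $(seq 1 60); do
    ss -ltn | grep -q ':9100 ' || break
    sleep 1
done
systemctl start k3s-agent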

@jvassev (Author) commented May 20, 2024

In my case it was node-exporter, but it happens with other pods like calico-node. I'm sure systemctl stop k3s-agent blocks until the process is down.
Maybe the containerd-managed pods need to get wiped out too? I see this in do_config_kubelet_docker_systemd:
https://github.com/nestybox/sysbox-pkgr/blob/master/k8s/scripts/kubelet-config-helper.sh#L1396-L1401
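For illustration, a rough sketch of what wiping the leftover containerd-managed pods could look like (hypothetical; this is not the code at the link above, and the k3s containerd socket path is an assumption):

# Hypothetical cleanup: stop and remove every pod sandbox known to the runtime,
# so nothing keeps ports such as 9100 bound across the kubelet switch.
export CONTAINER_RUNTIME_ENDPOINT=unix:///run/k3s/containerd/containerd.sock
for pod in $(crictl pods -q); do
    crictl stopp "$pod"
    crictl rmp "$pod"
done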

@jvassev (Author) commented May 21, 2024

With that latest change I think the pod running the kubelet-config-helper.sh script is stopped because of the call to clean_runtime_state "$runtime", and the final systemctl start k3s-agent never has a chance to run.
Starting it manually fixes the node.

@jvassev (Author) commented May 23, 2024

After some more debugging I noticed that it just takes too long to kill the old *.slice units.
So the last change is to stop them in parallel.
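A minimal sketch of what stopping the pod slices in parallel might look like (the kubepods* unit name pattern is an assumption, not taken from the PR):

# Sketch: stop each kubepods-related slice in the background, then wait for all
# of them, rather than stopping them one at a time.
for slice in $(systemctl list-units --type=slice --no-legend 'kubepods*' | awk '{print $1}'); do
    systemctl stop "$slice" &
done
wait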

@ctalledo (Member) commented

With that latest change I think the pod running the kubelet-config-helper.sh script is stopped because of the call to clean_runtime_state "$runtime", and the final systemctl start k3s-agent never has a chance to run. Starting it manually fixes the node.

Mmm ... not sure about this. The kubelet-config-helper.sh does not run within a pod; it runs directly on the host (i.e., k8s node) as a systemd unit. The systemd unit is created and then started by the sysbox-deploy-k8s.sh script running inside the sysbox-deploy-k8s pod.

Thus the call to clean_runtime_state should not affect the execution of the kubelet-config-helper.sh. Maybe something else is going on?
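To illustrate the arrangement being described (an illustration only, not the actual sysbox-deploy-k8s.sh code): a privileged pod with the host's root filesystem mounted at /host, and with access to the host's systemd, could install and kick off the helper roughly like this:

# Illustration: drop a one-shot unit onto the host and start it, so the helper
# keeps running on the node even if the deploy pod itself is torn down.
cat > /host/etc/systemd/system/kubelet-config-helper.service <<'EOF'
[Unit]
Description=One-shot kubelet reconfiguration helper
[Service]
Type=oneshot
ExecStart=/usr/local/bin/kubelet-config-helper.sh
EOF
systemctl daemon-reload                               # assumes this reaches the host's systemd
systemctl start --no-block kubelet-config-helper.service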

@ctalledo (Member) commented

Hi @jvassev, thanks again for the contribution.

Where is this PR at? Is it ready for merging, or are you still debugging/testing it?

Thanks!
