Add cross mesh granularity #328

Open
skirsten wants to merge 3 commits into base: main
Conversation

@skirsten commented Aug 5, 2022

This is a re-implementation of #98. A rebase was not possible, so the code was rewritten, but without any logic changes.

I deployed it in a test cluster running in addon + flannel mode:

--cni=false --compatibility=flannel --local=false --mesh-granularity=cross
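For readers unfamiliar with the flag, here is a minimal sketch of how a third granularity value can be validated next to the existing ones. The identifiers and string values below are illustrative assumptions, not necessarily the names this PR uses:

```go
// Sketch only: illustrative names, not necessarily Kilo's actual identifiers.
package mesh

import "fmt"

// Granularity determines which node-to-node links are encrypted.
type Granularity string

const (
	LogicalGranularity Granularity = "location" // one WireGuard segment per location
	FullGranularity    Granularity = "full"     // every link goes over WireGuard
	CrossGranularity   Granularity = "cross"    // WireGuard only across locations; plain links within a location
)

// parseGranularity validates the --mesh-granularity flag value.
func parseGranularity(s string) (Granularity, error) {
	switch g := Granularity(s); g {
	case LogicalGranularity, FullGranularity, CrossGranularity:
		return g, nil
	default:
		return "", fmt.Errorf("mesh granularity must be one of %q, %q, or %q, got %q",
			LogicalGranularity, FullGranularity, CrossGranularity, s)
	}
}
```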

The cluster has 4 nodes in 2 different subnets / regions. As these example routes from one of the hosts show, it is working as expected: pods in the other location are reached via the kilo0 WireGuard interface, while pods in the same location still go over flannel.1 (or the local cni0 bridge):

$ ip route get 10.42.0.5
10.42.0.5 via 10.4.0.1 dev kilo0 src 10.4.0.3 uid 1000
$ ip route get 10.42.2.2
10.42.2.2 via 10.4.0.2 dev kilo0 src 10.4.0.3 uid 1000
$ ip route get 10.42.3.2
10.42.3.2 via 10.42.3.0 dev flannel.1 src 10.42.1.0 uid 1000
$ ip route get 10.42.1.3
10.42.1.3 dev cni0 src 10.42.1.1 uid 1000

One problem I found:
If I upgrade an existing Kilo deployment from full to cross granularity, the flannel routing is broken until I restart the k3s agent.
This might not be related to this code; the routing may already have been broken by the full granularity.
Rebooting the instances also fixes it, so this seems unrelated to me.

I will now implement the graph logic. If somebody can verify and test this in CNI mode, that would be great. I published a Docker image here: ghcr.io/skirsten/kilo

@skirsten (Author) commented Aug 5, 2022

I also updated the graph logic to visualize the plain connections.
Unfortunately, the circo -Tsvg command (right) that is referred to in the docs does not visualize the locations well.
On https://dreampuf.github.io/GraphvizOnline (left) it looks much better.

[Image: comparison of the cluster graph rendered by GraphvizOnline (left) and circo (right)]

If somebody has a larger cluster with an interesting topology and cross mesh granularity, they could add a cross.svg to the docs.

@skirsten marked this pull request as ready for review August 5, 2022 17:52
@skirsten (Author) commented
I am now testing this in a 50-node cluster and it's working great so far. But the kgctl graph output is not working correctly. I will try to fix it.

@squat (Owner) commented Aug 10, 2022

Amazing stuff @skirsten! I think this will be a great feature for Kilo. We'll need to add more tests before releasing, but that should not hold things up.

@sepich commented Jun 7, 2024

Consider the case of a typical "enterprise" LAN, with many offices and firewalled subnets. We have IoT k8s nodes of the same cluster randomly scattered across these subnets. All nodes have only internal IPs; no public interfaces, no cloud.
The idea of --mesh-granularity=cross sounds interesting, so I tested it for such a cluster.

Unfortunately:

  • "Same location" nodes try to install direct route to neighbor podCIDRs (with --encapsulate=never)
    routes = append(routes, encapsulateRoute(&netlink.Route{

    But that only possible if nodes are in same subnet. So we have to watch for that and label "location" by subnet in each location. In our case it is not static, as nodes could be re-connected to another subnet, at this point label would be out-of-date.
  • Mesh hostNetwork traffic is not routed to WireGuard:
    if segment.privateIPs == nil || segment.privateIPs[i].Equal(t.updateEndpoint(segment.endpoint, segment.key, &segment.persistentKeepalive).IP()) {

    Nodes run some Pods in hostNetwork mode, like node-exporter. To scrape it, some Pod in the cluster now has to be able to connect to port :9100 on every node. With cilium-agent on the nodes, you also need :9099 and :9965, and so on. It is usually hard to open new ports in the firewall in such microsegmented networks, especially in an all-to-all mesh. It would be nice to move WireGuard to something already open (like udp/53) and require only a single port between k8s nodes. That could be done with https://www.wireguard.com/netns/#improved-rule-based-routing, i.e. route all traffic to another node via kilo0, but route the WireGuard traffic itself via the default gateway. This way a node would only need access to the kube-api LB, and even kube-api -> kubelet:10250 connections would go inside Kilo. (A sketch of this rule-based routing also follows after the list.)
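A minimal sketch of the kind of direct route the first point refers to, using github.com/vishvananda/netlink; the function and variable names are hypothetical, not Kilo's actual code:

```go
// Sketch: an unencapsulated route to a neighbor's podCIDR. The next hop
// is the neighbor's private IP, which the kernel can only resolve via ARP
// when both nodes sit on the same L2 subnet; across subnets this route
// is rejected or simply black-holes traffic.
package main

import (
	"net"

	"github.com/vishvananda/netlink"
)

func addDirectPodCIDRRoute(neighborPodCIDR *net.IPNet, neighborPrivateIP net.IP) error {
	return netlink.RouteAdd(&netlink.Route{
		Dst: neighborPodCIDR,   // e.g. 10.42.3.0/24
		Gw:  neighborPrivateIP, // must be on-link for this to work
	})
}
```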
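And a sketch of the rule-based routing idea from the second point, again with github.com/vishvananda/netlink. The fwmark and table numbers are the illustrative values from the WireGuard docs; this is not what Kilo does today:

```go
// Sketch of WireGuard's "improved rule-based routing": all traffic is
// routed through the tunnel via a dedicated table, except packets that
// carry WireGuard's own fwmark (the encrypted UDP), which fall through
// to the main table and leave via the default gateway.
package main

import "github.com/vishvananda/netlink"

const (
	fwmark  = 51820 // set on the WireGuard socket, e.g. `wg set kilo0 fwmark 51820`
	tableID = 51820 // dedicated routing table for tunneled traffic
)

func installRuleBasedRouting(wg netlink.Link) error {
	// Default route through the WireGuard device in the dedicated table.
	if err := netlink.RouteAdd(&netlink.Route{
		LinkIndex: wg.Attrs().Index,
		Table:     tableID,
	}); err != nil {
		return err
	}
	// Equivalent of `ip rule add not fwmark 51820 table 51820`: unmarked
	// packets use the tunnel table; marked (already encrypted) packets do not.
	rule := netlink.NewRule()
	rule.Invert = true
	rule.Mark = fwmark
	rule.Table = tableID
	rule.Family = netlink.FAMILY_V4
	return netlink.RuleAdd(rule)
}
```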

So I ended up implementing --mesh-granularity=subnet based on this work. You can find the changes here: sepich@6dd5170
I believe the "location" label from a cloud provider looks pretty, but it makes no sense on-prem; automatic detection of subnets is more practical, and it is even useful for raw cloud EC2/VM nodes (without GCP VPC-native nodes or EKS vpc-cni).
Maybe some parts of this, like routing hostNetwork traffic to WireGuard, are useful for others. I can send them as a separate PR.
