
DHCP lease timeout #4

Open
craig-willis opened this issue Dec 19, 2017 · 5 comments

@craig-willis (Collaborator)

We're seeing frequent log entries indicating network configuration changes:

Dec 19 14:00:27 host-192-168-149-8 systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Dec 19 14:00:27 host-192-168-149-8 systemd-timesyncd[605]: Synchronized to time server 129.114.97.2:123 (129.114.97.2).
Dec 19 14:00:47 host-192-168-149-8 systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Dec 19 14:00:47 host-192-168-149-8 systemd-timesyncd[605]: Synchronized to time server 129.114.97.2:123 (129.114.97.2).
...
Dec 19 14:02:41 host-192-168-149-8 systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Dec 19 14:02:41 host-192-168-149-8 systemd-timesyncd[605]: Synchronized to time server 129.114.97.2:123 (129.114.97.2).
Dec 19 14:03:00 host-192-168-149-8 systemd-timesyncd[605]: Network configuration changed, trying to establish connection.
Dec 19 14:03:00 host-192-168-149-8 systemd-timesyncd[605]: Synchronized to time server 129.114.97.2:123 (129.114.97.2).

Since we're using Docker swarm, we're also seeing frequent "node join" events as the system responds to the network change.
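
To confirm the correlation, one option (a sketch; assumes access to a swarm manager node) is to watch swarm node events alongside the journal:

$ docker events --filter type=node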

This may be related to the short DHCP lease timeout:

$ cat /run/systemd/netif/leases/2
..
MTU=9000
T1=133
T2=245
LIFETIME=300
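
In DHCP terms (RFC 2131), T1 is the renewal time and T2 the rebinding time, so with LIFETIME=300 the client starts renegotiating after T1=133 seconds, i.e. every couple of minutes, consistent with the timesyncd messages above. To watch the lease events directly (assuming systemd-networkd is the DHCP client here, which the lease path suggests):

$ journalctl -fu systemd-networkd | grep -i dhcp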

According to the OpenStack docs, the default value of dhcp_lease_duration is 24 hours (86400 seconds).

Confirm with TACC why the lease is so short and consider the impacts.
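
For reference, this knob lives on the cloud operator's side; a sketch of what it looks like, assuming TACC runs OpenStack Neutron:

# /etc/neutron/neutron.conf -- operator-controlled, not editable from our instances
[DEFAULT]
# DHCP lease duration in seconds; the documented default is 86400 (24 hours)
dhcp_lease_duration = 86400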

@craig-willis craig-willis self-assigned this Dec 19, 2017
@craig-willis (Collaborator, Author) commented Dec 19, 2017

From tickets.xsede.org #80694:

DHCP leases are short primarily because of suspend and migration issues.

Essentially, during either Suspend or non-live Migration, the VM's internal clock stops.
Back in the "real" world, the DHCP server's clock didn't stop.

When the VM resumes, its lease may already be expired, but it doesn't ask for a new lease until its internal timer fires, at which point it renegotiates. If we left it at 24 hours, the pool of IPs could be used up and/or VMs might wait up to 24 hours to renegotiate.
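
If that's the rationale, one mitigation on our side (a sketch, assuming systemd-networkd manages the interface) would be to force a renegotiation immediately after a resume rather than waiting for the timer:

$ sudo systemctl restart systemd-networkd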

Worth noting, for comparison (T1, T2, LIFETIME in seconds):

Site          MTU    T1      T2       LIFETIME
SDSC/Cloud    1458   78090   142890   172800
NCSA/Nebula   1454   40307   72707    86400

@craig-willis (Collaborator, Author)

Comment from SDSC about their longer lease time:

Don't think we have given it much thought; suspend and non-live migration are not very common at SDSC. Each project has its own subnet and pool of IPs, so unused IPs have not been a concern. Perhaps if we couldn't live migrate then this would be a concern, but there are very few circumstances where we can't.

@craig-willis (Collaborator, Author)

@Xarthisius I don't think TACC will change the DHCP lease timeout based on the above -- it seems to have been an intentional decision. Do you have any further questions? Otherwise, I think we should go forward expecting frequent network config changes and swarm join log entries.

@craig-willis (Collaborator, Author)

Actually, I just had another response from Jetstream:

since you’re already playing with fire, I’ll assume you’re willing to play with explosives as well ;)
If you want to adjust the DHCP life time for an instance, on that instance edit the file (on RHEL anyway) /etc/dhcp/dhclient.conf and add the line
supersede dhcp-lease-time XXXXX;
where XXXXX is the number of seconds you want your lease to have.
Obviously, if you’re unable to communicate with an instance after resuming it, you may have to wait for the lease time to run out.
Please, let us know how that works out,

So perhaps we can override the lease time on our end, if needed.
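
A minimal sketch of that override (86400 seconds, i.e. 24 hours, is an illustrative value, not a Jetstream recommendation):

# /etc/dhcp/dhclient.conf
# Ask the DHCP server for a 24-hour lease on the next negotiation
supersede dhcp-lease-time 86400;

Note this only applies where dhclient manages the interface; on images where systemd-networkd is the DHCP client (as the /run/systemd/netif/leases path above suggests), dhclient.conf has no effect.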

@Xarthisius (Contributor)

No, until we have a concrete issue that this is causing, I don't think we can push them.
