
Telemetry Agent not closing connections, causing MySQL error 08004/1040: Too many connections after some time. #123

Concera-StefanvanBuren opened this issue Jan 23, 2025 · 0 comments


It looks like the Percona Telemetry Agent is not closing connections, causing MySQL error 08004/1040: Too many connections after some time.

Operating system: Debian 12
Database: Percona Server 8.0.37 for Linux
Configuration: Access for the root user on localhost is disabled. The only way for the root user to log in to the database is by specifying 127.0.0.1 as the host, like this: mysql -h 127.0.0.1 -u root -p. We don't use the root user anywhere in our application.
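
For reference, the existing root accounts can be checked over the 127.0.0.1 connection with something like the following (a generic query against mysql.user; the output from our server is not included here):

mysql -h 127.0.0.1 -u root -p -e "SELECT user, host, account_locked FROM mysql.user WHERE user = 'root';"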

Searching the syslog, we found the following entries, logged once a day by the percona_telemetry component:

/var/log/syslog.3.gz:Jan 20 20:15:00 mysqld[1694]: 2025-01-20T20:15:00.098778Z 0 [Warning] [MY-011071] [Server] Component percona_telemetry reported: 'Collecting db_instance_id failed. It may be caused by server still initializing.'
/var/log/syslog.4.gz:Jan 19 20:15:00 mysqld[1694]: 2025-01-19T20:15:00.098412Z 0 [Warning] [MY-011071] [Server] Component percona_telemetry reported: 'Collecting db_instance_id failed. It may be caused by server still initializing.'
/var/log/syslog.5.gz:Jan 18 20:15:00 mysqld[1694]: 2025-01-18T20:15:00.098010Z 0 [Warning] [MY-011071] [Server] Component percona_telemetry reported: 'Collecting db_instance_id failed. It may be caused by server still initializing.'
/var/log/syslog.6.gz:Jan 17 20:15:00 mysqld[1694]: 2025-01-17T20:15:00.097674Z 0 [Warning] [MY-011071] [Server] Component percona_telemetry reported: 'Collecting db_instance_id failed. It may be caused by server still initializing.'
/var/log/syslog.7.gz:Jan 16 20:15:00 mysqld[1694]: 2025-01-16T20:15:00.097363Z 0 [Warning] [MY-011071] [Server] Component percona_telemetry reported: 'Collecting db_instance_id failed. It may be caused by server still initializing.'

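For reference, a search along these lines reproduces the entries above (the file names come from our Debian 12 syslog rotation and may differ elsewhere):

zgrep 'percona_telemetry' /var/log/syslog*
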
When using systemctl status mysql, the Percona service is active (running) with status "Server is operational". After some time (a couple of weeks/months), the MySQL processlist fills up with processes in state "login" and info "PLUGIN"; see the output below and the attached screenshot.

+----------+----------------------+-----------------+----------------------+---------+--------+------------------------+------------------+-----------+-----------+---------------+
| Id       | User                 | Host            | db                   | Command | Time   | State                  | Info             | Time_ms   | Rows_sent | Rows_examined |
+----------+----------------------+-----------------+----------------------+---------+--------+------------------------+------------------+-----------+-----------+---------------+
| 6        | event_scheduler      | localhost       |                      | Daemon  | 709122 | Waiting on empty queue |                  | 709122565 | 0         | 0             |
| 3740365  | root                 | localhost       |                      | Sleep   | 622722 | login                  | PLUGIN           | 622722577 | 0         | 0             |
| 7453355  | root                 | localhost       |                      | Sleep   | 536322 | login                  | PLUGIN           | 536322576 | 0         | 0             |
| 11165124 | root                 | localhost       |                      | Sleep   | 449922 | login                  | PLUGIN           | 449922576 | 0         | 0             |
| 14877373 | root                 | localhost       |                      | Sleep   | 363522 | login                  | PLUGIN           | 363522575 | 0         | 0             |
| 18589114 | root                 | localhost       |                      | Sleep   | 277122 | login                  | PLUGIN           | 277122575 | 0         | 0             |
| 22302053 | root                 | localhost       |                      | Sleep   | 190722 | login                  | PLUGIN           | 190722575 | 0         | 0             |
| 26013793 | root                 | localhost       |                      | Sleep   | 104322 | login                  | PLUGIN           | 104322574 | 0         | 0             |
| 29724803 | root                 | localhost       |                      | Sleep   | 17922  | login                  | PLUGIN           | 17922574  | 0         | 0             |
| 30494127 | root                 | localhost:57288 |                      | Query   | 0      | init                   | show processlist | 0         | 0         | 0             |
+----------+----------------------+-----------------+----------------------+---------+--------+------------------------+------------------+-----------+-----------+---------------+

[Screenshot: MySQL processlist filled with root sessions in state "login" / info "PLUGIN"]
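
If useful for reproducing, the stuck sessions can also be listed with a query along these lines (using information_schema.processlist, which is still available on 8.0):

mysql -h 127.0.0.1 -u root -p -e "SELECT id, user, host, command, time, state, info FROM information_schema.processlist WHERE state = 'login' AND info = 'PLUGIN';"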

It looks like the percona_telemetry_agent creates a new process once a day (the difference between the values in the Time column of the processlist is exactly 86400 seconds), but does not close it properly.

With the default setting for max_connections (151), the processlist is completely full after 151 days and new connections are refused. The only solution seems to be restarting the MySQL server, but after some time the same behaviour occurs again.
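
For reference, the limit and the current number of connections can be checked like this (standard MySQL variables/status, nothing Percona-specific):

mysql -h 127.0.0.1 -u root -p -e "SHOW VARIABLES LIKE 'max_connections'; SHOW GLOBAL STATUS LIKE 'Threads_connected';"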

Output from /var/log/percona/telemetry-agent/telemetry-agent.log:

{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:279","msg":"start metrics processing iteration"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:281","msg":"cleaning up history metric files","directory":"/usr/local/percona/telemetry/history"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:287","msg":"processing Pillars metrics files"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:92","msg":"processing PS metrics","directory":"/usr/local/percona/telemetry/ps"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"metrics/metrics.go:59","msg":"pillar metric directory is empty, skipping","directory":"/usr/local/percona/telemetry/ps"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:99","msg":"processing PXC metrics","directory":"/usr/local/percona/telemetry/pxc"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/pxc"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:106","msg":"processing PSMDB (mongod) metrics","directory":"/usr/local/percona/telemetry/psmdb"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/psmdb"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:113","msg":"processing PSMDB (mongos) metrics","directory":"/usr/local/percona/telemetry/psmdbs"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/psmdbs"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:120","msg":"processing PG metrics","directory":"/usr/local/percona/telemetry/pg"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/pg"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:135","msg":"no Pillar metrics files found, skip scraping host metrics and sending telemetry"}
{"level":"info","ts":"2025-01-19T20:12:59.621Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:289","msg":"sleep for 86400 seconds"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:279","msg":"start metrics processing iteration"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:281","msg":"cleaning up history metric files","directory":"/usr/local/percona/telemetry/history"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:287","msg":"processing Pillars metrics files"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:92","msg":"processing PS metrics","directory":"/usr/local/percona/telemetry/ps"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"metrics/metrics.go:59","msg":"pillar metric directory is empty, skipping","directory":"/usr/local/percona/telemetry/ps"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:99","msg":"processing PXC metrics","directory":"/usr/local/percona/telemetry/pxc"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/pxc"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:106","msg":"processing PSMDB (mongod) metrics","directory":"/usr/local/percona/telemetry/psmdb"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/psmdb"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:113","msg":"processing PSMDB (mongos) metrics","directory":"/usr/local/percona/telemetry/psmdbs"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/psmdbs"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:120","msg":"processing PG metrics","directory":"/usr/local/percona/telemetry/pg"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/pg"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:135","msg":"no Pillar metrics files found, skip scraping host metrics and sending telemetry"}
{"level":"info","ts":"2025-01-20T20:12:59.620Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:289","msg":"sleep for 86400 seconds"}gger":"telemetry-agent","caller":"telemetry-agent/main.go:106","msg":"processing PSMDB (mongod) metrics","directory":"/usr/local/percona/telemetry/psmdb"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/psmdb"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:113","msg":"processing PSMDB (mongos) metrics","directory":"/usr/local/percona/telemetry/psmdbs"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/psmdbs"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:120","msg":"processing PG metrics","directory":"/usr/local/percona/telemetry/pg"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"metrics/metrics.go:49","msg":"pillar metric directory is absent, skipping","directory":"/usr/local/percona/telemetry/pg"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:135","msg":"no Pillar metrics files found, skip scraping host metrics and sending telemetry"}
{"level":"info","ts":"2025-01-22T13:47:09.830Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:289","msg":"sleep for 86400 seconds"}
{"level":"info","ts":"2025-01-23T10:51:03.954Z","logger":"telemetry-agent","caller":"utils/signal_runner.go:36","msg":"Received signal: terminated, shutdown"}
{"level":"info","ts":"2025-01-23T10:51:03.954Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:273","msg":"terminating main loop"}
{"level":"info","ts":"2025-01-23T10:51:03.954Z","logger":"telemetry-agent","caller":"telemetry-agent/main.go:298","msg":"finished"}

For now, we've disabled the telemetry agent using systemctl stop percona-telemetry-agent and systemctl disable percona-telemetry-agent in order to prevent it from filling up the processlist.
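
The exact commands used (plain systemd, nothing else changed):

systemctl stop percona-telemetry-agent
systemctl disable percona-telemetry-agent

Note that this only stops the agent; the stale "login"/"PLUGIN" connections already in the processlist remain until the MySQL server is restarted, as described above.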
