Performance Investigation: CPU Spikes & System Call Overhead in New Relic PHP Agent #1021

theophileds opened this issue Feb 13, 2025

Context

Following up on issue #806, we conducted an isolated investigation to better understand the CPU spikes observed when the New Relic PHP agent is enabled. Our testing was performed in a controlled environment with a single container on a dedicated Kubernetes node.

Environment

  • Kubernetes 1.30 (EKS)
  • Instance type: m7a
  • PHP-FPM 8.2
  • New Relic PHP Agent: Latest version with all features disabled
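
For reference, "all features disabled" means a newrelic.ini configuration along these lines (an illustrative sketch, not the exact file used in our tests; the directives shown are standard agent settings):

    ; keep the agent loaded so its baseline overhead is measurable
    newrelic.enabled = true
    ; optional features switched off for the test
    newrelic.transaction_tracer.enabled = false
    newrelic.error_collector.enabled = false
    newrelic.browser_monitoring.auto_instrument = false
    newrelic.distributed_tracing_enabled = false
    newrelic.span_events.enabled = false
    newrelic.application_logging.enabled = false
    newrelic.code_level_metrics.enabled = false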

Findings

CPU Usage Pattern

CPU usage graph showing spikes to 100% with New Relic disabled and 300% with New Relic enabled
Figure 1: Grafana CPU metrics showing distinct usage patterns:

  • Baseline period with normal activity
  • Spike to ~100% CPU with New Relic disabled (16:10)
  • Spike to ~300% CPU with New Relic enabled (16:15)

Flame Graph Comparison

Flame graph visualization without New Relic enabled
Figure 2: System-wide flame graph (test 4) with New Relic disabled, showing normal system call patterns and CPU usage distribution

Flame graph visualization with New Relic enabled
Figure 3: System-wide flame graph (test 4) with New Relic enabled, demonstrating significantly increased fstatat64 system calls and higher CPU utilization across all cores

This pattern remained consistent across multiple test runs and was not affected by:

  • Disabling all New Relic features
  • Using the latest agent version
  • Different sampling frequencies (99Hz and 997Hz)

System Call Analysis

Through system-wide performance profiling, we identified a significant increase in fstatat64 system calls when the New Relic agent is enabled. This suggests excessive file operations being performed by the agent.
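
As a rough way to quantify this (a sketch; assumes strace can attach to the FPM master inside the container), a per-syscall summary can be compared with the agent on and off — the fstatat64/newfstatat row is the one of interest:

    # 60-second per-syscall count summary across the FPM master and its children
    timeout 60 strace -f -c -p $(pgrep -o php-fpm)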

Testing Methodology

We conducted extensive profiling using:

  1. PHP-FPM specific profiling at different sampling rates:
    perf record -F [99|997] -p $(pgrep php-fpm -o) -a -g --call-graph fp -- sleep 60

  2. System-wide profiling:
    perf record -F [99|997] -a -g -- sleep 60

  3. System call tracing:
    timeout 60 strace -tt -f -C -p $(pgrep -o php-fpm)
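
The flame graphs above can be reproduced from these recordings with a pipeline along the following lines (a sketch assuming Brendan Gregg's FlameGraph scripts are on the PATH; file names are illustrative):

    # fold the recorded stacks and render an interactive SVG flame graph
    perf script -i perf.data | stackcollapse-perf.pl > out.folded
    flamegraph.pl out.folded > flamegraph.svg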

Version Impact

This performance regression appears to have been introduced between versions 10.0.0.312 and 10.7.0.319. Earlier versions did not exhibit this behavior.
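
For anyone reproducing this, the installed agent version can be confirmed from the loaded extension (one-liner sketch):

    php -r "echo phpversion('newrelic'), PHP_EOL;"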

Supporting Evidence

All profiling results are attached to this issue in newrelic_profiling_results.zip, which includes:

PHP-FPM Specific Profiles

  • With New Relic disabled:
    • 99Hz sampling (phpfpm_nr_off_99hz.*)
    • 997Hz sampling (phpfpm_nr_off_997hz.*)
  • With New Relic enabled:
    • 99Hz sampling (phpfpm_nr_on_99hz.*)
    • 997Hz sampling (phpfpm_nr_on_997hz.*)

System-Wide Profiles

  • With New Relic disabled:
    • Test 3 (system_nr_off_99hz_test3.*)
    • Test 4 (system_nr_off_99hz_test4.*)
  • With New Relic enabled:
    • Test 3 (system_nr_on_99hz_test3.*)
    • Test 4 (system_nr_on_99hz_test4.*)

Questions

  • Is there a known reason for the increased frequency of fstatat64 calls?
  • Are there plans to optimize file operations in future releases?
  • Could this be related to the agent's file monitoring or instrumentation mechanisms?

newrelic_profiling_results.zip


iekadou commented Feb 27, 2025

We have a similar issue, but with almost all versions, including those prior to 10.0.0.312.

Once we reach 500+ req/s on PHP-FPM, we run into a CPU spike (all 96 cores at ~100% kernel usage) that is resolved immediately after restarting the newrelic-daemon.
Those spikes do not occur at all if newrelic is not running.

We also see filename_lookup calls that result in locks.
A screenshot of the flame graph is attached.


I think this could be related.

Sadly, I have no idea how to detect which file(s) are causing those locks.
If someone knows how to find out which paths end up in native_queued_spin_lock_slowpath, feel free to help me :)
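
Maybe a bpftrace probe on the kernel's filename_lookup could surface the paths being looked up while the spike happens (untested sketch; assumes a kernel with BTF so the struct filename layout is known):

    # print the process name and the path string passed to each filename_lookup call
    bpftrace -e 'kprobe:filename_lookup { printf("%s %s\n", comm, str(((struct filename *)arg1)->name)); }'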
