Following up on issue #806, we conducted an isolated investigation to better understand the CPU spikes observed when the New Relic PHP agent is enabled. Our testing was performed in a controlled environment with a single container on a dedicated Kubernetes node.
Environment
Kubernetes 1.30 (EKS)
Instance type: m7a
PHP-FPM 8.2
New Relic PHP Agent: Latest version with all features disabled
Findings
CPU Usage Pattern
Figure 1: Grafana CPU metrics showing three distinct usage patterns:
Baseline period with normal activity
Spike to ~100% CPU with New Relic disabled (16:10)
Spike to ~300% CPU with New Relic enabled (16:15)
Flame Graph Comparison
Figure 2: System-wide flame graph (test 4) with New Relic disabled, showing normal system call patterns and CPU usage distribution
Figure 3: System-wide flame graph (test 4) with New Relic enabled, demonstrating significantly increased fstatat64 system calls and higher CPU utilization across all cores
This pattern remained consistent across multiple test runs and was not affected by:
Disabling all New Relic features
Using the latest agent version
Different sampling frequencies (99Hz and 997Hz)
System Call Analysis
Through system-wide performance profiling, we identified a significant increase in fstatat64 system calls when the New Relic agent is enabled. This suggests that the agent is performing an excessive number of file metadata lookups.
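To put a number on the increase, one option is to compare per-syscall counts from two `strace -c` summaries, one captured with the agent disabled and one with it enabled. The following is a minimal sketch, not the method used for the attached results: the function names and the summary file names are hypothetical, and it assumes the standard `strace -c` table layout, where the call count is the fourth column and the syscall name is the last. On x86_64, strace typically reports this call as `newfstatat`.

```shell
#!/bin/sh
# Print how often SYSCALL appears in a `strace -c` summary file (0 if absent).
# Usage: syscall_calls SUMMARY_FILE SYSCALL
syscall_calls() {
    awk -v name="$2" '$NF == name { n = $4 } END { print n + 0 }' "$1"
}

# Compare the call counts for one syscall between two summaries.
# Usage: compare_syscall OFF_SUMMARY ON_SUMMARY SYSCALL
compare_syscall() {
    off=$(syscall_calls "$1" "$3")
    on=$(syscall_calls "$2" "$3")
    if [ "$off" -gt 0 ]; then
        awk -v a="$off" -v b="$on" \
            'BEGIN { printf "off=%d on=%d ratio=%.1fx\n", a, b, b / a }'
    else
        # Avoid dividing by zero when the syscall never occurred in the baseline.
        printf 'off=%d on=%d ratio=n/a\n' "$off" "$on"
    fi
}
```

Assuming both summaries were captured over the same duration and under similar load, for example with `timeout 60 strace -c -f -o nr_off_summary.txt -p $(pgrep -o php-fpm)`, the comparison would be `compare_syscall nr_off_summary.txt nr_on_summary.txt newfstatat`.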
Testing Methodology
We conducted extensive profiling using:
PHP-FPM specific profiling at different sampling rates:
perf record -F [99|997] -p $(pgrep php-fpm -o) -a -g --call-graph fp -- sleep 60
System-wide profiling:
perf record -F [99|997] -a -g -- sleep 60
System call tracing:
timeout 60 strace -tt -f -C -p $(pgrep -o php-fpm)
Version Impact
This performance regression appears to have been introduced between versions 10.0.0.312 and 10.7.0.319. Earlier versions did not exhibit this behavior.
Supporting Evidence
All profiling results are attached to this issue in newrelic_profiling_results.zip, which includes:
PHP-FPM Specific Profiles
With New Relic disabled:
99Hz sampling (phpfpm_nr_off_99hz.*)
997Hz sampling (phpfpm_nr_off_997hz.*)
With New Relic enabled:
99Hz sampling (phpfpm_nr_on_99hz.*)
997Hz sampling (phpfpm_nr_on_997hz.*)
System-Wide Profiles
With New Relic disabled:
Test 3 (system_nr_off_99hz_test3.*)
Test 4 (system_nr_off_99hz_test4.*)
With New Relic enabled:
Test 3 (system_nr_on_99hz_test3.*)
Test 4 (system_nr_on_99hz_test4.*)
Questions
Is there a known reason for the increased frequency of fstatat64 calls?
Are there plans to optimize file operations in future releases?
Could this be related to the agent's file monitoring or instrumentation mechanisms?
We have a similar issue, but with almost all versions, including those before 10.0.0.312.
When PHP-FPM handles 500+ req/s, we run into a CPU spike (all 96 cores at ~100% kernel usage) that resolves immediately after restarting the newrelic-daemon.
These spikes do not occur at all when New Relic is not running.
We also see filename_lookup calls that lead to lock contention.
Screenshot of the flame graph attached.
I think this could be related?
Sadly, I have no idea how to detect which file(s) are causing those locks.
If someone knows how to find out which paths are involved in native_queued_spin_lock_slowpath, feel free to help me :)
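One possible way to narrow down the paths is to aggregate the file names passed to stat-family syscalls in a strace log. This is only a sketch under assumptions: the function name `top_stat_paths` and the log file name are hypothetical, and it shows which paths are looked up most often, not which ones actually contend on the spin lock. The most frequent paths are a reasonable first place to look, but the link to the lock is not proven by this output.

```shell
#!/bin/sh
# Count the paths passed to stat-family syscalls in strace output read from
# stdin. Prints "count path" lines, most frequent first.
top_stat_paths() {
    # Keep only stat-family calls; the leading boundary stops fstat( from
    # matching stat(, and "<... resumed>" lines are skipped.
    grep -E '(^|[[:space:]])(newfstatat|fstatat64|statx|stat|lstat)\(' \
        | sed -n 's/^[^"]*"\([^"]*\)".*$/\1/p' \
        | LC_ALL=C sort \
        | uniq -c \
        | sed 's/^ *//' \
        | LC_ALL=C sort -k1,1nr -k2
}
```

A log could be captured with something like `timeout 60 strace -f -tt -o stat.log -p $(pgrep -o php-fpm)` and then summarized with `top_stat_paths < stat.log | head -20`. Note that strace slows the traced processes considerably, so this is better run briefly or on a single worker than on a node that is already saturated.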