WIP: Remove dependency on kallsyms with eBPF #5217
Open
+122
−37
I've been trying to improve the perf tools' startup time to make fixing broken tests and features of perf bearable on NixOS. I've learned that processing /proc/kallsyms is a costly operation: on my Ryzen 5 system it takes around 100 ms just to read through it.
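For anyone who wants to reproduce the read-time figure on their own machine, here is a minimal sketch of how it could be measured. The helper name `time_read_lines_ms` is hypothetical and not part of this PR; it only reads the file line by line and does no parsing, so it gives a lower bound on the cost.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical helper, not part of the PR.
 * Reads `path` line by line and returns the elapsed wall-clock time in
 * milliseconds, or -1.0 if the file cannot be opened. The number of
 * fgets() chunks read is stored in *lines when lines is not NULL
 * (a line longer than the buffer is counted more than once). */
static double time_read_lines_ms(const char *path, long *lines)
{
	struct timespec start, end;
	char buf[1024];
	long n = 0;
	FILE *f;

	assert(path != NULL);

	clock_gettime(CLOCK_MONOTONIC, &start);
	f = fopen(path, "r");
	if (!f)
		return -1.0;
	while (fgets(buf, sizeof(buf), f))
		n++;
	fclose(f);
	clock_gettime(CLOCK_MONOTONIC, &end);

	if (lines)
		*lines = n;
	return (end.tv_sec - start.tv_sec) * 1e3 +
	       (end.tv_nsec - start.tv_nsec) / 1e6;
}
```

Calling `time_read_lines_ms("/proc/kallsyms", &n)` from a small `main()` and printing the result should give a number comparable to the one above, though it will vary with the kernel configuration and the number of loaded modules.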
Aside from reducing how many times kallsyms is read, I started looking into how to remove the dependency on kallsyms altogether.
From the kernel source (kernel/kallsyms.c) and the printk() documentation (https://docs.kernel.org/core-api/printk-formats.html#symbols-function-pointers) I learned about the kernel's special pointer formatting flags.
Writing a custom kernel module just for the purpose of extracting symbol names didn't feel right to me, so I've tried to use eBPF for that purpose. eBPF programs have these helpers available that are promising:
I've looked into how these two are implemented and found that both of them use bstr_printf underneath. But before any printing is done, the format string must first go through bpf_bprintf_prepare, which disallows certain flags.
Fortunately for us, thanks to Florent Revest (https://lore.kernel.org/bpf/[email protected]/), %pB is accepted. Should we consider adding that information to the bpf-helpers man page?
Overview of the new approach:
In order to reliably trigger the converter program, I decided to use USDT.
Running
time ./profile -F 2344 1
on a mostly idle system I got:
Before:
real 0m1,215s
user 0m0,058s
sys 0m0,157s
After:
real 0m1,045s
user 0m0,009s
sys 0m0,026s
I ask the community here for your opinion, help and guidance to make this mergeable.
Using %pB slightly changes the format of a symbol name. Example: kmem_cache_alloc_noprof+0x2cf/0x300
It would be trivial to remove the suffix if it's necessary. Generating flamegraphs with Brendan Gregg's perl script still works.
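As a rough illustration of how the suffix could be removed in user space, here is a minimal sketch. The function name `strip_sym_offset` is hypothetical and not part of this PR. It assumes the symbol text has the `name+0xoffset/0xsize` shape shown above and that symbol names never contain `+`. Everything from the first `+` onward is dropped, including a trailing `[module]` name if the kernel printed one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of the PR.
 * Truncates "name+0xoff/0xsize" at the first '+', leaving only the bare
 * symbol name. Modifies `sym` in place and returns it. Strings without
 * a '+' are returned unchanged. */
static char *strip_sym_offset(char *sym)
{
	char *plus;

	assert(sym != NULL);

	plus = strchr(sym, '+');
	if (plus)
		*plus = '\0';
	return sym;
}
```

Because the string is cut in place, no extra allocation is needed; the caller only has to make sure the buffer is writable.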
For now, max_entries of the hashmap is hardcoded. Would you prefer to make it configurable, like stack-storage-size, or to compute it by collecting all IPs into a set and using that set's size as the value for max_entries?
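To make the second option concrete, here is a minimal sketch of counting the distinct instruction pointers in user space. The names `cmp_u64` and `count_unique_ips` are hypothetical and not part of this PR, and a sort-and-scan stands in for a real set to keep the example short.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison for qsort() that avoids overflow on subtraction. */
static int cmp_u64(const void *a, const void *b)
{
	uint64_t x = *(const uint64_t *)a, y = *(const uint64_t *)b;

	return (x > y) - (x < y);
}

/* Hypothetical helper, not part of the PR.
 * Sorts `ips` in place and returns the number of distinct addresses. */
static size_t count_unique_ips(uint64_t *ips, size_t n)
{
	size_t i, unique;

	assert(ips != NULL || n == 0);

	if (n == 0)
		return 0;
	qsort(ips, n, sizeof(*ips), cmp_u64);
	unique = 1;
	for (i = 1; i < n; i++)
		if (ips[i] != ips[i - 1])
			unique++;
	return unique;
}
```

The result could then be passed to libbpf's `bpf_map__set_max_entries()`, which has to be called before the BPF object is loaded. That only works if the map lives in an object that is loaded after the stacks have been collected, which is an assumption about how the converter would be structured.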
In Makefile:
When I added bpf/usdt.bpf.h, clang failed because it couldn't find asm/errno.h, so I added -v to the cflags when V is set.
As a quick fix, I hardcoded the include path to the x86_64 host header files. In contrast to clang, gcc correctly has this in its default include path. Do you have any idea how to set this path, preferably so that it doesn't only work on Ubuntu?
Experimentally, I added BPF_F_FAST_STACK_CMP | BPF_F_REUSE_STACKID to the bpf_get_stackid flags. It doesn't seem to break the program and should make recording faster. If somebody knows why adding them is a bad idea, please do tell.
If this gets a positive reaction, I will look into converting other tools here to use eBPF instead of /proc/kallsyms when possible.
Best,
Krzysztof