Callgrind is a call-graph generating profiler that can also collect information about processor cache hit rates and branch prediction. It is only useful if your program is CPU-bound. It is not useful if the bottleneck involves heavy I/O or multiple processes.
Valgrind does not require kernel configuration but it does need debug symbols. It is available as a target package in both the Yocto Project and Buildroot (
You run Callgrind in Valgrind on the target, like so:
# valgrind --tool=callgrind <program>
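Note that Callgrind gathers cache and branch prediction figures only when the corresponding simulations are switched on, since both are disabled by default. The following is a minimal sketch of how the command line could be extended. The helper name callgrind_cmd and the program path ./myprog are placeholders invented for illustration; only the valgrind options themselves come from Callgrind.

```shell
#!/bin/sh
# Sketch only: callgrind_cmd is a hypothetical helper, not part of Valgrind.
# It prints the command line so that you can inspect it before running it.
callgrind_cmd() {
    # --cache-sim=yes   turn on cache simulation (off by default)
    # --branch-sim=yes  turn on branch prediction simulation (off by default)
    echo "valgrind --tool=callgrind --cache-sim=yes --branch-sim=yes $*"
}

# Print the command for a placeholder program called ./myprog
callgrind_cmd ./myprog
```

Running the printed command on the target profiles the program in the same way as the plain invocation, with the extra event counts included in the output. Expect a longer run time, because both simulations add overhead.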
This produces a file called callgrind.out.<PID>, which you can copy to the host and analyze with
The default is to capture data for all the threads together in a single file. If you add option