Extending the Kernel with Built-in Kernel Headers
Note: this article is a followup to Zack Brown's "Android Low Memory Killer—In or Out?"
Linux kernel headers are the unstable, constantly changing, internal API of the kernel. This includes internal kernel structures (for example, task_struct) as well as helper macros and functions. Unlike the UAPI headers, which are used to build userspace programs and are stable and backward-compatible, the internal kernel headers can change at any time and in any release. While this gives the kernel unlimited flexibility to evolve, it creates difficulties for code that must be loaded into the kernel at runtime and executed in kernel context.
Kernel modules are a prime example of such code. They execute in kernel context and depend on the same unstable API. A module has to be built for the kernel it runs on, and it may fail to load on another kernel because an internal API change can break it. eBPF tracing programs are another example. These programs are dynamically compiled from C to eBPF, loaded into the kernel, and executed in kernel space by an in-kernel BPF virtual machine. Because they trace the kernel, they sometimes need to use the in-kernel API: they may need to understand what kernel data structures look like or call kernel helper functions. They therefore face the same challenges as kernel modules whenever the internal API changes.
Kernel headers are usually unavailable on the target where these BPF tracing programs need to be dynamically compiled and run. That is certainly the case with Android, which runs on billions of devices and where shipping custom kernel headers for every device is not practical. My solution to the problem is to embed the kernel headers within the kernel image itself and make them available through the sysfs virtual filesystem (usually mounted at /sys) as a compressed archive file (/sys/kernel/kheaders.tar.xz). This archive can be uncompressed as needed to a temporary directory. This simple change guarantees that the headers always ship with the running kernel.
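As a rough illustration of the "uncompress to a temporary directory" step, the following Python sketch unpacks the archive with the standard tarfile module, which handles xz compression through lzma. The function name extract_kheaders is hypothetical and not part of any kernel or BCC tooling; the sketch assumes the archive is present at /sys/kernel/kheaders.tar.xz and accepts any other path so it can be tried on a locally built archive.

```python
import tarfile
import tempfile

# Location described in the text; override it to unpack a different archive.
KHEADERS_ARCHIVE = "/sys/kernel/kheaders.tar.xz"


def extract_kheaders(archive_path: str = KHEADERS_ARCHIVE) -> str:
    """Unpack the xz-compressed header tarball into a new temporary
    directory and return that directory's path."""
    dest = tempfile.mkdtemp(prefix="kheaders-")
    # "r:xz" tells tarfile to decompress the stream with the lzma module.
    with tarfile.open(archive_path, mode="r:xz") as tar:
        if hasattr(tarfile, "data_filter"):
            # Newer Pythons: reject absolute paths and ".." escapes.
            tar.extractall(dest, filter="data")
        else:
            tar.extractall(dest)
    return dest
```

A caller would typically pass the returned directory to the compiler as an include path and remove it with shutil.rmtree once compilation is done.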
Several kernel developers disagreed with the solution; however, kernel maintainer Greg Kroah-Hartman supported it, as did many others. Greg and other kernel developers argued that the solution is simple and just works. Linus Torvalds merged the patches in the v5.2 kernel release.
To enable the embedded kernel headers, build your kernel with the CONFIG_IKHEADERS=y option, or with CONFIG_IKHEADERS=m to build them as a loadable module (kheaders.ko) if you want to save memory when the headers are not in use.
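To check how a given kernel was built, one can inspect its configuration. The sketch below is a hypothetical helper that parses .config-style text; it assumes the configuration is available, for example from the build tree's .config file or from /proc/config.gz on kernels built with CONFIG_IKCONFIG_PROC.

```python
import gzip
import re
from typing import Optional

# Matches an enabled entry such as "CONFIG_IKHEADERS=y" or "CONFIG_IKHEADERS=m".
_IKHEADERS_RE = re.compile(r"CONFIG_IKHEADERS=([ym])")


def ikheaders_setting(config_text: str) -> Optional[str]:
    """Return "y" (built in), "m" (module) or None (disabled or absent)."""
    for line in config_text.splitlines():
        match = _IKHEADERS_RE.fullmatch(line.strip())
        if match:
            return match.group(1)
    # "# CONFIG_IKHEADERS is not set" and a missing entry both end up here.
    return None


def read_running_config(path: str = "/proc/config.gz") -> str:
    """Read the running kernel's config; the file only exists when the
    kernel was built with CONFIG_IKCONFIG_PROC."""
    with gzip.open(path, "rt") as f:
        return f.read()
```

For example, ikheaders_setting(read_running_config()) would return "m" on a kernel that builds the headers as a module.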
The rest of this article looks at the challenges of shipping kernel headers this way, how they were addressed, and the remaining limitations.
Challenges with Kernel Headers
Filesystem or Archive?
One of the challenges was addressing concerns about the memory used by the headers, especially when they were not needed. First, we compressed the kernel headers using LZMA, bringing their size down from around 30MB to 3MB. This was not enough, however: even 3MB was a significant concern when added to everything else. For this reason, I made it possible to build the kernel headers as a kernel module that can be loaded and unloaded on demand. This is precisely what we do in the eBPF tools: when an eBPF program is compiled, the BCC tools load the headers module, compile the BPF program and unload the module.
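The load, compile, unload sequence maps naturally onto a Python context manager. This is a minimal sketch, not BCC's actual code: it assumes the module is named kheaders (the name used when CONFIG_IKHEADERS=m) and that the caller may run modprobe with root privileges. The run parameter is a hypothetical hook so the command runner can be swapped out, for instance in tests.

```python
import subprocess
from contextlib import contextmanager
from typing import Callable, Iterator, List

Runner = Callable[[List[str]], None]


def _run(cmd: List[str]) -> None:
    # Raises CalledProcessError if modprobe fails (e.g. not root, no module).
    subprocess.run(cmd, check=True)


@contextmanager
def kheaders_loaded(run: Runner = _run, module: str = "kheaders") -> Iterator[None]:
    """Load the headers module for the duration of the with-block and
    unload it afterwards, even if compilation raises."""
    run(["modprobe", module])
    try:
        yield
    finally:
        run(["modprobe", "-r", module])
```

Usage would look like `with kheaders_loaded(): compile_bpf_program()`, where compile_bpf_program is a placeholder for the real compilation step. Because the unload happens in a finally clause, the module's memory is released even when compilation fails.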
Some kernel developers also wanted the headers to be exposed as a filesystem that could be mounted, and unmounted when not needed, instead of as a tarball archive. Although this would have improved the user interface, it offered no benefit in terms of saving memory, so I ultimately judged it unnecessary and an unwanted complexity. Such a solution would probably also have consumed more memory, because a squashfs-based filesystem, as proposed, would not be able to match the high compression ratio that we achieved with LZMA.
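To get a feel for why a single LZMA-compressed tarball is attractive, one can measure how well xz compresses a directory of header files. This hypothetical helper packs a directory into an in-memory .tar.xz with Python's tarfile module and reports the raw and compressed sizes; it does not model squashfs, and real ratios depend entirely on the input.

```python
import io
import os
import tarfile
from typing import Tuple


def xz_archive_sizes(src_dir: str) -> Tuple[int, int]:
    """Pack src_dir into an in-memory .tar.xz archive and return
    (total bytes of the files, bytes of the compressed archive)."""
    raw_bytes = 0
    for root, _dirs, files in os.walk(src_dir):
        for name in files:
            raw_bytes += os.path.getsize(os.path.join(root, name))

    buf = io.BytesIO()
    # Mode "w:xz" compresses the tar stream with LZMA (the xz format).
    with tarfile.open(fileobj=buf, mode="w:xz") as tar:
        tar.add(src_dir, arcname=".")
    # Closing the TarFile flushes the compressor but leaves buf open.
    return raw_bytes, len(buf.getvalue())
```

Running it on a kernel's generated include directory and dividing the two numbers gives a quick estimate of the compression ratio for that particular tree.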
Building Kernel Modules
One of the secondary goals of my kheaders patch was to be able to build kernel modules dynamically using the in-kernel headers. For this purpose, I initially archived not only the .h files but also some extra files that the kernel build process needs in order to build a module. It turns out that kernel modules are extremely fussy. They have to be built with the same C compiler that was used to build the kernel loading the module; otherwise, they are not guaranteed to work. Also, some of the extra files I archived were binary executables needed during the module build process. Since these binaries are built along with the header archive, they run only on the architecture of the machine that built the kernel, so the module build has to run on that same architecture. For example, if I built an arm64 kernel on an x86 machine, then modules for that kernel can be built only on another x86 machine. They cannot easily be built on the arm64 machine that is running the arm64 kernel, which makes on-device headers a bit pointless for this purpose. I demonstrated that this limitation could be overcome by using a chroot; however, such a solution is messy and easy to get wrong. In the end, I dropped kernel module build support completely and focused on building the archive for the eBPF program use case, which was the primary goal.
Build Time and Incremental Builds
Masahiro Yamada, one of the main kernel build maintainers, pointed out that generating my header archive added too much time to kernel rebuilds. The kernel build process is expected to be incremental: it should take a long time only during the first build, not on every build. Using several tricks, such as checksums and checks for file modification time changes, I was able to reduce the rebuild time considerably. Linus Torvalds also pointed out some build-time issues, which I fixed as well.
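The idea of skipping the archive step when nothing has changed can be sketched as a fingerprint comparison. This is a simplified Python illustration, not the kernel's build script: it hashes each header's path, size and modification time (rather than reading every file), and the stamp file and helper names are hypothetical.

```python
import hashlib
import os
from typing import Iterable


def headers_fingerprint(paths: Iterable[str]) -> str:
    """Hash each header's path, size and mtime so a rebuild can cheaply
    tell whether anything changed since the last archive was built."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        st = os.stat(path)
        digest.update(os.fsencode(path))
        digest.update(f"\0{st.st_size}\0{st.st_mtime_ns}\n".encode())
    return digest.hexdigest()


def needs_rebuild(paths: Iterable[str], stamp_file: str) -> bool:
    """True if there is no stamp yet or the headers' fingerprint differs."""
    try:
        with open(stamp_file) as f:
            return f.read() != headers_fingerprint(paths)
    except FileNotFoundError:
        return True


def record_build(paths: Iterable[str], stamp_file: str) -> None:
    """Store the fingerprint after the archive was written successfully."""
    with open(stamp_file, "w") as f:
        f.write(headers_fingerprint(paths))
```

A build step would call needs_rebuild first, regenerate the archive only when it returns True, and call record_build afterwards, so a failed build never leaves behind a stamp that looks up to date.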
Conclusion
eBPF tracing programs are powerful, but they need kernel headers when they have to run on different kernel versions. This is not an issue if a program does not depend on the internal kernel API; however, tracing programs, such as those in the BCC toolset, often do. By building an archive that contains these headers, and building it efficiently, we have finally solved this issue. It is a testament to how a simple change can be upstreamed into the Linux kernel, with simplicity as its main strength.