Building Trust - Use Cases and Implementation of TPM 2.0 in Embedded Linux Systems

Anna-Lena Marx, Inovex

Slides, Video

Many people have heard about TPM, but do not know what it really is. This blog post gives an introduction to help you get started with it.

The original Trusted Platform Module (TPM) was developed by the Trusted Computing Group (TCG). It was a discrete hardware chip implementing a standard for secure key storage and cryptographic operations. Nowadays a TPM can also be implemented in software in a Trusted Execution Environment. The original TCG standard has since been standardized as ISO/IEC 11889-1:2015.

TPM 2.0 provides 3 types of features:

  • Cryptographic operations, e.g. RNG, key generator, hash and HMAC functions, crypto algorithms.
  • (Secure) permanent memory for various purposes.
  • Volatile memory.

TCG also specifies the APIs of a TPM library and the software stack. There are several implementations, including an official reference implementation. The most well known is tpm2-tss but there is also WolfTPM for microcontrollers.

A TPM is usually used for asymmetric crypto, also known as public-key cryptography. Primary keys can be generated on the TPM or imported; they never leave the TPM. Further keys are also generated by the TPM, and they are wrapped (encrypted) with the primary key, so these wrapped keys can safely be exported and stored outside the TPM.

Each TPM chip contains a set of seeds. These never leave the TPM and persist across reboots. Using them, the TPM can deterministically generate new keys. TPM 2.0 defines four key hierarchies; each hierarchy has its own seed, which can be used to prove ownership of that hierarchy.

  • The Endorsement hierarchy is defined by the manufacturer and is used to validate the authenticity of the TPM.
  • The Platform hierarchy is controlled by the platform manufacturer / OEM who ships the UEFI (or other early boot) firmware.
  • The Owner hierarchy is user-controlled; this is where you add your own keys.
  • The NULL hierarchy is not persistent: it uses a new seed at every boot and is used for key generation etc.

Platform Configuration Registers (PCRs) are registers containing a hash digest. There are at least 24, but most TPMs have more. Most of them can only be reset at power-on. They are tied to CPU run levels: a PCR can only be extended while the CPU is running in the corresponding ring. Extending a PCR replaces its value with the hash of the previous value concatenated with the extension value.
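The extend operation can be sketched in C. A real TPM hashes with an algorithm such as SHA-256 inside the chip; the FNV-1a stand-in below is purely illustrative and only demonstrates the fold-forward semantics (the register can never be set directly, only extended):

```c
#include <stdint.h>
#include <string.h>

/* Stand-in for the TPM's hash engine (a real PCR bank uses e.g.
   SHA-256): FNV-1a over a byte buffer, for illustration only. */
static uint64_t toy_hash(const uint8_t *buf, size_t len)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV-1a offset basis */
    for (size_t i = 0; i < len; i++) {
        h ^= buf[i];
        h *= 0x100000001b3ULL;            /* FNV-1a prime */
    }
    return h;
}

/* PCR extend: new = H(old || measurement). A final PCR value thus
   proves the entire ordered sequence of measurements behind it. */
uint64_t pcr_extend(uint64_t pcr, const uint8_t *measurement, size_t len)
{
    uint8_t buf[sizeof(uint64_t) + 64];
    size_t n = len > 64 ? 64 : len;       /* clamp for the sketch */

    memcpy(buf, &pcr, sizeof(uint64_t));
    memcpy(buf + sizeof(uint64_t), measurement, n);
    return toy_hash(buf, sizeof(uint64_t) + n);
}
```

Because each new value depends on the old one, extending "stage1" then "stage2" yields a different result from the reverse order, which is exactly what makes PCRs usable for measured boot.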

TPM defines the purpose of many of the PCRs; others are claimed by software such as GRUB and systemd.

You can look at PCRs with systemd-analyze pcrs or with tpm2_pcrread.

How can you trust a TPM?

The Endorsement hierarchy allows you to check that the TPM in fact belongs to a specific manufacturer and has a specific serial number. To establish trust, you first generate an endorsement key pair, export its public key, and obtain a certificate for it. Later you can verify that public key and certificate using a challenge with a random nonce.

Yocto has a meta-secure-core layer that implements TPM-based secure boot for Intel-based UEFI platforms. It uses the GRUB bootloader and Mender as the update system. It is not trivial, because the TPM secure boot documentation pretty much assumes an initramfs, which we do not want (and Mender does not really support). Also, Mender does not support having the kernel in a separate EFI boot partition, so you need to add a script to handle that.

To have an encrypted data partition, add a systemd service, ordered before data.mount, that creates the partition with cryptsetup if it does not exist yet and then uses systemd-cryptenroll to bind it to the TPM.
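Such a service could be sketched roughly as follows; the device path, marker file, temporary key file and PCR selection are placeholders, not details from the talk:

```ini
[Unit]
Description=Create and TPM-enroll the encrypted data partition
DefaultDependencies=no
Before=data.mount
# Placeholder marker: only run while the partition is still unencrypted
ConditionPathExists=!/var/lib/.data-enrolled

[Service]
Type=oneshot
# /dev/mmcblk0p4 and /run/tmpkey are illustrative names
ExecStart=/usr/sbin/cryptsetup luksFormat --batch-mode /dev/mmcblk0p4 /run/tmpkey
ExecStart=/usr/bin/systemd-cryptenroll --unlock-key-file=/run/tmpkey --tpm2-device=auto --tpm2-pcrs=7 /dev/mmcblk0p4
ExecStart=/usr/bin/touch /var/lib/.data-enrolled

[Install]
WantedBy=sysinit.target
```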

SteamOS and its impact on the Linux Ecosystem

Olivier Tilloy, Igalia

Slides, Video

When Olivier Tilloy was a teenager he spent his days playing Half-Life. Back then it only ran on Windows, which Olivier did not consider a problem at the time.

In 2013 Valve released the Debian-based SteamOS 1.0. At the time of writing they are about to release version 3.7. The project tries to give at least the same level of support on the Linux-based SteamOS as you would get on Windows. SteamOS is nevertheless a general-purpose OS: it has two parallel sessions, a gaming-specific one and a KDE Plasma desktop session. Right now there are the Steam Deck, the Steam Deck OLED, and even the Lenovo Legion Go S running SteamOS.

SteamOS is a huge effort involving many teams and consultants. In the kernel there have been changes to CPU scheduling: the Latency-criticality Aware Virtual Deadline (LAVD) scheduler was developed using sched_ext. It measures the latency sensitivity of tasks and takes this into account for scheduling. Many games run on Proton/Wine, so you cannot expect the process itself to provide latency information. LAVD favours latency criticality over fairness. A demo showed the impact.

The DRM scheduler is being changed to improve fairness. This is difficult because scheduling on the GPU is not very fine-grained. They are also adding a cgroup controller for DRM scheduling, so you can change the DRM priorities of tasks (like you can do with nice for the CPU scheduler). This can be used to prioritize focused windows over background applications.

In DRM/KMS, there is ongoing work to offload HDR color transformations to hardware blocks. In the amdgpu driver this is exposed with driver-specific properties.

Valve is also trying to retrofit disk encryption into SteamOS. Since it must be possible to add encryption to an existing installation, they use fscrypt and wrote a new CLI tool called dirlock which uses the D-Bus API to manage it.

XDG desktop portals are a way to force an application to ask the user for permission. They are implemented by the compositor. There are implementations for KDE and GNOME, but they are not suitable for gaming because you do not want to interrupt the gaming flow. So a dedicated portal implementation was added to the compositor.

Valve has an upstream-first approach so all of the work has a direct impact on the open source software community.

Perfetto profiling & tracing for Upstream Kernel Development

Zimuzo Ezeozue, Google

Slides, Video

Zimuzo Ezeozue works at Google helping other engineers to debug performance issues on Android.

Perfetto is a set of Linux binaries that record trace data, which is collected into an SQLite backend and displayed in a web UI. The web UI is served from a Google server, but the data stays local on your machine, and you can also build and serve the UI locally. It queries the tables in your local SQL backend.

Perfetto supports a lot of data formats: Perfetto native, Android systrace, ftrace text, JSON, and perf. So you can either use the Perfetto toolbox to generate traces in the native format, or you can use a different tool and import from one of the other formats. The Perfetto toolbox has a top-level binary called tracebox which calls a few other tools to collect events from tracefs, procfs, sysfs, etc. The data sources are configured in a text file. You can build the toolbox from source. Since it supports the Android systrace format, the tool can be used to visualize a systrace in the UI (using begin and end markers with corresponding ID).
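Such a config file might look like this; a minimal sketch in Perfetto's textproto format, where the buffer size, event names and duration are illustrative, not values from the talk:

```textproto
buffers {
  size_kb: 65536
  fill_policy: RING_BUFFER
}
data_sources {
  config {
    name: "linux.ftrace"
    ftrace_config {
      ftrace_events: "sched/sched_switch"
      ftrace_events: "sched/sched_wakeup"
    }
  }
}
data_sources {
  config {
    name: "linux.process_stats"
  }
}
duration_ms: 10000
```

A config like this would typically be passed to tracebox with something like `tracebox -c config.pbtx --txt -o trace.pftrace`, and the resulting trace opened in the web UI.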

The config file defines the buffer size to use and the configuration of the sources you want to trace. The UI uses the events and correlates them with other information, e.g. from procfs (assuming you configured that in the trace config), to give all the details of what happens. It is mostly viewed on a timeline where you can zoom in and select tasks, but there are other views as well. There is a command line box to do queries and other operations. It is especially powerful for visualizing function graph traces, because the ftrace text output is a bit hard to parse. For instance, it can generate a flamegraph of the time spent in the traced functions. You can also make SQL queries directly, and then with “Show debug track” use that query to visualize the result on the timeline. You can inject events directly from your application using the kernel trace buffer.

libcamera: Past Present Future

Kieran Bingham, Ideas on Board

Slides, Video

SoCs nowadays have a complex Image Signal Processor (ISP) that can be used to set up a processing pipeline for images captured with a sensor. The purpose of libcamera is to make abstractions of how to set up such a pipeline on many platforms.

For an application using a camera, you have the following options in Linux:

  • talk directly to the V4L2 device.
  • use GStreamer to talk to it.
  • use libcamera, optionally with GStreamer or the Android HAL in between.
  • use PipeWire, either accessing the camera directly or with libcamera under it.

libcamera is mostly used through one of the integration layers: GStreamer, the Android HAL, PipeWire, or the Python bindings. Thanks to the PipeWire integration, it can also easily be used through XDG portals in browsers and other applications. libcamera is now at 0.5.0, which again introduces ABI changes. The goal is to reduce the number of ABI changes, but they are not quite there yet. Since some vendors do not want to open the specs of their ISPs, there is also a software implementation of all the blocks, a “Software ISP”. There is a lot of ongoing work on debug and development tools and on implementing new algorithms. For a 1.0 release, the most important step is to stabilize the ABI; part of that is moving to a C ABI instead of a C++ ABI. It also needs reprocessing (loops in the pipeline).

Writing Linux Real-Time Applications

John Ogness, Linutronix

Slides, Video

In September 2024, PREEMPT_RT became fully mainline in 6.12. This does not mean that all applications are suddenly real time. There are a lot of things to be aware of when writing a real-time Linux application:

Memory has to be in physical RAM

If some of the application code or data is not in physical RAM, you get a page fault, which can take unbounded time. Linux maps memory on demand, so simply allocating it is not sufficient: you have to touch every page to make sure it is in physical RAM and mapped in the page tables. Memory also gets recycled: pages can be swapped out and text pages evicted. Therefore you need to use mlockall and pre-fault all memory. You also have to use mallopt to make sure glibc does not release unused heap back to the kernel. Finally, dlopen is normally lazy, so you have to load shared libraries with RTLD_NOW to make sure everything is resolved and locked in memory.
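A minimal sketch of this setup follows. The mallopt knobs are glibc-specific, and mlockall needs CAP_IPC_LOCK or a sufficient RLIMIT_MEMLOCK, so the sketch only warns when locking is unavailable:

```c
#include <malloc.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>

int rt_lock_memory(size_t heap_reserve)
{
    /* Keep freed heap inside the process instead of returning it to
       the kernel, so it stays faulted in (glibc-specific). */
    mallopt(M_TRIM_THRESHOLD, -1);
    mallopt(M_MMAP_MAX, 0);

    /* Lock all current and future pages into RAM. Needs CAP_IPC_LOCK
       or enough RLIMIT_MEMLOCK; in an unprivileged sandbox we only
       warn and continue. */
    if (mlockall(MCL_CURRENT | MCL_FUTURE) != 0)
        perror("mlockall (continuing without locked memory)");

    /* Pre-fault a heap reserve by touching every page, then free it:
       with the mallopt settings above it stays in the arena for later
       real-time allocations. */
    char *reserve = malloc(heap_reserve);
    if (reserve == NULL)
        return -1;
    memset(reserve, 0, heap_reserve);
    free(reserve);
    return 0;
}
```

You would call this once at startup, before entering any time-critical loop.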

Set priorities

PREEMPT_RT just offers the possibility to run real-time threads with full preemption. You still need to set a thread’s policy to either SCHED_FIFO or SCHED_RR to make it a real-time thread, and give each thread an appropriate priority. chrt is used to change the policy and priority of existing threads, sched_setscheduler to do it from the application itself.
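From the application, setting the policy could look like this sketch; unprivileged processes (without CAP_SYS_NICE or an RLIMIT_RTPRIO) will get EPERM:

```c
#include <errno.h>
#include <sched.h>
#include <stdio.h>

/* Promote the calling thread to SCHED_FIFO at the given priority
   (1..99). Returns 0 on success, -errno on failure. */
int make_rt(int prio)
{
    struct sched_param sp = { .sched_priority = prio };

    if (sched_setscheduler(0, SCHED_FIFO, &sp) != 0) {
        perror("sched_setscheduler");
        return -errno;
    }
    return 0;
}
```

The equivalent from the shell would be chrt, e.g. `chrt -f -p 10 <pid>`.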

Synchronization and notification

You need locks to synchronize between threads. Only pthread_mutex_t supports priority inheritance, which you need in order to avoid priority inversion problems. You have to turn on PTHREAD_PRIO_INHERIT explicitly because it is not the default. You can also use pthread_cond_t to wake up on an event and immediately take a lock to be able to look at the incoming event. librtpi wraps these two primitives to do data ownership transfer (the typical usage pattern) more efficiently.

Cyclic tasks

Some APIs, such as timerfd, are not RT-safe. POSIX timers (based on signals) do not work well in real-time either. Ideally it would be possible to fix timerfd so that it works in real-time. The correct way to do cyclic tasks is a dedicated thread that uses clock_nanosleep to sleep after it is done. Use CLOCK_MONOTONIC to make sure you are not disturbed by clock updates, leap seconds, etc. Also use TIMER_ABSTIME to make it cyclic; do not try to dynamically calculate the amount of time to sleep.

CPU affinities

Even though tasks have real-time priorities, there is still interference between them due to shared resources (caches etc.). You can isolate tasks on a specific CPU using cpuset (which uses cgroups to organize things); the isolcpus kernel argument is deprecated. You can do the same with interrupts, i.e. make sure that a specific interrupt is always handled by a specific CPU, using /proc/irq/IRQ_NUMBER/smp_affinity.

Networking

Ideally you have hardware that supports real-time networking: e.g. PTP, the traffic classes from 802.1Qav and 802.1Qbv, Rx and Tx timestamps, and multiple queues with priorities. You also need an isolated CPU for RT networking, which runs the interrupt kthread, the NAPI instance kthread and the RT application. There are patches currently under review that should lift the requirement of isolating a CPU. RTC-Testbench is a tool to validate the real-time behavior of a networking setup. You should test your hardware before you even start writing your application, because if it does not meet the latency requirements with that test bench, your application is never going to do better.

Design considerations

You have to take (priority-based) real-time considerations into account already during the system software design. The system should be designed around events that flow from high priority to low priority: an event starts at high priority, you check it and then decide to requeue it at lower priority or to handle it right away. Use real-time priorities only for tasks that really have real-time requirements. Do not try to use real-time priorities to tune the scheduler; use SCHED_OTHER priorities instead. When a real-time task goes to sleep, always consider who is going to wake the task up, and check the latency of that waker.

You will make mistakes

There are a lot of interfaces in Linux, and not all of them are RT-safe. Also libraries may use such APIs without you being aware. Therefore, you need something to test if your real-time properties are met.

There is now a tool under Kernel Hacking that monitors real-time applications. If you turn it on in the tracing subsystem, you can use perf to record instances of an RT thread doing non-RT sleeps or taking page faults. This was used, for instance, to make sure that the threads in PipeWire that should run real-time do not have these problems. Many issues were found, and while not all of them are fixed yet, thanks to the validation it is getting better. Unfortunately it can also have false positives, i.e. situations where it is OK to use the non-RT-safe APIs, so anything found needs to be carefully analyzed. Internally it uses a logic language to validate the allowed states of real-time threads. If corner cases are encountered that are false positives in the generic case, the logic can be adapted to cover them.

Power management

Another issue is CPU frequency scaling which may interfere in unexpected ways. You basically have to turn it off on the CPUs used for real-time. Also, power management on peripherals may interfere. Currently, power management is simply not RT-safe.

CNC and 3D printing: open source all the way!

Jean Pihet, New Old Bits

Slides, Video

This CNC and 3D printing work is a side project for Jean Pihet, whose regular job involves developing and integrating low-level software for customers. His goal is to use open-source software, both on the host to create the model, and on the CNC or 3D printer to control the process.

Today, the open-source world includes everything for controlling and creating 3D prints: electronic parts, physical part designs, firmware, tools to control the printer, and tools to create (slices from) a 3D model. You create a model in a CAD application; you can also start from a 2D picture and import that into the CAD tool. Then you slice the model into layers, which are expressed in G-code. The G-code says where to move the print head and when to start extruding, at which temperature, and so on. The G-code is specific to the target printer. You can either put it on an SD card which you insert into the machine, controlling the printing process with a local UI on a small LCD and buttons, or you can connect the printer over USB and use OctoPrint to control the printing process from your PC.
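For a flavour of what the slicer emits, here are a few lines of Marlin-style G-code; the values are illustrative, and the exact dialect depends on the firmware and slicer profile:

```gcode
M104 S200           ; set the hotend temperature to 200 °C
G28                 ; home all axes
G1 Z0.2 F3000       ; move the nozzle to the first-layer height
G1 X50 Y50 E5 F1500 ; move to X50/Y50 while extruding 5 mm of filament
```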

The most commonly used CAD tool is Blender, which is not actually a CAD tool. OpenSCAD is an actual CAD tool. It is script-based and parametric so you can easily repeat the same thing multiple times. FreeCAD is a one-stop shop for all kinds of things, supporting CAM and CNC as well as 3D printing. You can do anything from sketching to assembly of parts in it.

In terms of 3D printing software, there are two major software packs: PrusaSlicer and Ultimaker Cura. They have a lot of options to define the printer and the filament. There is also an easy export to share the result, e.g. on GitHub. There are also experimental features, like printing a very long part by moving it on a belt.

OctoPrint is the go-to tool to control a printer from your PC; it works with Marlin as the firmware and has plug-ins to do extra stuff. For printers running Klipper, there are Mainsail and Fluidd instead. These are the two major firmware families. Marlin is the oldest one: it performs all the tasks, including the UI, in a single program on an Arduino-class board. Klipper is a newer project that splits the UI from the real-time tasks: the UI runs on a general-purpose system, e.g. a Raspberry Pi, and the motor control on a microcontroller. The motor-control part is called Klipper, while the UI part can be either Mainsail or Fluidd; they use an RPC mechanism to communicate with each other.

Jean bought a Velleman K8400 and then modified it quite a lot. For example, he added a heated bed and additional fans, in order to improve robustness, print speed, print size, support for different filament materials, and control. This could be patched into Marlin with just 40 lines of configuration changes.

CNC is a different beast entirely. There is not as much open source there; it is mostly industrial, and the software costs a lot. However, there are also similarities. The motor control does pretty much the same: move a head in x, y, z. The CAD tool is also pretty much the same. The slicer, however, is different, because you are subtracting material instead of adding it. Ideally the CAD program supports this directly, so it can help make sure you do not design impossible things. There are a few such tools: PyCAM is very basic and no longer maintained, but it works. LinuxCNC is functional but hard to use. Jean has not tried CNC Toolkit. FreeCAD is the one-stop shop and supports pretty much everything you need for CNC: it can do profiling (cutting out the outside shape), drilling, and engraving.

Jean got a machine for free from a friend: a Kosy A3 from 1992. It is really robust hardware, but its mainboard and firmware were really outdated, using a non-standard serial protocol, and Jean could not find (free) host tools for it. So he replaced the mainboard with something from the 3D printing world: the RAMBo board with Marlin firmware and a USB connection. He added an SD card, buttons and an LCD. He only needed to modify the Marlin config to get the right firmware for this board and CNC.

Now you need to get the G-code out of FreeCAD. FreeCAD can inject preamble and epilogue G-code, which is used to put the head in the home location and start the spindle. It also turned out that the generated G-code was not compatible with Marlin, so he wrote a post-processing script doing things like fixing up line endings. Jean used the machine to make a PCB: you can import a KiCad design into FreeCAD and engrave the traces into a bare board, then fill those traces with metal and solder your chips on it. Jean gave a nice demo of how you use FreeCAD in practice, starting from a 2D sketch, through the 3D model, to the code for the CNC. He also showed how you can design something consisting of parts that are 3D printed and other parts that are cut on the CNC.

Status of Embedded Linux

Tim Bird, Sony

Slides, Video

Considering the pace of the presentation, I suggest watching the video if you are interested in the topic. Here are some key takeaways. Almost everything he mentioned also has an article on LWN, the links to which are visible in his slides.

Tim gave an overview of what is new in kernels 6.10-6.15. 6.12 was selected as an SLTS version (10 years of support) by CIP. He is also organizing a Boot Time Special Interest Group to discuss what can be done to improve boot time. The kernel gained capability analysis: annotations in the code that can be verified by clang to make sure that the stated properties hold; see the LWN article. A lot of GPU support is landing upstream.

Tim has been involved in Linux since the 90s, and things have really changed since then. His main worries at the moment are AI, maintainers ageing, and things like boot time optimization taking too much effort.

Yocto’s hidden gem: OTA and seamless updates with systemd-sysupdate

Martín Abente Lahaye, Igalia

Slides, Video

systemd-sysupdate works with two files: the kernel UKI and the rootfs. You can create these with Yocto, and you have to specify that the UKI must be installed in the EFI boot partition. You also need to make sure that /etc/os-release contains the version that is currently running. Some additional meson options have to be enabled in the systemd recipe. You have to create the transfer units that define how to install the UKI and the rootfs image. systemd-sysupdate can fetch updates from a simple server: it takes the highest version number that it finds in the directory and installs the files according to the transfer units.
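A transfer unit for the UKI could be sketched like this; the URL, file naming pattern and paths are made-up examples, not values from the talk:

```ini
# /usr/lib/sysupdate.d/50-uki.transfer
[Transfer]
ProtectVersion=%A

[Source]
Type=url-file
Path=https://updates.example.com/
MatchPattern=myos_@v.efi

[Target]
Type=regular-file
Path=/EFI/Linux
PathRelativeTo=boot
MatchPattern=myos_@v.efi
Mode=0444
InstancesMax=2
```

Here `@v` captures the version from the file name, which is how sysupdate determines the highest available version on the server.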

There are a few limitations:

  • it does not support delta updates, but the concept is under discussion;
  • it is not properly integrated in Yocto, all of the above has to be done by hand;
  • it only works on UEFI.

Zephyr and cryptography

Valerio Setti, BayLibre

Slides, Video

There are several microcontroller OSes; Zephyr is a very successful one under the Linux Foundation umbrella. It covers almost all 32-bit and 64-bit microcontroller CPU architectures. It has a fully-featured kernel and many (optional) subsystems for various types of peripherals, protocols and devices. A system is built using configuration in Kconfig, hardware description in Device Tree Source, and applications written in C, C++ or Rust. The build system uses CMake and Python tooling.

Security is essential in IoT devices, and crypto is mandatory for many protocols. There is also TLS, and JWT to authenticate to a server. You also need cryptographically secure random number generators. Zephyr includes the MCUBoot bootloader which can verify signatures of the firmware it boots, which is essential with OTA updates. It has support for secure storage.

All of this needs a crypto provider. This used to be Zephyr-specific, but it is transitioning to the PSA (Platform Security Architecture) abstraction. mbedTLS is a software-based solution that supports the PSA API and has all the required algorithms. However, it does not support hardware acceleration, except for some optimizations using specialized instructions, and it does not support secure key storage. TrustedFirmware-M (TF-M) is the secure-world firmware for TrustZone on Cortex-M devices: PSA calls go into the secure environment, which performs the crypto operation and sends the results back to the non-secure (Zephyr) side. TF-M internally uses mbedTLS as well, but patched to support some hardware accelerators. Support for hardware acceleration is under development in mbedTLS itself; it will also remove the old crypto API and focus fully on PSA.
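As a sketch, enabling the PSA crypto core through mbedTLS in an application's prj.conf could look like this; the option names follow recent Zephyr/mbedTLS trees and should be checked against your version:

```conf
# Enable mbedTLS and its PSA Crypto implementation
CONFIG_MBEDTLS=y
CONFIG_MBEDTLS_PSA_CRYPTO_C=y

# Request the PSA algorithms the application needs
CONFIG_PSA_WANT_ALG_SHA_256=y
CONFIG_PSA_WANT_ALG_ECDSA=y

# A hardware entropy source for the CSPRNG
CONFIG_ENTROPY_GENERATOR=y
```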

Yocto Project and OpenEmbedded: Recent Changes and Future Directions

Antonin Godard, Bootlin

Slides, Video

Antonin Godard is the documentation maintainer of Yocto. This talk is about what is new in Scarthgap (LTS), Styhead and Walnascar.

  • USERADD_DEPENDS is a way to express that one recipe depends on a user or group created in another recipe.
  • devtool ide-sdk creates an SDK and remote debugging environment for a specific recipe. It also creates an IDE config file for the remote debugging session.
  • It is no longer possible to set S = "${WORKDIR}".
  • Configuration fragments are used to set up the local.conf in bitbake-setup.
  • include_all is a way to include a file with a specific name from all configured BBLAYERS.
  • FIRMWARE_COMPRESSION compresses all firmware binaries. This requires the kernel to be configured for loading compressed firmware. It can save a lot of space for images that contain all firmware blobs.
  • The debug-tweaks image feature is removed; instead, explicitly set the following features: allow-empty-password, allow-root-login, empty-root-password, (one more I did not catch).
  • The Linux LTS kernel now has a lifespan of only 2 years, while the Yocto LTS has 4 years. Therefore, the default kernel will be updated in the middle of the cycle.
  • The following SPDX3 features are now supported:

    1. associating packages with CVEs;
    2. describing the build process.
  • cve-check relies on NVD but NVD is not really reliable any more. The bottom line is that CVE reports can be missing. The kernel adds CPE IDs itself, but that is not true for all projects.
  • clang as a system compiler is now directly available from core without extra layer.

There are a few more smaller changes.
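As a sketch, a few of the features above could appear in configuration like this; the recipe name and values are illustrative, not taken from the talk:

```conf
# local.conf: compress all shipped firmware; the kernel must be built
# with compressed firmware loading support for this to work.
FIRMWARE_COMPRESSION = "xz"

# In a recipe that installs files owned by a user/group created in
# another recipe (here the hypothetical "my-users" recipe):
USERADD_DEPENDS = "my-users"

# Replacement for the removed debug-tweaks: request the individual
# features explicitly in the image recipe or local.conf.
IMAGE_FEATURES += "allow-root-login empty-root-password"
```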

I3C - the better I2C?

Wolfram Sang, Renesas

Slides, Video

Improved Inter-Integrated Circuit (I3C) is a communication standard developed by the MIPI Sensor Working Group. It combines having two wires, like I2C, with a higher data rate, like SPI. It supports in-band interrupts, which do not require an additional pin as interrupts on I2C do. It is also fast, up to 33.3 Mbps with a 12.5 MHz clock, and it is hot-pluggable. There are already 6 controller drivers upstream, but only 4 device drivers.

It is not hard to convert an I2C driver to I3C; you can use regmap, so the device side is really easy. So what are the problems? There just are not many chips that support I3C, and even fewer affordable boards that have a controller. But Wolfram found an NXP board with a temperature sensor for $16.50 and soldered a second temperature sensor onto the same board. He can also access it over I2C for debugging. Both sensors have upstream I3C drivers.

To debug I3C, he needs a high-speed analyzer; those quickly cost €1500-€2000. However, there is an I3C plug-in for the Saleae Logic 2 software (source available, though without a license). It is also difficult to find a controller: only newer SoCs have one, and the boards are expensive. There are MCUs with I3C, but only on the target side. Xyphro, the author of the Logic 2 plug-in, also made a very affordable RPi Pico 2 based board that does USB-to-I3C.

I3C will be used; it is making its way into standards, e.g. the DDR5 sideband bus is based on I3C. But nobody seems interested in Linux drivers, especially because it is not accessible to hobbyists.