06 Oct Embedded Recipes 2023 Day 1 – part 2, Paris, France
by Charles-Antoine Couret, Olivier L’Heureux, and Arnout Vandecappelle
Best practices for OTA update with SWupdate – Stefano Babic, DENX
SWupdate is an update agent that runs on the device and takes care of updating it. It supports several different media (disks, web download, web upload, …) and several handlers to write the update. It is integrated with Yocto [Arnout: and with Buildroot]. It is a framework: it doesn’t prescribe exactly how you have to perform the update, it just gives you the tools to do it. There is build-time configuration (which features are built in, using Kconfig) and runtime configuration in SWupdate.cfg (media sources, which handler(s) to use, hooks to apply). The update itself is a single file, the SWU file, which contains the images and a description. Since SWupdate is so flexible, it’s easy to do things the wrong way. This talk gives some rules about how to build an update system with SWupdate.
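For reference, the description inside the SWU is the sw-description file, written in libconfig syntax. A minimal sketch (the device path and file names here are invented for illustration):

```
software =
{
    version = "1.0.0";

    images: (
        {
            filename = "rootfs.ext4.gz";   /* image shipped inside the SWU archive */
            device = "/dev/mmcblk0p2";     /* example target partition */
            type = "raw";                  /* handler that writes the image */
            compressed = "zlib";
        }
    );
}
```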
Write a concept
Don’t just start implementing something, but write down what you want to achieve. Choose which features (e.g. delta updates, webserver) are actually relevant and only enable them if so. Write down how you will achieve those goals. Write down which media you need to support, which hardware needs to be updated (e.g. FPGA, daughterboards), and which components need to be updated (just userspace, kernel, bootloader?). Also set a security level: signing (based on either a plain public key or PKI certificates) and possibly encryption. Also decide which runtime measures you want to apply, e.g. privilege separation by running the media access as an unprivileged user. Many of these things have to be decided up front, otherwise there will be incompatibilities between the SWU file and the configuration on the (older) target.
Examine the risks
Find single points of failure (non-duplicated resources), which can lead to a bricked system. For instance if the bootloader is not duplicated. [Arnout: or even just the variable where the A/B flag is stored.]
Evaluate the risk/cost. For example, if you update a non-duplicated bootloader, there’s a very small chance that the device gets bricked. Is that chance so small that we can afford to send a few devices back to the factory when they’re bricked?
For the risky parts, make sure the risk is minimized. For example, don’t do streaming updates for them – only apply the update when the image was fully downloaded/copied and verified. This minimizes the time that the risk is there.
Use versioning to avoid updating when it is not really needed. You want to include the bootloader in every SWU image, even if it is not needed for that particular update. SWupdate’s versioning can take care of not updating the bootloader if it is in fact already up to date.
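In the sw-description, this can be expressed per image; a hedged sketch (the names, device and version are invented):

```
images: (
    {
        filename = "u-boot.itb";
        device = "/dev/mmcblk0boot0";
        name = "bootloader";
        version = "2023.07";
        /* skip this image if the same content is already installed */
        install-if-different = true;
    }
);
```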
Make sure that updates can still happen even if half the application is broken. Often, performing updates depends on complex application logic to determine where to fetch the update, when it has to be applied, etc., and this logic is often spread over multiple application components. That means that if any of those components isn’t working, no update can be performed. Thus, the device is essentially bricked. A good way to reduce the risk is to reduce the dependencies of the updates on application components.
It must always be possible to recover from broken application code. For example, make sure there’s a Plan B, e.g. with a button, to trigger an update even if the application is not working correctly. I.e., make sure that there’s a mechanism where SWupdate can work without requiring anything else at all to still work.
Verify the build config
Disable everything that you don’t need: media, handlers, verification algorithms.
On the other hand, do keep CONFIG_HW_COMPATIBILITY enabled. This makes sure that you don’t accidentally use an SWU for the wrong board. It has to be enabled from day one, however, otherwise it doesn’t work.
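Hardware compatibility is declared in the sw-description and matched against the revision the device reports (by default read from /etc/hwrevision). A sketch with invented board revisions:

```
software =
{
    /* the SWU is accepted only if the board revision is in this list */
    hardware-compatibility = [ "1.0", "1.2", "2.0" ];
}
```

On the target, /etc/hwrevision would then contain something like `myboard 1.2`.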
Don’t abuse shell scripts
Prefer to use the built-in features of SWupdate.
Shell scripts consume resources – they could cause an out-of-memory condition at the point of writing the image. Normally, SWupdate makes sure that all resources are allocated before it starts writing. Shell scripts are also a security risk: they are a common point of attack, and they run with the installer’s privileges (usually root).
Scripts are part of the SWU, not of the installed system. Thus, if you have to rely on shell scripts as part of the update process, this means that the installed system can’t really update itself. That is not very reliable.
If you do need scripts, use Lua instead. It consumes fewer resources and allows better control.
Shell scripts also don’t work well with streaming updates, because they are only available somewhere in the middle of the SWU file: it is not defined whether they execute before or after the other images have been handled. Lua doesn’t have this problem, because it can be put in the manifest itself.
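Lua can be delivered in the manifest via the embedded-script attribute, so it is available as soon as the sw-description is parsed. A rough sketch (the function body is invented; the exact wiring of hooks to images should be checked against the SWupdate documentation):

```
software =
{
    embedded-script = "
        function check_preconditions(image)
            -- hypothetical example: verify preconditions before any image is written
            return true
        end
    ";
}
```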
installed-directly = true;
With A/B update, this is always safe. It avoids requiring temporary memory or disk space for the SWU file. It works even if the filesystem is corrupt, which makes it possible to recover.
At least enable SHA-256. cpio has a built-in CRC, but that is extremely weak. For signed images, a hash is mandatory anyway.
Use the type attribute to check the subimage type.
Try to make sure that the subimages can be installed in any order.
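Putting these rules together, a streamed image entry could look like this (the hash and device are placeholders; in practice the hash is generated by the build system):

```
images: (
    {
        filename = "rootfs.ext4";
        device = "/dev/mmcblk0p3";   /* the currently inactive copy in an A/B scheme */
        type = "raw";                /* subimage type checked by the handler */
        /* placeholder hash */
        sha256 = "0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef";
        /* stream straight to the device, no temporary copy of the SWU */
        installed-directly = true;
    }
);
```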
Check the bootloader environment
The bootloader needs to know whether to boot A or B. So the bootloader variables (“environment”) need to be updated as well. Updating them runs the risk of corrupting the environment or leaving inconsistent values. This can be avoided with a redundant environment, i.e. an A/B update of the environment.
GRUB doesn’t have redundancy at all, so an update may corrupt the environment completely. EFI Boot Guard and Cboot have a redundant environment out of the box. U-Boot supports a redundant environment, but it needs to be enabled explicitly.
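For U-Boot this means enabling redundant environment support at build time and describing both copies to the userspace tools. A sketch with invented offsets and sizes (the Kconfig symbol below keeps U-Boot’s historical spelling):

```
# U-Boot build configuration
CONFIG_SYS_REDUNDAND_ENVIRONMENT=y

# /etc/fw_env.config on the target: two copies of the environment
# Device        Offset     Size
/dev/mmcblk0    0x400000   0x2000
/dev/mmcblk0    0x410000   0x2000
```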
Plan a rescue
Even with A/B updates, the second copy doesn’t really count as a redundant copy. Over time it may get corrupted, e.g. because of a failed update. So if the main copy gets corrupted as well (e.g. through some kernel error or media error), the device is bricked. It’s therefore a good idea to have another fallback.
This fallback doesn’t necessarily need to be built into the device. E.g. it may come from USB storage or BOOTP. The mechanism can also be reused for production, e.g. load the rescue bootloader with the (slow) factory flashing method, then boot from USB storage and flash the entire device from there. The rescue mechanism must in any case be able to re-create everything on the device, including the parts that aren’t touched by the normal update. A typical rescue image is a very minimal kernel with a built-in initramfs.
Do not trust the vendor
A vendor will often already provide an update system, perhaps based on SWupdate, with their “BSP”. This should really be taken only as an example. It doesn’t take your specific goals into account. Security and reliability are not their primary goals – ease of development is. So go back to step one, write down the goals of your update system, and start from there.
Is there a way to distribute a single image to many devices (with slow external connection)? One option is to use the gateway feature of SWupdate. You can also write a post-install handler to distribute the image to other devices.
Is there a way to do delta updates with SWupdate? Yes, but in a different way than the typical one – it doesn’t require keeping different delta images on the server. Instead, the device itself calculates the hashes of the chunks it currently has for each sub-image. These hashes are compared with the zchunk index of the SWU that is downloaded from the server. SWupdate then identifies which chunks are needed and downloads only those from the server.
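In the sw-description, this uses the delta handler with a pointer to the zchunk file on the server. A rough sketch (the URL and devices are invented; attribute names per the SWupdate delta-update documentation):

```
images: (
    {
        filename = "rootfs.img.zck";
        type = "delta";
        device = "/dev/mmcblk0p3";       /* where the reassembled image is written */
        properties: {
            url = "https://updates.example.com/rootfs.img.zck";
            source = "/dev/mmcblk0p2";   /* running copy, reused for unchanged chunks */
            chain = "raw";               /* handler that installs the result */
        };
    }
);
```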
[ Olivier’s personal thoughts:
- Interesting experience with SWupdate; this is the kind of message that is difficult to get from the documentation.
- SWupdate is worth studying further (I have never used it), but it is not that trivial to use. ]
Silicon Root of Trust – Sameo, Rivosinc
When using cloud servers, you actually want to be sure that your application runs on something that is trusted, i.e. that it’s not running on some VM that can be read out or influenced by the cloud provider. The Trusted Computing Group (TCG) gives a few definitions of Root of Trust (RoT). Matthew Garrett gave a good one: “A thing that has to be trustworthy for anything else on your computer to be trustworthy”. [Arnout: the speaker said it a bit wrong, but the following sentence is essential.] To be sure that your application is actually running in a trusted environment, you need a chain of trust starting at the RoT and going up to your application. The chain of trust validates all the software components that run in between and that could jeopardize your trust.
On a PC with secure boot, the chain of trust starts with the firmware (which has the role of RoT), which loads and verifies the bootloader, which loads and verifies the kernel, which loads and verifies the application. The firmware itself comes from flash, however, so it can’t really be a root of trust. Therefore, the RoT must be in the SoC silicon itself; on Intel, this is the Intel Management Engine.
If the idea is to prove to the application that the chain of trust is maintained, it’s not enough to load and verify: in addition, a proof has to be constructed. This proof can be provided to outside parties (e.g. a server with which the software communicates). Therefore, every component needs to provide a state variable, and each of them is signed by the RoT. In practice, the RoT and the signing of state variables are implemented in separate silicon. This silicon has to do a number of things: it attests the loading of the firmware, it also updates the firmware, and it performs power-on reset (otherwise you can’t be sure that nothing happened on the CPU beforehand). It also needs to support a provisioning flow at manufacturing time, and there needs to be support for debugging.
The code of the RoT is generally closed source C and assembly. This is what the whole cloud runs on. The silicon is proprietary hardware design that nobody sees. Nobody knows what testing and validation coverage it receives. Can we even trust this RoT? There are plenty of CVEs recorded against it…
Therefore, there is interest in an open source hardware RoT, including hardware RTL, validation, documentation, ROM code, and the tooling to support it. Two such projects exist: Caliptra (CHIPS Alliance) and OpenTitan (lowRISC). This talk is about OpenTitan – the speaker is a small contributor. OpenTitan is a continuation of Google’s Titan project but shares no code with the original. It provides a transparent, high-quality reference design for silicon RoTs. The only things that are not open are the physical design (gate-level, layout) and the fabrication process. It is real open source, i.e. working with PRs, reviews, CI, a coding style, and open governance. CI does testing on FPGAs. New features always need to come with a test and documentation, and also a software library to use them.
There is a heavy security focus. Any critical asset must be protected with countermeasures, e.g. shadow registers or multiple-bit encoding. All peripherals can define security alerts to alert the CPU when a countermeasure is triggered. All bus transactions and memory are scrambled.
IP blocks don’t only need to talk to the CPU but also to each other. They have a configuration file that describes their interface, from clocks and buses to alerts and countermeasures. IP blocks are “comportable”: one can be replaced with another.
Earlgrey is a discrete implementation, i.e. an RoT as a separate chip. The core is the Ibex core, a 32-bit RISC-V. It’s simple and slow, with just 3 pipeline stages. Branch prediction is optional. There is no speculation, for obvious reasons [laughter from the audience]. There’s a lockstep backup core to make physical attacks much more difficult. It uses ePMP (Enhanced Physical Memory Protection). [Arnout: this is a kind of simplified MMU.]
Its boot process has three stages: ROM boot is in gates. ROM extension is loaded from flash and is verified – it’s pretty much the traditional bootloader. It makes sure that the RoT Firmware itself is loaded from flash. OpenTitan provides reference implementations of ROM and ROM extension, but not the RoT firmware itself. The RoT firmware can be e.g. Hubris or Zephyr.
Security is hard, but it’s easier with open source hardware. This can make the cloud a less scary place. OpenTitan welcomes contributions from anyone.
[ Olivier’s personal thoughts:
- I don’t understand which problem they are trying to solve!
- It is a typical mistake, IMHO. They focus on a problem they can handle almost perfectly, i.e. the root of trust, and they spend a considerable amount of time on a very small part of the security problem. At the same time, there are huge parts of the security problem that are difficult to solve and remain unsolved or even unexplored.
- To the owner of a hammer, all problems look like a nail.
- Somebody said it is for special machines, possibly in the cloud. It seems very specific and not really for the embedded world. ]
PipeWire as the heart of Linux-based audio system – Philip-Dylan Gleonec, Savoir-Faire Linux
Audio has become complex due to the arrival of digital interfaces: SAI, USB audio, S/PDIF, … In addition, there are advanced algorithms in the audio path. And of course all of this needs to be output in real time.
On the other hand, SoC advances have made audio processing cheaper, from peripherals to DSP Accelerators and ISA extensions. Linux has good support for this kind of hardware.
ALSA is the Linux userspace interface for audio. It gives a single process exclusive access to a device. To allow multiple applications to multiplex or mix audio, a sound server is needed in userspace. There used to be two: JACK and PulseAudio. JACK is real-time and low-latency; PulseAudio is more flexible but has a high latency. PipeWire was created to meet the requirements of both.
Ideally, there should only be a single, synchronous interface in the system. Otherwise, you risk getting buffer underflows or overflows at the synchronisation point whenever multiple streams are mixed or routed.
PipeWire is not just a sound server, it’s a multimedia server. It routes video as well. It’s based on graph processing – since it was started by Wim Taymans of GStreamer fame. It integrates with both PulseAudio and JACK applications – it provides those interfaces. It also supports GStreamer applications. It has an excellent community. For example, it got from no Bluetooth support to the best available Bluetooth support in just a few months.
Philip-Dylan performed benchmarks to compare the performance of PipeWire with JACK and PulseAudio. Tests were performed on an i.MX8M with Yocto, kernel 5.4. Using the CamillaDSP framework, a karaoke application was tested with PulseAudio, JACK and PipeWire backends. It has complex routing and mixing of multiple input and output channels. PipeWire has mostly better latency than JACK, except at really small buffer sizes. It also generates less CPU load. For PulseAudio, latency was not compared, because PA is not meant for low latency; instead, for comparison, the latency was fixed at 60 ms and the CPU load was measured. PipeWire with the JACK API has a lower CPU load than either of the others. With the PA API, however, the CPU load is higher than with PulseAudio itself. This is a point of future optimisation. Asynchronous or multiple separate synchronous interfaces have not been benchmarked yet – it’s also difficult to do, e.g. JACK doesn’t really support that use case.
There was also an evaluation of the CPU overhead with asynchronous streams – but not comparing with the older systems. Instead, there was a comparison with hardware-supported sample rate conversion. First of all, with PipeWire it is possible to specify the interface clocks, so you can make sure the source and sink use the same clock. This removes the need for resampling (ASRC, Asynchronous Sample Rate Conversion) entirely, reducing load and latency.
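One related knob is the graph clock rate: pinning it to the hardware rate in the PipeWire daemon configuration avoids resampling for streams already at that rate. A minimal sketch (the file name and values are examples):

```
# e.g. /etc/pipewire/pipewire.conf.d/10-clock.conf
context.properties = {
    default.clock.rate        = 48000  # run the whole graph at the hardware rate
    default.clock.quantum     = 256    # buffer size in samples: latency vs. CPU load
    default.clock.min-quantum = 32
}
```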
For digital interfaces, e.g. USB audio, it is not possible to let source and sink have the same clock. Thus, even if they have the same sample rate, ASRC is needed. With USB audio, which is isochronous, it still is possible to avoid resampling by adding a feedback endpoint. This feedback endpoint basically does flow control and thus avoids buffer under/overflows at that level. It does require kernel support in the driver, which was only recently added. It was also added to PipeWire recently, before that it was only supported by alsaloop. Benchmark results show that ASRC can indeed be removed entirely, at the cost of a bit higher CPU load from PipeWire for the feedback.
If ASRC is needed, it should be possible to offload it to a hardware accelerator. Using a hardware accelerator is more complicated, however: there is no real ALSA interface for it. There is an i.MX6-specific micro-API, but only in the linux-imx fork, not in mainline, so it will be difficult to support it in PipeWire. Instead, Philip-Dylan has worked on upstreaming parts of a generic solution, though it is a bit cobbled together at the moment.
Sound Open Firmware (SOF) is a project to create firmware for DSP accelerators. This is much more promising for a generic accelerator interface. It’s still a very young project though.
In conclusion, PipeWire is now ready for production in embedded systems. It can avoid resampling and thus free CPU resources. More work is needed for integrating DSP accelerators.
In response to a question: even if there is only a single application using the audio devices, it is still useful to use a sound server. This is a standard interface that gives you flexibility in your software development. For example, if it turns out that you later need to insert some filtering, resampling or other processing, it’s very easy to do that with a sound server, while it would require a bunch of custom C code development if you use ALSA directly.
[ Olivier’s personal thoughts: PipeWire quickly established itself as the modern alternative to PulseAudio, which now looks obsolete. JACK may remain better for professional use, but PipeWire now competes with it. ]