Embedded Recipes 2023 Day 1 – part 1, Paris, France

by Charles-Antoine Couret, Olivier L’Heureux, and Arnout Vandecappelle

This is a small and cosy conference: about 100 hackers and expert speakers in a single track, in a venue where there’s room to socialize, eat and drink together.

One of the sponsors, DH Electronics, gave away an Avenger96 board to every attendee. The Avenger96 is based on the DHcore SoM, which has an STM32MP15 SoC and 1 GB of DDR memory.

Videos of the individual talks will be made available later. For now, you can find each talk in the recording of the entire day – check the Embedded Recipes website.

Running FOSS on a Thermal Camera – Sebastian Reichel, Collabora

slides

A thermal camera is similar to a normal camera in technology, but captures the infrared instead of the visible spectrum. The resolution is usually lower than that of a typical camera. One reason for this is that the US has quite tight export restrictions on thermal cameras. These cameras used to be really expensive and very low resolution. Today you can get e.g. 256×192@25fps for €300, which is actually more than what US export restrictions allow. They are used for various things, including military applications (hence the export restrictions), but also for inspecting PCBs for faults.

The software on this “cheap” camera is quite bad. It takes 30 seconds to boot and is also slow at runtime. If you capture an image, the whole overlay is captured with it (e.g. the battery percentage is on there as well). You also can’t fix the color scale, which can be annoying if you want to zoom in on an area and compare it with a different one.

When you open the camera, you find a bunch of chips that can be identified and looked up on the web. It turns out to have an i.MX6ULL, and an unpopulated serial connector, including labels. The thermal camera is on a separate module, which also has an unpopulated serial connector.

The first thing to do is to measure voltages, to make sure you don’t connect 5V serial to a 1.8V pin… To make debugging easier, he installed a Bluetooth module on the serial port. That way the case could be closed again, which makes travelling easier: if it’s open, for instance, it’s hard to get it through security at an airport.

The UART shows debug output from U-Boot 2015.04 and Linux 3.14. The boot can be interrupted and the kernel command line adapted with init=/bin/sh to explore the system, which seems to be Ubuntu-based. There’s only one binary running, an unstripped binary linked with Qt and OpenCV. Nothing has been optimised, e.g. the kernel has CAN drivers built in…

It has a complicated boot command that looks for several boot devices, even though they’re not used. It’s easy to modify it to boot directly from eMMC instead of first scanning the non-existent network and SD card.

He built a very minimal kernel and device tree from scratch. The kernel is configured with make imx_v6_v7_defconfig, after which all modules are disabled.

The device has no reset button. If it crashes, you’d have to wait for the battery to run out to reset it, or open the device and disconnect the battery. To make recovery easier, the watchdog timer is enabled so the device resets after a crash.

To build a good device tree, the existing device tree was dumped. GPIOs can be investigated in sysfs, by looking at which driver is associated with them. E.g. leds-gpio for the flashlight LED.

The boot loader initializes the LCD and prints debug output about it, which reveals which SPI controller it is connected to.

The device has a USB-C port for charging and for a UVC gadget interface.

Some kernel patches were needed to get the battery handling and LCD display working. These were upstreamed. The optical camera (Galaxycore GC0308) didn’t have an upstream driver. There are a bunch of out-of-tree low quality drivers. It often breaks with the i.MX6ULL CSI driver.

The thermal camera is a UVC device. However, it lies about its data format, and it takes vendor USB control commands, e.g. to choose the temperature range (100 °C or 550 °C).

It turns out the camera sensor board runs another Linux. That one is a lot smaller and more locked down. Support for its RTS3903N SoC is fully out of tree, with no publicly available sources. The board is also physically quite fragile and easy to break, e.g. by probing a pin.

So to reverse-engineer the UVC protocol, he instead reverse-engineers the application binary from the original firmware image. Unfortunately, that binary doesn’t work with the upstream kernel, so there’s no runtime debugging; static analysis with reverse-engineering tools like radare2 or Ghidra has to do.

Pocket size virtual machine – Small Linux VM payloads in Android – Pierre-Clément Tosi, Google

slides

An application can abuse vulnerabilities in the kernel to attack other processes. How can we make sure that sensitive information is protected from attackers in such a model? One solution is ARM TrustZone, where sensitive information is stored in the Trusted Execution Environment (TEE). Even the kernel cannot access those trusted applications.

However, trusted applications can quite easily access the Linux kernel and userspace. So if complicated applications are moved to the TEE, any vulnerabilities they contain become a lot more powerful when exploited. In addition, it’s generally a whole lot more complicated to deploy updates of trusted applications. Also, TEEs are much more fragmented, so a trusted application is much more device-specific. Finally, the TEE offers a much less rich environment of libraries and existing software components than Linux. Thus, trusted applications are not a silver bullet to solve the isolation problem.

Android therefore introduces a hypervisor-based approach to isolation. A hypervisor runs at a higher privilege level than the kernel. So just like with TrustZone, the Linux kernel doesn’t have access to anything that runs in the other VMs under the hypervisor. The hypervisor can also run multiple VMs which are isolated from each other (compared to the TEE, where different trusted applications typically have very limited protection from each other).

The AVF (Android Virtualization Framework) model is similar to that of KVM. The Virtual Machine Monitor (VMM) runs in userspace of the host system. The hypervisor is responsible for memory management. The host kernel schedules the guest pVMs (protected Virtual Machines), so kernel vulnerabilities can still open denial-of-service attacks on the VMs. KVM is extended to protect parts of the VM payloads from modification by the VM guest OS itself, and also to protect the VM’s memory space against access by the host kernel. This is called protected KVM (pKVM) and is upstreamed.

The pVMs run Microdroid, which is a stripped down Android. So all the Android interfaces (NDK APIs, binder, SELinux, ADB, …) are available. What is not there: Java, graphics, HALs. Microdroid loads and executes an APK payload.

If the host is not trusted, we need a way to verify guest images. Using the hypervisor for that would increase its attack surface, so a trusted piece of software, pvmfw, is injected into the guest VM to communicate with the host. pvmfw executes within the VM, verifies the image, and refuses to boot if verification fails. This firmware acts as a bootloader and initializes the guest’s memory and basic system state such as caches. It was originally a special build of U-Boot. In Android 14, however, it is part of the AOSP release (though not everything is available upstream) and integrated in the build system, so it is easier to use. It is written in Rust; crates to handle low-level aarch64 page handling and hypervisor calls are published on crates.io.

A future feature is to support device passthrough, so the pVMs can actually access physical hardware. There should also be a (standardized) channel to the TEE. The hypervisor itself needs improved performance.

[ Olivier’s personal thoughts: ARM is pushing a certain way to secure a system, typically a smartphone, with TrustZone on the HW side and TF-A, OP-TEE on the SW side. Google wants to do it differently. We do understand why, but the motivation is quite specific to them: making a way to run a vendor’s VM on a system owned by another vendor. I don’t think this is very relevant in the embedded Linux world. It is more for the cloud, where they use more and more ARM64 CPUs to save power. ]

TTY layer – here lies daemons – Greg Kroah-Hartman, Linux Foundation

slides

There’s a great explanation of the userspace side of tty: https://www.linusakesson.net/programming/tty This talk is about the kernel side.

The TTY layer is old and complex, but Linux became successful because of it. Microkernels often struggle with serial ports. It’s crazy, but it works.

There are actually three layers, but they blend into each other: tty, line discipline, and serial port. Consoles are yet another thing on top of that, but although they are also complex, they are at least pretty well walled off from the rest.

tty is the char device that is visible to userspace. A line discipline can be assigned to it. ioctls available on the char device depend on the attached line discipline.

The line discipline is the protocol to be talked on the serial port. There are about 20 of them. A line discipline works on any serial port / tty device.
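From the userspace side, the tty/line-discipline split is visible through two ioctls: TIOCSETD attaches a line discipline to a tty and TIOCGETD reports which one is attached. The sketch below is our own illustration, not from the talk; the helper names are hypothetical, and the demo assumes a Linux system where /dev/ptmx is available. A freshly opened pseudo-terminal reports N_TTY, the default discipline.

```c
#include <fcntl.h>
#include <sys/ioctl.h>
#include <unistd.h>

#ifndef N_TTY
#define N_TTY 0 /* default line discipline number on Linux */
#endif

/* Hypothetical helper: return the line discipline attached to a tty fd,
 * or -1 on error. */
int get_ldisc(int fd)
{
    int ldisc;
    if (ioctl(fd, TIOCGETD, &ldisc) < 0)
        return -1;
    return ldisc;
}

/* Hypothetical helper: attach another line discipline (e.g. N_HDLC).
 * This only works if the kernel provides that discipline, and some
 * disciplines may require extra privileges. Returns 0 on success. */
int set_ldisc(int fd, int ldisc)
{
    return ioctl(fd, TIOCSETD, &ldisc);
}

/* Demo: open a new pseudo-terminal master and report its discipline. */
int ldisc_of_new_pty(void)
{
    int fd = open("/dev/ptmx", O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;
    int ld = get_ldisc(fd);
    close(fd);
    return ld;
}
```

The same calls work on a real serial port such as /dev/ttyS0, which is how a discipline like SLIP or HDLC ends up being stacked on top of an unmodified serial driver.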

The serial port is the driver for the hardware itself. There are about 40 of them. For new hardware, you can probably reuse one of the existing drivers, e.g. 8250. Some are virtual, e.g. pty. There are also still subsystems under it, e.g. USB serial which supports about 15 devices.

ttyprintk.c is a nice simple example of a (virtual) serial port driver. It’s an interface through which you can write stuff to the kernel log from userspace.

The core structures of the tty layer are:

  • struct tty_struct: the central structure. There are many locks in it, one of which emulates the old BKL (Big Kernel Lock). All those different locks were added for performance, but they’re a problem for real time because they destroy determinism.
  • struct tty_port: corresponds to the character device. A big structure as well (36 callbacks… + a lot of locks…)
  • struct uart_port: 27 callbacks and somewhat fewer locks
  • struct usb_serial_driver: 38 callbacks…
  • struct usb_serial_device: provides struct device and tty_struct

Data flow

tty_write() is the basic way that data is written to a serial port. It can be called from the char device, but also from the console and a lot of other odd paths. It takes an iov_iter rather than a buffer. tty_write_lock() optimizes taking the lock: it first tries to take it without waiting, otherwise it waits for it (possibly spinning for a while), and it gives up and returns to userspace if the write is non-blocking or a signal arrives.
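The try-first locking pattern can be sketched in userspace with POSIX mutexes. This is an analogue written for illustration, not the kernel code: the function name is hypothetical, and unlike the kernel’s interruptible wait, pthread_mutex_lock() cannot be aborted by a signal.

```c
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace analogue of the tty_write_lock() pattern.
 * Returns 0 when the lock is held, or a negative errno value. */
int write_lock_sketch(pthread_mutex_t *lock, bool nonblock)
{
    /* Fast path: take the lock if nobody holds it. */
    int rc = pthread_mutex_trylock(lock);
    if (rc == 0)
        return 0;
    if (rc != EBUSY)
        return -rc;

    /* Non-blocking writers give up immediately and go back to userspace. */
    if (nonblock)
        return -EAGAIN;

    /* Slow path: wait for the current holder. The kernel uses an
     * interruptible wait here, so a signal can make it return early;
     * this userspace version simply blocks. */
    rc = pthread_mutex_lock(lock);
    return rc == 0 ? 0 : -rc;
}
```

The uncontended case costs a single trylock, which is the optimization; everything after that is where the latency becomes unpredictable.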

iterate_tty_write() manages an internal buffer for the data using kvmalloc(), which may sleep. It then calls copy_from_iter() and passes the data to the line discipline. There’s a loop in there with cond_resched() to do some cooperative scheduling – yet another source of non-determinism, as it doesn’t take priorities into account.
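The shape of that loop (allocate a bounce buffer, copy a chunk, hand it to the line discipline, yield, repeat) can be sketched in userspace. Everything here is an illustrative assumption rather than the kernel code: the hypothetical sink callback stands in for the line discipline’s write, memcpy() for copy_from_iter(), and sched_yield() is only a rough stand-in for cond_resched().

```c
#include <sched.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical sink standing in for the line discipline's write callback. */
typedef ssize_t (*sink_fn)(void *ctx, const char *buf, size_t len);

/* Copy 'len' bytes from 'src' to 'sink' in chunks of at most 'chunk' bytes,
 * through a bounce buffer, yielding the CPU between chunks. Returns the
 * number of bytes accepted, or -1 on error before anything was written. */
ssize_t chunked_write(const char *src, size_t len, size_t chunk,
                      sink_fn sink, void *ctx)
{
    if (chunk == 0)
        return -1;
    char *buf = malloc(chunk);            /* kernel: an allocation that may sleep */
    if (buf == NULL)
        return -1;

    size_t written = 0;
    while (written < len) {
        size_t n = len - written < chunk ? len - written : chunk;
        memcpy(buf, src + written, n);    /* kernel: copy_from_iter() */
        ssize_t ret = sink(ctx, buf, n);  /* kernel: the line discipline's write */
        if (ret < 0) {
            free(buf);
            return written > 0 ? (ssize_t)written : -1;
        }
        written += (size_t)ret;
        if ((size_t)ret < n)
            break;                        /* short write: stop here */
        if (written < len)
            sched_yield();                /* rough stand-in for cond_resched() */
    }
    free(buf);
    return (ssize_t)written;
}

/* Example sink that only records how it was called. */
struct counting_sink { size_t calls, bytes, max_chunk; };

ssize_t counting_sink_write(void *ctx, const char *buf, size_t len)
{
    struct counting_sink *cs = ctx;
    (void)buf;
    cs->calls++;
    cs->bytes += len;
    if (len > cs->max_chunk)
        cs->max_chunk = len;
    return (ssize_t)len;
}
```

With 5000 bytes and a 2048-byte chunk, for example, the sink is called three times (2048, 2048 and 904 bytes) with a yield after each of the first two calls; every one of those steps is a point where a real-time task can be delayed.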

There’s a call to tty_update_time() to update the access time of the device node. This can be pretty expensive on some architectures: it iterates over all open files of the tty. It only updates the time if it hasn’t been changed in the last 8 seconds.
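The 8-second rule is coarser than a literal “not changed in the last 8 seconds”: as far as we recall the upstream code, it XORs the new and stored seconds values and masks off the lower three bits, so timestamps effectively move in 8-second buckets (the code comment motivates this as not leaking exact tty timing to userspace). A minimal sketch of that check, with a hypothetical function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper modelling the check in tty_update_time():
 * update only if the two times differ in more than the lower 3 bits,
 * i.e. if they fall into different 8-second buckets. */
bool tty_time_needs_update(uint64_t now_sec, uint64_t stored_sec)
{
    return ((now_sec ^ stored_sec) & ~(uint64_t)7) != 0;
}
```

For example, 96 and 103 fall into the same bucket (no update), while 103 and 104 are only one second apart but cross a bucket boundary (update). Even when no update happens, the list of open files is still walked.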

In some places, there are calls into the audit framework. This may cause data to be sent over the network, written to disk, etc., so it may take an indeterminate amount of time. For real-time use, it’s better to disable auditing.

Conclusions

tty is complicated and way too flexible. There are too many entry and exit points. There are many places where it might sleep, and many locks that add non-determinism. UART hardware is complex but also dumb, so drivers need to be fairly large to handle it.

On the other hand, it’s very fast. It’s also flexible. It’s what makes Linux successful.

How to fix it

printk (logging) is an important piece of the puzzle, because that must work. The approach is to break straight through the whole subsystem where needed.

Don’t assume anything about latency for a tty device. Don’t call it from realtime userspace tasks.

The best way to fix it is not to use it. Replace with a simple ringbuffer-based driver that basically exposes the serial port directly to userspace without all the flexibility.
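A minimal sketch of the kind of structure such a driver could be built around: a single-producer, single-consumer ring buffer where, for instance, the UART receive path produces and a userspace reader consumes. This illustrates the data structure only; it is not a driver and not the speaker’s design, and the size and names are assumptions. Indices run freely and are masked, which requires a power-of-two size, and C11 acquire/release atomics order the data accesses against the index updates.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define RB_SIZE 256u /* assumed size; must be a power of two */

struct ringbuf {
    _Atomic size_t head;     /* only written by the producer */
    _Atomic size_t tail;     /* only written by the consumer */
    uint8_t data[RB_SIZE];
};

void rb_init(struct ringbuf *rb)
{
    atomic_init(&rb->head, 0);
    atomic_init(&rb->tail, 0);
}

/* Producer side: store up to 'len' bytes, return how many fit. */
size_t rb_put(struct ringbuf *rb, const uint8_t *src, size_t len)
{
    size_t head = atomic_load_explicit(&rb->head, memory_order_relaxed);
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_acquire);
    size_t space = RB_SIZE - (head - tail);
    size_t n = len < space ? len : space;
    for (size_t i = 0; i < n; i++)
        rb->data[(head + i) & (RB_SIZE - 1)] = src[i];
    /* Publish the bytes before the consumer can see the new head. */
    atomic_store_explicit(&rb->head, head + n, memory_order_release);
    return n;
}

/* Consumer side: fetch up to 'len' bytes, return how many were read. */
size_t rb_get(struct ringbuf *rb, uint8_t *dst, size_t len)
{
    size_t tail = atomic_load_explicit(&rb->tail, memory_order_relaxed);
    size_t head = atomic_load_explicit(&rb->head, memory_order_acquire);
    size_t avail = head - tail;
    size_t n = len < avail ? len : avail;
    for (size_t i = 0; i < n; i++)
        dst[i] = rb->data[(tail + i) & (RB_SIZE - 1)];
    /* Hand the slots back to the producer only after reading them. */
    atomic_store_explicit(&rb->tail, tail + n, memory_order_release);
    return n;
}
```

Because each index is written by only one side, no lock is needed, which sidesteps the lock-related non-determinism described above; the price is that there is no line discipline, no echo and none of the other tty flexibility.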

[ Olivier’s personal thoughts: No need to understand the programming: the message is: “As opposed to what it seems, the TTY layer is complex and quite legacy. Use it as-is. Stop thinking it is easy or you need to extend it with new devices. And never use it in real-time tasks.” ]

Inter-container IPC with VirtIO and Yocto – Eilís “pidge” Fhlannagían, BayLibre

slides

VirtIO is a set of standards to provide interfaces for virtual devices to VMs. Passing data from one VM to another through VirtIO incurs a lot of overhead (system calls through 3 different kernels, thus a lot of context switches). vhost-user is a workaround for that: it moves the data plane to a userspace process, while the control plane is handled separately (over a UNIX domain socket). DPDK is an example of this.
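To illustrate what “data plane” means here, below is a heavily simplified split virtqueue: the driver publishes buffer descriptors in an available ring, and the device side (the role a vhost-user backend such as DPDK takes on) consumes them and reports completions in a used ring, all through shared memory and without system calls. The field layout follows the split-virtqueue structures of the VirtIO specification, but the rest is our own simplification: host-endian fields, one descriptor per buffer, no memory barriers or notifications, and plain pointers instead of guest-physical addresses. The function names are hypothetical.

```c
#include <stdint.h>

#define QSZ 8u                 /* queue size: a power of two */
#define VIRTQ_DESC_F_WRITE 2u  /* buffer is writable by the device */

struct vring_desc { uint64_t addr; uint32_t len; uint16_t flags; uint16_t next; };
struct vring_avail { uint16_t flags; uint16_t idx; uint16_t ring[QSZ]; };
struct vring_used_elem { uint32_t id; uint32_t len; };
struct vring_used { uint16_t flags; uint16_t idx; struct vring_used_elem ring[QSZ]; };

struct vq {
    struct vring_desc desc[QSZ];
    struct vring_avail avail;  /* written by the driver */
    struct vring_used used;    /* written by the device */
    uint16_t last_avail;       /* device's private cursor */
    uint16_t last_used;        /* driver's private cursor */
};

/* Driver: expose a buffer through descriptor 'id' (id < QSZ). */
void driver_add(struct vq *q, uint16_t id, char *buf, uint32_t len)
{
    q->desc[id] = (struct vring_desc){
        .addr = (uint64_t)(uintptr_t)buf, .len = len,
        .flags = VIRTQ_DESC_F_WRITE, .next = 0 };
    q->avail.ring[q->avail.idx % QSZ] = id;
    q->avail.idx++;  /* publish; a real ring needs a write barrier first */
}

/* Device: consume every available buffer (here: upper-case ASCII in place)
 * and report each one in the used ring. Returns the number serviced. */
unsigned device_service(struct vq *q)
{
    unsigned n = 0;
    while (q->last_avail != q->avail.idx) {
        uint16_t id = q->avail.ring[q->last_avail % QSZ];
        q->last_avail++;
        struct vring_desc *d = &q->desc[id];
        char *p = (char *)(uintptr_t)d->addr;
        for (uint32_t i = 0; i < d->len; i++)
            if (p[i] >= 'a' && p[i] <= 'z')
                p[i] = (char)(p[i] - 'a' + 'A');
        q->used.ring[q->used.idx % QSZ] = (struct vring_used_elem){ id, d->len };
        q->used.idx++;
        n++;
    }
    return n;
}

/* Driver: reclaim one completed buffer; returns its id, or -1 if none. */
int driver_get_used(struct vq *q, uint32_t *len)
{
    if (q->last_used == q->used.idx)
        return -1;
    struct vring_used_elem e = q->used.ring[q->last_used % QSZ];
    q->last_used++;
    *len = e.len;
    return (int)e.id;
}
```

The point for the container use case is that none of these steps involves a kernel: as long as both sides can map the same memory, producer and consumer just read and write the rings.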

Since vhost-user bypasses the kernel, it can also be used for containers to bypass the kernel. Containers could do that anyway, of course, but vhost-user is a standard that can be reused. [Arnout, Charles-Antoine: As far as we understand, the idea is that with vhost-user, it’s not needed to write some specific guest driver to run in the container to achieve the kernel bypass feature, but instead DPDK on top of vhost-user can be reused as-is.] It just needs to be untangled from the VM-specific assumptions of the existing components. This is called Exceptional Data Path Acceleration.

LXC is a light-weight container framework. It is not used a lot, however. OCI/runc or Docker are perhaps better alternatives. In Yocto, LXC is supported in meta-virtualization. DPDK is in meta-dpdk (recently split out from meta-intel). meta-lxc-dpdk is a layer that consumes those other two and ties everything together. There are a bunch of problems with incompatibilities [Arnout: which is a direct result of the community fragmentation prevalent in Yocto].

The build of an image also builds some LXC container images and installs them in the overall image. lxc-config class is used to define the container images and configuration. It works by combining a number of config fragments. The images themselves are simply image bb files. Currently, this relationship is still hardcoded in recipe names.

The goal of this work is to squeeze out every last bit of performance. For normal use cases however, it’s not really worth going through all this effort. The normal container-to-container communication through veth interfaces is fairly performant already.

[ Olivier’s personal thoughts:

  • Interesting domain, but Pidge assumed we knew a lot about Virtio, DPDK and AGL, while we do not.
  • Virtio is interesting and worth studying more. It is more than just a piece for VMs; it defines APIs to abstract functions in the kernel.
  • LXC is more interesting than it seems, and it may be better than Docker. ]