29 May Impressions from Linaro Connect 2024 (part 1)
Arnout Vandecappelle
29/05/2024
Linaro Connect is a conference organised by Linaro to bring together people working on Open Source for ARM. In addition to the usual talks, it also has Demo Friday with a large number of demonstrations of the technologies that Linaro (and a few others) have been working on. Although the conference is open to any speakers, in practice it is dominated by presentations by people from Linaro (more than 2/3 of the presentations I attended). With about 200 attendees it’s not a very large conference, but big enough for 3 parallel tracks.
This article gives my own reflections about some of the talks I attended. If one of them looks interesting, go to the linked resources to view the slides or the video recording. I attended many more talks than the ones below, but I don’t have much useful to say about the others.
Overall, Linaro Connect is an in-depth technical conference focused on topics that nearly all interest me. Even the keynotes, which I’m usually not really a fan of, were rather interesting and light on commercial blahblah.
Linaro Automation Appliance preview
Rémi Duraffort, Sahaj Sarup (Linaro)
slides & video – clicking starts download of PDF!
For years now, Linaro has been doing a lot of testing on various different boards. Maintaining these test setups is quite a lot of work because the hardware tends to be messy and breaks easily. The Linaro Automation Appliance (LAA) should make this easier.
The LAA is a custom board that replaces the PC (or SBC) that runs the tests and connects to the DUTs. Because it is a custom board, it can integrate all the parts that typically break easily – in particular the relays (for the power supply) and serial ports. It has a big connector with a lot of mechanical stability. To this you connect a Mechanical Interface Board (MIB) that fits it to the DUT board. The MIB typically is a simple 4-layer board with only wires and connectors, no logic, so it is really simple to design. With the LAA, the hardware for the test setup is very robust, very easy to connect, and a lot less messy. Although traditionally a PC would connect to several DUTs, for the LAA they have chosen a one-to-one relation between LAA and DUT. That makes the design simpler, but more importantly, reduces the risk of hardware faults. A typical problem they encounter, for example, is that a DUT crashes the USB hub through which it is connected to the PC, and this in turn makes the other DUTs fail.
The LAA also comes with a software stack that connects to the fleet management platform in ONELab (Linaro’s test lab management platform which uses LAVA under the hood). The LAA has a small display that shows the board’s IP address. Connecting to the internal web server on this IP address, the credentials can be entered to register the board in the management platform. Once this is done, testing can start immediately. If tests require additional software to be installed, the management platform allows installing it as an ARM64 container.
The concept of LAA is something that people who are involved with large scale testing have considered repeatedly, and finally someone has gone and done it! The hardware design is really well thought out, in particular when it comes to mechanical stability and overall robustness. The board is a lot bigger than you’d expect for this simple functionality, but the large size makes sure it is possible to screw on a board of any size. For many boards a bit of soldering will still be needed (e.g. to be able to emulate button presses, the buttons have to be replaced with wires connected to the relays of the LAA), but again due to the large size of the MIB these loose wires are not so easily damaged when somebody touches the setup.
Boot time optimization project
Daniel Lezcano (Linaro)
slides & video – clicking starts download of PDF!
Linaro’s Boot time optimization project tries to make generic improvements to the boot time of an embedded Linux system. Although boot time optimization is platform- and project-specific (e.g. making the kernel image smaller by removing drivers that you don’t need), there are patterns that occur repeatedly. A further problem is that boot time tends to drift: you optimize it, but then new features are added and team members change and components are updated, and suddenly it takes twice as long to boot and you have to start optimizing again. So an important aspect of boot time optimization is to automatically track regressions in boot time.
That automatic tracking of regressions is the first milestone for this project. They will set up a reference platform where new versions of the firmware (U-Boot), kernel and OS (userspace) are built and the boot time is tracked.
The next goal is to perform some advanced improvements for boot time. For example, in the firmware there are often artificial delays during hardware initialisation. The firmware also doesn’t initialise different drivers in parallel. For both, it’s hard to understand if there is a real reason for that or if it is accidental. Another issue in the firmware is that the CPU frequency is set too low (which is sometimes necessary if the board doesn’t have sufficient cooling). One of the more revolutionary optimizations they want to pursue is to avoid re-initialising hardware in the kernel when it has already been initialised by the boot loader. That will require some communication between firmware and kernel to clarify which hardware was already initialised. Another idea is to look at memory initialization, which is currently done for the whole memory before anything else is done. This can be solved by initializing only part of the memory (e.g. at most 64GB) in early boot, and enabling the rest with memory hotplug later.
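To make the idea of tracking boot-time drift concrete, here is a minimal sketch, not taken from the talk. It assumes one boot-time measurement (in seconds) is recorded per build and flags a build when it is more than 10% slower than the median of the five builds before it. The function name, the window size and the threshold are all illustrative assumptions.

```python
from statistics import median


def find_regressions(boot_times, window=5, threshold=0.10):
    """Return the indices of builds whose boot time exceeds the median of
    the previous `window` builds by more than `threshold` (a fraction)."""
    regressions = []
    for i in range(window, len(boot_times)):
        baseline = median(boot_times[i - window:i])
        if boot_times[i] > baseline * (1 + threshold):
            regressions.append(i)
    return regressions


# Hypothetical measurements in seconds, one per build.
history = [4.0, 4.1, 3.9, 4.0, 4.1, 4.0, 5.2, 5.3]
print(find_regressions(history))
```

Comparing against a median of recent builds rather than against the previous build alone makes the check less sensitive to a single noisy measurement.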
Although boot time optimisation is important, I had the impression that the advanced ideas they were looking at are not realistic. In addition, they’ll only be useful to save the last (milli)seconds, and for the first dozens of seconds of gain we’ll still have to take the good old measures like making sure that only the drivers that are really needed are included in the kernel and reducing the size of the initramfs.
Modern CI solutions for Linux Kernel Testing
Vishal Bhoj (Linaro)
slides & video – clicking starts download of PDF!
When you develop patches for the Linux kernel, it is expected that those patches are tested before you submit them. However, developers are currently completely on their own for this testing. After submission and acceptance there are various test projects, like KernelCI, that test your (integrated) patch under various circumstances, but it’s hard to do something similar before submission.
TuxSuite is a set of tools that make it easier for a developer to test the kernel. It consists of the following components.
– TuxMake builds the kernel in a container which has the appropriate cross-toolchain.
– TuxBake is like TuxMake but for Yocto, i.e. to build the corresponding userspace.
– TuxRun boots the kernel on QEMU and Arm FVP.
– TuxSuite Plans describes tests to execute.
– TuxSuite API runs tests in the cloud.
– TuxSuite CLI leverages all of the above to build and test locally.
– TuxTriggers monitors a git tree to trigger TuxSuite.
– TuxSuite Cloud is a service that offers all of the infrastructure (runners) to run the tests.
A TuxSuite Plan is a yaml file that specifies a number of jobs. Each job has a description of the toolchains, targets, kernel configs, and which tests to run. The tests come from a pre-defined set of tests, e.g. kselftests, ltp.
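As a rough illustration of how such a plan turns into work, the sketch below models a plan as plain Python data and expands each job's combination of toolchains, targets and kernel configs into individual build-and-test runs. The field names and the `expand_plan` helper are hypothetical and do not follow the real TuxSuite plan schema.

```python
from itertools import product


def expand_plan(plan):
    """Expand each job's toolchain x target x kconfig matrix into single runs."""
    runs = []
    for job in plan["jobs"]:
        for toolchain, target, kconfig in product(
            job["toolchains"], job["targets"], job["kconfigs"]
        ):
            runs.append({
                "job": job["name"],
                "toolchain": toolchain,
                "target": target,
                "kconfig": kconfig,
                "tests": list(job["tests"]),
            })
    return runs


# Hypothetical plan; the field names are illustrative, not the TuxSuite schema.
plan = {
    "jobs": [
        {
            "name": "smoke",
            "toolchains": ["gcc-12", "gcc-13"],
            "targets": ["arm64", "x86_64"],
            "kconfigs": ["defconfig"],
            "tests": ["kselftest", "ltp"],
        }
    ]
}

for run in expand_plan(plan):
    print(run["toolchain"], run["target"], run["kconfig"], run["tests"])
```

Even this small plan yields four runs, which shows why a compact declarative description is convenient once more toolchains and architectures are added.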
TuxSuite is available as a GitLab Pipeline Component, so you can directly use it in CI of your own kernel fork with a single line in the CI/CD configuration. The default Plan builds with GCC 11, 12 and 13 and runs in QEMU for Arm, Arm64, x86_64, i386, riscv, mips, sh, s390x, ppc, sparc64. It is done in a sub-pipeline with separate jobs for each target. The default plan can be overridden with your own by specifying a CI variable in your CI/CD config, or with a push option. To run it with TuxSuite Cloud, you only need to add your access token in a CI variable. If you clone your kernel tree from the official https://gitlab.com/linux-kernel/linux you get free CI minutes to run the pipelines.
One interesting feature that still needs to be finished is to dynamically select the plan based on which files are modified, so you only run the tests or platforms that are relevant.
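Since this feature is not finished, the following is only a guess at how such a selection could work. It maps path patterns to plan names and falls back to a full plan as soon as a changed file matches no rule. The rule table, the plan names and `select_plans` are all made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from path patterns to plan names.
PLAN_RULES = [
    ("arch/arm64/*", "arm64-boot"),
    ("fs/*", "filesystem-tests"),
    ("net/*", "network-tests"),
]


def select_plans(changed_files, rules=PLAN_RULES, default="full"):
    """Pick the plans relevant to the changed files; run the default plan
    if any file is not covered by a rule (or if nothing changed)."""
    plans = set()
    for path in changed_files:
        matched = [plan for pattern, plan in rules if fnmatchcase(path, pattern)]
        if not matched:
            # An unknown file could affect anything, so be conservative.
            return {default}
        plans.update(matched)
    return plans or {default}


print(select_plans(["fs/ext4/inode.c", "net/ipv4/tcp.c"]))
print(select_plans(["fs/ext4/inode.c", "MAINTAINERS"]))
```

Falling back to the full plan for unmatched files trades some CI time for the guarantee that a change is never tested less than it would be today.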
TuxSuite is obviously geared towards kernel developers that develop for upstream. However, individual tools (like TuxRun) can also be relevant for private development of custom kernel patches. It seems to me however that the pre-defined set of tests in TuxSuite Plan could be a major limitation.
What Can Static Analysis Do For You (Smatch mostly)
Dan Carpenter (Linaro)
slides & video – clicking starts download of PDF!
The kernel is (nowadays) a heavy user of static analysis. 2-4% of patches come from static analysis. This is surprising – static analysis should be run on patches before they’re accepted, so bugs detected by static analysis shouldn’t ever be included in git history. However, the static analysis tools improve all the time, or their application area broadens.
One of these tools is smatch, developed by Dan Carpenter. smatch uses sparse (another static analysis tool in the kernel) as a front-end and works on pre-processed code (so it can’t really detect problems in macros themselves). It does cross-functional flow analysis, i.e. it keeps track of state information over the code flow. This state includes value ranges, comparison with other variables, buffer sizes, whether it can be controlled by the user (i.e. risk for attacks). It also detects impossible code paths (e.g. assertion-style checks that can never happen because the caller doesn’t pass invalid parameters); this is important to improve the state information.
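The value-range part of that state can be pictured with a toy example. This is not how smatch is implemented; it is a minimal sketch that represents the possible values of a variable as an inclusive `(low, high)` pair and narrows it on each side of a comparison. The helper names are assumptions made for this illustration.

```python
def narrow_lt(rng, bound):
    """Range of x on the branch where `x < bound` holds, or None if that
    branch can never be taken."""
    low, high = rng
    high = min(high, bound - 1)
    return (low, high) if low <= high else None


def narrow_ge(rng, bound):
    """Range of x on the branch where `x >= bound` holds, or None if that
    branch can never be taken."""
    low, high = rng
    low = max(low, bound)
    return (low, high) if low <= high else None


# Suppose every caller passes an index between 0 and 10, and the callee
# starts with an assertion-style check `if (index < 0) return -EINVAL;`.
index_range = (0, 10)
print(narrow_lt(index_range, 0))   # the error branch
print(narrow_ge(index_range, 0))   # the code after the check
```

When the error branch comes back as impossible, the analysis knows that everything after the check still sees the full caller-provided range, which is the kind of refinement the talk described.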
The cross-functional analysis is done by doing an analysis pass and annotating functions with the information obtained, then repeating that process to refine the information up and down the call hierarchy. Thus, a cross-function database is built up. Building up this database takes a very long time, but it’s not hard to do. Once this database is built up, smatch performs checks on the discovered properties. These are based on patterns of code that is (potentially) wrong. For example, there’s a pattern to detect when a constant is assigned to a variable inside a condition (for the infamous = instead of == mistake); for smatch, it’s OK to assign in conditions, but if you’re assigning a constant then there is no reason to actually do that within the condition so it’s probably a mistake.
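To illustrate the pattern itself, here is a toy detector. smatch works on code parsed by sparse, not on regular expressions, so this is only a sketch of the rule: flag a condition that assigns a constant, but leave alone a condition that assigns the result of a call. The regular expression and the function name are assumptions made for this example.

```python
import re

# Matches `if (name = CONSTANT)` where CONSTANT is an integer literal,
# a character literal or NULL. A comparison (`==`, `!=`, `<=`) does not match.
CONST_ASSIGN_IN_COND = re.compile(
    r"\bif\s*\(\s*[A-Za-z_]\w*\s*=\s*"
    r"(?:0[xX][0-9a-fA-F]+|\d+|'[^']*'|NULL)\s*\)"
)


def suspicious_conditions(c_source):
    """Return the 1-based line numbers where a constant is assigned
    directly inside an `if` condition."""
    return [
        number
        for number, line in enumerate(c_source.splitlines(), start=1)
        if CONST_ASSIGN_IN_COND.search(line)
    ]


sample = """\
if (mode = 3)
    start();
if (mode == 3)
    stop();
if ((ret = read_reg(dev)))
    handle(ret);
"""
print(suspicious_conditions(sample))
```

Only the first condition is reported: the second is a comparison, and the third assigns a value that is not known in advance, so testing it inside the condition makes sense.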
smatch discovers a lot of false positives. Therefore, when you use it, you should not simply try to fix all the errors it finds, but only the new ones. Of course, those new ones may also be false positives. In general, it’s not a good idea to rewrite the code to satisfy the static analysis tool – only do that if the code actually becomes more readable that way. The presentation has an example, but unfortunately just a single one; it would have been nice to have more examples of false positives and of whether they can be rewritten to look better.
This talk was an interesting technical deep dive, but I’m not sure if it helps me much. Perhaps I’ll be a bit more inclined to use smatch (and other static analysis tools) when I do kernel development. Unfortunately it looks like smatch is very kernel-specific and would probably be hard to use on other projects.
Implementing an Openchain compliant policy and best practice at Linaro
Carlo Piana, Alberto Pianon (Array)
slides & video – clicking starts download of PDF!
When you are serious about compliance, you need to be aware of all the components that are combined into the product that you deliver, and their license restrictions. In many cases, the risk of a lawsuit isn’t even for yourself, but rather for your downstreams. The latter is particularly the case when you’re creating an OS rather than a specific product, and that is exactly the situation for Eclipse Oniro, the case that this presentation is about.
To be compliant, you have to:
– know which components you are using (Software Composition Analysis);
– identify the license(s) for each component;
– identify the risks and obligations tied to the artifacts (sources and binaries) that you produce and distribute;
– establish a process to do this continuously;
– produce artifacts to demonstrate compliance to your downstream consumers.
As it happens, a lot of the same work (the Software Composition Analysis part) is also needed for security, i.e. to be able to detect when your software is exposed to a certain vulnerability. With the CRA and Cybersecurity Act coming up, this makes the compliance effort relevant again for everybody.
OpenChain is a Linux Foundation project to establish a standard for managing supply-chain compliance. It is now an ISO standard. You have compliance artifacts that show that you have an established system. The main artifact is a machine-readable SBoM (typically in SPDX format).
Array is a small legal firm that supports companies when it comes to open source. For the Eclipse Oniro project, they established the processes (and tools supporting them) to achieve OpenChain compliance for this project.
With a build system like Yocto or Buildroot, the Software Composition Analysis is already largely a given, i.e. you get a list of artifacts that get installed on the system, and you get for each artifact what the corresponding source is.
Still, for each artifact, you need to determine the licenses that apply to it. The license metadata that is included with the build system is unfortunately not entirely accurate, e.g. it typically gives the declared license of the project, while the project also includes some code with other licenses. What you really need is a per-file analysis of the source and what license applies to that source. To be entirely accurate, you would also need to know if that source actually ends up in the artifact, but that is currently an unsolved problem. For finding the licenses in the source, the ScanCode tool is the go-to open-source tool nowadays. Unfortunately, it still returns too many false positives and false negatives. So review is needed. Fossology allows humans to add and control this information. But it’s still too much work to do this for thousands of packages.
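The gap between the declared license and the per-file reality can be shown with a small sketch. It assumes you already have per-file results from a scanner, here represented as a plain dictionary from path to a set of SPDX identifiers. The data and the `undeclared_licenses` helper are invented for illustration and are not the output format of ScanCode.

```python
def undeclared_licenses(declared, file_licenses):
    """Map each license that appears in some file but not in the declared
    set to the list of files carrying it."""
    extra = {}
    for path, licenses in file_licenses.items():
        for license_id in licenses:
            if license_id not in declared:
                extra.setdefault(license_id, []).append(path)
    return extra


# Hypothetical package: the recipe declares MIT only.
declared = {"MIT"}
scan = {
    "src/main.c": {"MIT"},
    "src/compat/getopt.c": {"BSD-3-Clause"},
    "scripts/build.sh": {"GPL-2.0-only"},
}
print(undeclared_licenses(declared, scan))
```

Whether the GPL-licensed build script matters depends on whether it ends up in the distributed artifact, which is exactly the part that remains unsolved.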
The Array people had the idea to reuse an already existing repository of accurate license information: Debian. Debian has curated, file-level, machine-readable license metadata. So they developed the Aliens4Friends tool that tries to find a matching package in Debian, and imports the license information from that into Fossology. They also developed a dashboard that shows how far you are with verifying all packages. And finally they developed a data collector that integrates with the Yocto build system (which is now upstream).
There is still an audit needed to make sure that everything is covered. For example, if a file is not exactly the same as in Debian, someone needs to verify if these changes also have an impact on the license. Also license incompatibilities have to be resolved, e.g. by correcting the license information or by removing files or entire packages.
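The split between files that can inherit curated license information and files that need a human can be sketched as follows. This is not how Aliens4Friends actually matches packages; it simply assumes an index from file checksum to license built from curated metadata, and treats any file whose checksum is missing from the index as needing an audit. All names and data are hypothetical.

```python
import hashlib


def sha1(data: bytes) -> str:
    """Hex SHA-1 digest of a file's content."""
    return hashlib.sha1(data).hexdigest()


def split_by_match(our_files, curated_index):
    """our_files maps path -> content; curated_index maps checksum -> license.
    Returns (reusable, needs_audit): files identical to a curated one inherit
    its license, all others are listed for manual review."""
    reusable, needs_audit = {}, []
    for path, content in our_files.items():
        license_id = curated_index.get(sha1(content))
        if license_id is None:
            needs_audit.append(path)
        else:
            reusable[path] = license_id
    return reusable, needs_audit


# Hypothetical data: one file is identical to the curated copy, one is patched.
curated_index = {sha1(b"int main(void) { return 0; }\n"): "MIT"}
ours = {
    "main.c": b"int main(void) { return 0; }\n",
    "patched.c": b"/* local change */\n",
}
reusable, needs_audit = split_by_match(ours, curated_index)
print(reusable, needs_audit)
```

The point of the sketch is the workload: every exact match removes a file from the review queue, so only the modified or unknown files are left for the auditors.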
The tooling is currently based on Yocto but they would like to support other build systems as well.
For me, this was a super interesting talk. The main takeaway I got was that the existing SBoM generation in Yocto is not actually good enough. I also like very much the idea of sharing the information in Fossology globally, but it seems to be very difficult to do this in practice (it was attempted in the past for Yocto but AFAIK this is no longer actively maintained).
Developer first: Open-Source and Next-Gen AI apps journey continues
Leendert van Doorn (Qualcomm)
slides & video – clicking starts download of PDF!
This keynote speech was about Qualcomm’s future plans in AI and in open source. I don’t care much about machine learning so for that part of the talk you’ll need to watch the video. About open source, however, Leendert made some very interesting statements.
I see Qualcomm as the typical traditional big chip vendor that sees open source as something it has to do in order to sell chips, but not as something that benefits it. Apparently, this philosophy is changing. This is best illustrated with the acquisition of Foundries.io. Foundries was very much an Open Source First company – not only do they make sure that all their own software is open source, they also maximize reuse of existing open source software and they work with upstream when something needs to be changed. I was afraid that this would change due to the acquisition. However, Leendert (who is pretty high up the food chain in Qualcomm) said explicitly that not only would Foundries be able to operate like they did before, but also that part of the motivation for the acquisition was for Qualcomm to learn the Open Source First philosophy from Foundries! We of course have to see how this evolves in practice, but it gives good hope for Qualcomm’s future.