A Memory Leak in the Linux Kernel Bluetooth Stack

Olivier L’Heureux
24/06/2024

An Embedded Platform Crashes after 4 Days

The whole story began, as it often does, with an informal report from a colleague: “My embedded platform crashes after 4 days running. I think there is a memory leak somewhere.”

We asked him to open a bug report, and the hunt began. It was a long one.

Reproducing, phase 1

The usual first step in bug hunting is reproducing the bug.

We could not reproduce it immediately; we first had to ask the colleague for the special configuration he used. Our embedded device offers several connectivity links: Ethernet, Wi-Fi and Bluetooth Low Energy (BLE). The colleague used a “trick”: when testing with Ethernet, he kept BLE enabled with a fixed peer MAC address, but did not use it. This allowed him to use a single configuration for various test scenarios. The memory leak happens when BLE is enabled but the peer doesn’t exist. There is no memory leak when BLE is disabled, or when we can actually connect over BLE.

We found an easy way to see the leak: observe the output of

cat /proc/slabinfo | grep -E '^(# name|kmalloc-(2k|1k|256|192))'

The “kmalloc-1k” and “kmalloc-192” memory buckets kept growing.
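
To put numbers on that growth, two snapshots of /proc/slabinfo can be compared. The following is a minimal sketch, not the procedure we actually used: the function names `slab_active_objs` and `slab_delta` and the snapshot paths are made up for illustration, and it assumes the usual slabinfo layout in which the second column is the number of active objects.

```shell
# Print the active object count (second column of /proc/slabinfo) for one cache.
# $1: cache name, e.g. kmalloc-1k
# $2: slabinfo snapshot to read (defaults to the live /proc/slabinfo)
slab_active_objs() {
    awk -v name="$1" '$1 == name { print $2 }' "${2:-/proc/slabinfo}"
}

# Print "<cache> <before> <after> <delta>" for two saved snapshots.
# $1: cache name, $2: earlier snapshot, $3: later snapshot
slab_delta() {
    before=$(slab_active_objs "$1" "$2")
    after=$(slab_active_objs "$1" "$3")
    echo "$1 $before $after $((after - before))"
}

# Possible use on the target (reading /proc/slabinfo usually requires root):
#   cat /proc/slabinfo > /tmp/slab.before
#   sleep 600
#   cat /proc/slabinfo > /tmp/slab.after
#   slab_delta kmalloc-1k /tmp/slab.before /tmp/slab.after
```

A delta that keeps increasing between snapshots while the system is otherwise idle is a hint of a leak, not a proof: slab caches also grow and shrink under normal load.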

Understanding and Characterising

We wondered how a memory leak could go unnoticed in the Bluetooth subsystem: is it not used in every Android smartphone? We gradually understood the specific conditions that trigger the memory leak. Our application uses BLE in two unusual ways:

  1. It uses the L2CAP BLE layer. (BLE may be used with other layers on top of L2CAP.)
  2. It tries to connect to a known BLE MAC address, and does not first scan for all BLE devices in reach.

And as explained above, we were enabling BLE without using it. All those conditions explain why most other BLE stack users do not see any memory leak.

Tools

Our main investigation tool was the kernel Dynamic Debug traces: they were already enabled in the v5.13 kernel of the embedded device. Coupled with the Kernel Memory Leak Detector (kmemleak), we were equipped for the hunt.
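
For readers who have not used the Kernel Memory Leak Detector, here is a hedged sketch of how it is typically driven from a shell. It assumes a kernel built with CONFIG_DEBUG_KMEMLEAK and debugfs mounted on /sys/kernel/debug; the helper names `kmemleak_scan` and `kmemleak_count` are made up for illustration and are not part of the kernel interface.

```shell
# Location of the kmemleak control/report file (assumes debugfs is mounted here).
KMEMLEAK_FILE="${KMEMLEAK_FILE:-/sys/kernel/debug/kmemleak}"

# Ask kmemleak for an immediate scan instead of waiting for its periodic one.
kmemleak_scan() {
    echo scan > "$KMEMLEAK_FILE"
}

# Count the leak reports in a kmemleak dump; each report starts with
# a line of the form "unreferenced object 0x... (size N):".
# $1: file to read (defaults to the live kmemleak file)
kmemleak_count() {
    grep -c '^unreferenced object' "${1:-$KMEMLEAK_FILE}"
}

# Possible use on the target (as root):
#   kmemleak_scan
#   cat "$KMEMLEAK_FILE"        # full reports, with backtraces
#   kmemleak_count              # just the number of suspected leaks
```

Each report includes the size of the object and the backtrace of its allocation, which helps to map a growing slab bucket to a code path.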

Searching If It Was Known

Before reporting a kernel bug, it is good practice to check whether the bug is already known: it may have been reported before, and there may be fixes or work-arounds. That search was long, because the same error message can come from several root causes, and we had to read and understand several complicated bugs showing the same error messages.

We found the same memory leak in the L2CAP layer reported by syzbot on 2023-09-02 23:25:00 -0700, but it seems nobody provided a fix or even insight.

Reproducing, phase 2

Buildroot

When a bug is hard to find, it pays off to be able to reproduce it easily and quickly, on a broadly available platform, using only open-source software. This allows others to reproduce the bug and possibly help with the debugging or patching, and it allows you to progress faster as well. Those others can be specialists from the hardware vendors, kernel maintainers, or domain specialists who follow the mailing lists.

We could reproduce the BLE memory leak on STMicroelectronics’ DK2, which uses the same CPU as our embedded system, though with a different Bluetooth chipset. We used the Buildroot build system, because it builds a Linux system quickly, which shortens the test cycle. Following the section “Using Buildroot during development” of Buildroot’s manual, we could reproduce the memory leak on both kernel v5.15 and v6.5, the latest kernel at that time. Of course, we needed to reproduce the memory leak on the latest kernel sources, to show that the problem still existed.

Patches & Reproduction

We could propose a patch series that fixes the memory leak for us.

The patches created by Olivier can be found at: https://gitlab.com/essensium-mind/ble-memleak-repro

The README in that repository describes a way to reproduce on an STM32 DK2 module, and also using QEMU. Unfortunately, reproduction is not reliable under QEMU. Because we have a DK2 module at our disposal, we will use that.

The first patch is used to enable Bluetooth on the DK2.

The second patch introduces a bunch of printk messages to make tracing the problem easier.

The following patches fix the memory leak on the 5.13 kernel.

After fixing the memory leak, additional use-after-free problems were discovered. Olivier also created patches for these.

The link also includes steps to reproduce the memory leak. Activating the extra logging with:

mount -t debugfs none /sys/kernel/debug
echo 'module bluetooth +p' > /sys/kernel/debug/dynamic_debug/control

will make the memory leaks visible in dmesg.
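
To check that the traces are really switched on, the same dynamic_debug control file can be read back. The sketch below assumes the usual format of that file, in which each line ends with the call site's flags (`=p` when printing is enabled, `=_` when it is not) followed by the format string; the helper name `dyndbg_enabled_count` is made up for illustration.

```shell
# Count the debug call sites of one module that currently have printing enabled.
# $1: module name, e.g. bluetooth
# $2: control file to read (defaults to the live dynamic_debug control file)
dyndbg_enabled_count() {
    grep -c "\[$1\].* =[a-z]*p" "${2:-/sys/kernel/debug/dynamic_debug/control}"
}

# Possible use on the target (as root), after enabling the traces:
#   dyndbg_enabled_count bluetooth
# To switch the traces off again:
#   echo 'module bluetooth -p' > /sys/kernel/debug/dynamic_debug/control
```

A count of 0 after the `+p` command suggests that the module name does not match or that the kernel was built without dynamic debug support.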

The end – and a new beginning

We didn’t feel very comfortable about the patches that fix the memory leak. We had only tested them in three specific scenarios: with the small reproducer, with our own application code when the peer is not present, and with our own application code when the peer is present. We hadn’t tested any normal scenario or the tests from l2cap-tester.

Unfortunately, we had also run out of budget to work further on the issue. It was fixed for our specific use case after all. Therefore, we sent the patches as an RFC to the mailing list and stopped there.

However, we intended to pick it up again once we found some free time. That happened about two months later, and will be discussed in part 2 of this article.
