Software development notes
Bookworm ships with GCC-12, which is unfortunate, because OpenBMC userspace
needs to be compiled with GCC-13. So we’re back in the familiar territory of
needing a different compiler. The Debian testing (trixie) and unstable (sid)
suites ship GCC-13, but using the usual method of package pinning still results
in apt wanting to upgrade most of my system. Let’s look at a different approach.
New job, new laptop, and I no longer have an M1 MacBook Pro. This time around I have a Lenovo P14s, which unfortunately comes with a touchscreen. Touchscreens are terrible, so let’s disable it.
libpldm ABI Reference Dumps
The existence of libpldm’s stable visibility class implies that we shouldn’t
break the existence or behaviour of any of its functions. This promise is only as
good as our ability to measure it. The ideal measurement is that any stable API
and ABI breaks are detected by Continuous Integration (CI). To that end I
integrated support for abi-compliance-checker into the build system.
The nuts and bolts of endianness are a bit fiddly. Keeping value endianness in mind when reading through memory dumps is annoying but not intractable. In my mind, a more important concern is deciding where to address endianness in a system design. The answer to that is very likely “at the boundaries”, but this also requires knowing where the boundaries are in a system design.
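As a minimal sketch of handling endianness "at the boundaries", the Python below converts wire bytes to native integers once on the way in and back once on the way out, so that everything in between deals only with plain values. The message layout (a little-endian 16-bit type followed by a 32-bit length) and the function names are hypothetical, chosen purely for illustration.

```python
import struct

# Hypothetical wire layout: little-endian u16 type, then little-endian u32 length.
# The "<" prefix selects little-endian byte order with no padding.
WIRE_FORMAT = "<HI"


def decode_header(raw: bytes):
    """Convert wire bytes to native integers at the input boundary."""
    msg_type, length = struct.unpack_from(WIRE_FORMAT, raw)
    return msg_type, length


def encode_header(msg_type: int, length: int) -> bytes:
    """Convert native integers back to wire bytes at the output boundary."""
    return struct.pack(WIRE_FORMAT, msg_type, length)


# Code between the two boundaries only ever sees plain integers.
raw = bytes([0x01, 0x00, 0x10, 0x00, 0x00, 0x00])
msg_type, length = decode_header(raw)
```

Keeping the byte order in one format string means the rest of the program never needs to ask which endianness a value is in.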
These days I’m using Fedora on an aarch64 system but still have a need for
cross-compiling bits and pieces to x86_64 and other architectures.
Unfortunately, Fedora doesn’t provide a cross toolchain capable of building
userspace binaries. So, is there a relatively straight-forward work-around?
To steal from Wikipedia, C is an imperative procedural language. Given the language’s lack of formal support for more abstract constructs like object-oriented classes, it’s easy to reach the view that its ecosystem is a grab-bag of loosely related functions. Developing applications and libraries with this perspective can lead to choices that feel kind of arbitrary. We need to be conscious that what we’re doing is imposing a structure on the code in order to organise our own thoughts and those of others.
OpenBMC uses systemd as a system management daemon. While this raises some
eyebrows given the environment, it’s what we have. At least it provides
reliability and familiarity as we disregard the perceived complexity. With
systemd comes an opportunity to use systemd-bootchart for measuring boot-time
behaviour of the system. While it’s relatively easy to use in general, some of
the details of the OpenBMC boot process can get in the road.
libpldm
Developing and maintaining libraries is a very different ballgame to applications. Internal functions of an application tend to have a closed set of call-sites. Under these conditions refactoring is often straight-forward: Rework your internal APIs and then clean up the resulting compiler errors. By contrast libraries rarely have a closed set of call-sites for their APIs. This means breaking an API impacts a potentially unknowable number of applications, and makes for a bad experience for the library’s users when they try to update.
Experience suggests that configuring an Aspeed BMC for host console access can be a confusing task. The UART capabilities provided by the SoCs allow access to the host console both via physical connectors on the rear of the chassis and also via Serial-over-LAN (SOL).
obmc-console
obmc-console is quite a slow-paced project relative to others in the OpenBMC
ecosystem, but recently I’ve merged quite a few changes. The bad news is not
all of them have kept things in working order in the OpenBMC distro, so
let’s look at what’s happened, what’s broken, and what we need to do to fix it.
Pushing patches for review to gerrit.openbmc.org automatically triggers CI
jobs on jenkins.openbmc.org. Almost always this triggers builds and a bunch of
linters to run over the change. Many of the linters are also formatters, such as
prettier, black or clang-format.
As it stands any differences introduced by the linters cause a build failure.
obmc-console with socat
This is a bit of a gross hack. However, it serves to demonstrate a way to test
the obmc-console stack without requiring integration into a BMC and booting
its host (or some equally tedious arrangement).
touch-required FIDO2 Authentication with sshd and a Yubikey on Fedora 38
Because I’m lazy the network contains printers and other devices whose
firmware hygiene generally causes infosec side-eye. Leaving sshd exposed to
password-based authentication attempts didn’t evoke feelings of comfort.
obmc-console service units to expose multiple host consoles to the BMC network
obmc-console provides the plumbing to expose one or more host consoles onto
the BMC’s network interfaces. It comes in two parts:
obmc-console-server
obmc-console-client
libpldm
In Motivating a New Scheme for PLDM Instance ID Management in
OpenBMC I talked about why we need to change how instance IDs are managed in
OpenBMC. Underpinning it is the shift to using AF_MCTP sockets provided by
Linux.
Recently Rashmica has been doing some work to enable use of Linux’s AF_MCTP
sockets in OpenBMC. Until now we’ve relied on a userspace implementation of MCTP
through libmctp, but this rapidly hit limitations at the kernel/userspace
interface boundary. To fix that, Code Construct did the work to move MCTP into
the kernel.
I’ve had an M1 MacBook Pro lying beside me for some time now, waiting for me to find a good workflow and migrate onto it.
opkg-based OpenBMC development workflow
Previously I talked about the mechanics of how I develop bits and pieces of
userspace for OpenBMC. What I will discuss this time is an alternative
flow that replaces the use of devtool deploy-target with opkg.
I recently pushed a couple of tools (overlay and bbdbg) into the openbmc-tools repository that help me develop userspace software for OpenBMC. This post describes how I work with all the different tools involved.
Working with long-term forks in git can be painful. It doesn’t have to be.
dbus-pcap: Debugging OpenBMC with busctl capture
@jessfraz recently wrote an ACM Queue article on BMCs and the availability of open-source BMC firmware. OpenBMC gets a mention, though the article also points out that its modular design interconnected with D-Bus “makes the BMC software more complex to debug, audit, and put into production.”
dd conv=notrunc: Working around limitations of busybox
Busybox’s dd(1) as shipped in OpenBMC (as of Jan 21 2020) doesn’t support
conv=notrunc, which is a mighty handy option if you’re trying to patch some
binaries. Which I unfortunately was.
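To illustrate what conv=notrunc buys you, here is a minimal Python sketch of the same idea: overwriting a few bytes at an offset while leaving the rest of the file untouched. This is not the workaround described in the post; it assumes a Python interpreter is available (which may well not be true on a BMC), and the helper name and demo file are hypothetical.

```python
import os
import tempfile


def patch_in_place(path: str, offset: int, data: bytes) -> None:
    """Overwrite len(data) bytes at offset, leaving the rest of the file intact."""
    # "r+b" opens for update without truncating, unlike "wb".
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


# Demonstrate on a throwaway file rather than a real binary.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"AAAAAAAAAA")
    demo_path = tmp.name

patch_in_place(demo_path, 4, b"BB")
with open(demo_path, "rb") as f:
    patched = f.read()
os.remove(demo_path)
```

Without the non-truncating open mode, the bytes following the patched region would be lost, which is the behaviour conv=notrunc exists to avoid.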
OpenBMC supports several generations of BMC SoCs. In the case of ASPEED BMC SoCs, each generation has moved forward with the supported ISA and hardware features, and the ARMv7 AST2600 now sports hard-float support in the form of vfpv4d16.
Recently a few of us were interested in designing an eMMC flash layout that allowed for secure boot of a BMC’s userspace while also catering to robustness across updates. This post covers a script I developed to road-test secure rootfs eMMC images under QEMU. The script appears at the end after the discussion of how we implement it.
KASAN (Kernel Address SANitizer, the kernel implementation of an existing userspace aid) for 32-bit ARM is not yet in mainline Linux, but v6 of a series adding support was posted in mid-2019. As part of tracking down squashfs decompression errors in OpenBMC I took v6 for a test drive. Ultimately it didn’t help my cause, but I had some fun debugging KASAN along the way.
Trying to debug intermittent data corruption under QEMU can be fairly painful, especially if you’re trying to reproduce with stock boot images. I happened to be in this dark corner recently and wanted to enable some extra debug output in the provided kernel. Thankfully Linux supports enabling dynamic debug statements on the kernel commandline, though depending on which statements you want and how you’re trying to enable them this can be straightforward or like trying to run a Rube Goldberg machine in reverse.
Unsorted Block Images is a magic Linux kernel subsystem for handling wear levelling, data integrity management and dynamic partitioning of raw flash devices. It’s magic in the sense that it does a lot of work under the covers, frequently shuffling data around to uphold the desirable properties of the subsystem.
Debugging broken kernel / hypervisor interactions can be painful, and sometimes requires some creativity to extract the necessary information. This was the case recently when chasing down a bug whose most obvious symptom was squashfs decompression failures that appeared intermittently when booting Witherspoon OpenBMC firmware on QEMU’s witherspoon-bmc platform model.
You’re doing bringup of a board or SoC or have got yourself in a tight spot; you have no networking and are unable to write to local storage in the runtime environment. How do you boot a custom firmware or kernel? If you’re local the obvious approach is to use an external tool to write the boot storage, but let’s add to the challenge and say you’re doing this remotely. You have your board hooked up to a terminal server, and are at the u-boot prompt.
A part of OpenBMC has a use-case where we needed to build a project composed of multiple repositories arranged as submodules with a submodule tree depth greater than one. The catch is that we don’t need to initialise submodules below the first layer, and the approach of initialising them anyway means a large time and bandwidth penalty downloading unnecessary information.
There are two intuitive approaches to testing OpenBMC kernels with QEMU:
-kernel, -dtb and -initrd options to QEMU
-drive file=...,if=mtd,format=raw
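The two approaches above can be sketched as argument lists, built here in Python so the difference between them is easy to see. This is only a sketch: the artifact file names are hypothetical placeholders, and the choice of qemu-system-arm with the witherspoon-bmc machine and -nographic is an assumption rather than something the list above specifies.

```python
# Hypothetical artifact paths; substitute the outputs of your own build.
KERNEL = "zImage"
DTB = "board.dtb"
INITRD = "initramfs.cpio.xz"
FLASH_IMAGE = "flash.img"

# Options shared by both approaches (binary and machine name are assumptions).
BASE = ["qemu-system-arm", "-M", "witherspoon-bmc", "-nographic"]


def direct_boot_argv() -> list:
    """Approach 1: hand the kernel, devicetree and initrd straight to QEMU."""
    return BASE + ["-kernel", KERNEL, "-dtb", DTB, "-initrd", INITRD]


def flash_boot_argv() -> list:
    """Approach 2: attach a raw image as an MTD drive."""
    return BASE + ["-drive", f"file={FLASH_IMAGE},if=mtd,format=raw"]


# Print the command lines instead of running them, so nothing needs to exist.
print(" ".join(direct_boot_argv()))
print(" ".join(flash_boot_argv()))
```

Either list could be passed to subprocess.run once the placeholder paths point at real artifacts.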
Hostboot’s console output is pretty terse, and doubly-so when things go wrong. Debugging Hostboot the Hard Way gives some insight on how to extract more information from hostboot to root-cause problems and provides some tips on debugging code under development.
Hostboot is split into several parts, in terms of the artifacts generated and the roles of those parts. At a high level, hostboot is its own cache-contained operating system. Here we explore how this firmware OS fits together.
Hostboot is one part of the firmware stack that initialises an OpenPOWER system - it is the first piece of software to execute on the “main” cores of the CPU. Developing software that runs in this environment is always challenging, but the nature of its implementation also adds to the level of difficulty.