Software development notes
by Andrew
obmc-console is quite a slow-paced project relative to others in the OpenBMC ecosystem, but recently I’ve merged quite a few changes. The bad news is not all of them have have kept things in working order in the OpenBMC distro, so let’s look at what’s happened, what’s broken, and what we need to do to fix it.
obmc-console
follows the distro’s C style guidelines
but until now nothing enforced this, which makes it tricky to follow if you’re
not used to kernel style or don’t have the astyle
invocation handy. C is also
notoriously difficult to write without invoking undefined-, unspecified- or
implementation-defined behaviour.
A set of separate observations is that the distro has a
preference for using meson as the build
system for its subprojects. Somewhat aligned with this preference is that the CI
scripts have much stricter requirements on projects using meson
, with better
exploitation of valgrind, sanitizers, clang-tidy
and code-coverage than the support for other build systems such as autotools
or cmake
. In addition to sanitizers, the CI scripts also support formatting of
the source code through clang-format.
To better communicate expectations on others for their contributions to
obmc-console
I did the work to enable
clang-format
, enable
clang-tidy
, and to then exploit the additional
features in CI by converting the project from autotools
to
meson
. After fixing up the bitbake
recipe I’ve removed the autotools
build
system.
In parallel to this effort another push began to shift
straggling projects in the distro over to meson
. With obmc-console
now
converted we’ve knocked at least one project off that hit-list.
In a recent post I outlined how we can synthetically test the functionality of
obmc-console
using socat
. I won’t say too much
about that here, other than to mention the patches are still in
review.
IBM has quite a few internal changes to bmcweb. As part of trying to reduce the size of our downstream patch stack we’re trying to upstream a bunch of functionality.
The ability to restrict consoles to a given group ended up being a matter of
separating obmc-console
’s configuration of
dropbear
out from the generic use of dropbear
as
OpenBMC’s remote shell service.
bmcweb
obmc-console
has supported concurrent console
servers for some time now. While this
enabled exposing all of those servers onto the network via SSH using
dropbear it fell short of providing appropriate support for exposing these
consoles via bmcweb
. bmcweb
’s general approach to handling the dynamic
presence of features in the system is to use the
mapper to find DBus objects implementing the
relevant interface, and to then use the basename of the resulting object paths
as a tag to identify the specific instance in the system. obmc-console-server
has had DBus-based functionality for some time now1, however, no ability to
connect to the console over DBus was provided.
My effort to enable concurrent instances of obmc-console-server
failed to
address the fact that the bus name claimed was not unique to the console, and
neither was the hosted object path. We’ve fixed this in f9c8f6ca864a (“Fixed
broken dbus interface for multiple consoles”)
by exploiting the socket-id
for use as the final element of the connection
name and object path. However, that approach requires that a socket-id
is
specified, which was not a requirement that existed previously.
While ec7cab9378f5 (“Add socket-id for the first
console”) tried to address the new requirement it fell
short of fixing up the client-side configuration in some circumstances, and
obmc-console
suppport over SSH is now broken for some consoles on a number of
platforms. Console access is also broken via bmcweb
for all platforms, as the
connection logic inside bmcweb
relied on a hard-coded name for
obmc-console-server
’s Unix domain socket, which while still deterministic,
cannot be deduced without access to the obmc-console
’s configuration files.
In short, this has become a bit of an integration disaster.
Disregarding the integration woes above, we at least have the opportunity to
introduce a new DBus interface to salvage the broader problems encountered by
bmcweb
. This interface will allow bmcweb
or any other application to connect
to the console of the server instance hosting the object. The ideal interface
design would be something akin to how obmc-console-client
operates - an
abstraction over the transport mechanism. It would be nice to provide an
interface design yields a file descriptor that bmcweb
can immediately read and
write, e.g. a Connect()
method that returns a file descriptor.
My initial pass at working out whether that was feasible didn’t turn up anything
that would help. The way forward appeared as if we needed some mechanism for the
Connect()
call to trigger obmc-console-server
to connect to its own
AF_UNIX
listening socket, which turns into a twisty mess of IO. The
implementation and maintenance complexity of that solution feels like it lands
on the wrong side of the trade-off analysis, and so it was abandoned.
Instead we landed on an interface that exposes a byte array containing the name
of the abstract Unix domain socket on which other processes could connect2.
This requires those processes construct the socket and call
connect(2) in addition to the mapper lookup and fetching
the property value. This approach was defined by
xyz.openbmc_project.Console.Access
in 927d0930ca70 (“Add a new dbus interface
to get list of consoles”) and implemented in
b14ca19cf380 (“Added new dbus interface to query console
info”).
It was in the process of discussing the associated change to
bmcweb
that Jeremy swung by
to point out the existence of socketpair(2), the
missing piece of the puzzle that enables our idealised Connect()
interface.
So we have a few things to fix. The first are pretty concrete and already in progress:
xyz.openbmc_project.Console.Access
interface to phosphor-dbus-interfaces
xyz.openbmc_project.Console.Access
interface in obmc-console
xyz.openbmc_project.Console.Access
as an interface with a
Connect()
method returning an immediately-usable file descriptor for the
console.Thankfully, the patch to bmcweb
had not yet been merged. We will hold off on
that until we implement the Connect()
-based interface in obmc-console
, at
which point we can rework the bmcweb
patch and then get it reviewed and
merged.
Less obvious is what we do to resolve the integration woes that have landed us
with broken consoles. Tentatively, I’m proposing that instead of requiring a
socket-id
be defined by configuration, we provide a default value that can be
overridden by configuration. This will fix SSH console access for all platforms
in the tree that don’t already supply a client configuration file. It will not
fix the bmcweb
console, but that will be resolved by merging the patch
exploiting mapper
lookups for xyz.openbmc_project.Console.Access
.
It turns out software development is hard. It’s hard to discover the useful system calls for a design. It’s hard to comprehend integration complexity, and it’s hard to test other people’s platforms continue to behave correctly in the face of change.
Something that I can do better is to lean on our other maintainers in the OpenBMC ecosystem, to draw on their experience and point me in the right direction. A quick chat with Jeremy would have avoided at least one of the pain points outlined here. Working with others to exercise the configuration changes might have also have helped identify the resulting breakage before we hit submit.
The property is a byte array rather than a string due to a unique quirk of
abstract Unix domain sockets, which is that the first character of the
socket’s name is the NUL
byte, and more generally,
NUL
s are not significant in the abstract socket name. The initial NUL
would terminate the name as an empty string if directly interpreted as a C
string. ↩