Software development notes
by Andrew
Recently Rashmica has been doing some work to enable use of
Linux’s AF_MCTP sockets in OpenBMC. Until now we’ve
relied on a userspace implementation of MCTP through
libmctp, but this rapidly hit limitations at the kernel/userspace
interface boundary. To fix that, Code Construct did the work to move MCTP into
the kernel.
A consequence of using libmctp as the MCTP implementation in OpenBMC is that
other components in the distro had to make use of the AF_UNIX socket provided
by the mctp-demux-daemon. These components include the requester API of
libpldm. That isn’t so much of a concern in itself as the mctp-demux-daemon
design was (intentionally) socket-based, but a further problem was that the
design of the requester API baked in some assumptions that the underlying MCTP
transport implementation was the AF_UNIX socket provided by
mctp-demux-daemon.
So, a new requester-related API is needed to transition libpldm from using the
AF_UNIX socket from mctp-demux-daemon to AF_MCTP sockets provided by the
kernel. A side-goal of this new interface is to also allow the use of The Other
PLDM Transport Binding, PLDM over RBT.
At this point a tangle of problems emerged:

- libpldm APIs to encode a PLDM message require that an instance ID is
  provided, conflating serialisation of the message structure with framing
  the serialised message for transport

So instead of redesigning a requester API for libpldm, the job became an
effort to design two new lower-level APIs. The two live side-by-side and
together can be used to compose an eventual requester API implementation:

- A PLDM transport API, abstracting over the concrete transport binding
- A PLDM instance ID management API
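To make the shape of that composition concrete, here is a hypothetical sketch
in C. None of these declarations are libpldm’s actual API; the names and
signatures are only stand-ins for “some transport handle” and “some instance
ID allocator”.

```c
/* Hypothetical sketch only: illustrative stand-ins, not libpldm declarations */
#include <stddef.h>
#include <stdint.h>

/* Transport API: exchange already-serialised PLDM messages over whichever
 * binding is in use (mctp-demux-daemon socket, AF_MCTP, RBT, ...) */
struct pldm_transport;
int pldm_transport_send_recv(struct pldm_transport *t, uint8_t tid,
                             const void *req, size_t req_len,
                             void **resp, size_t *resp_len);

/* Instance ID API: manage the per-terminus pool of 5-bit instance IDs */
struct pldm_iid_db;
int pldm_iid_alloc(struct pldm_iid_db *db, uint8_t tid, uint8_t *iid);
int pldm_iid_free(struct pldm_iid_db *db, uint8_t tid, uint8_t iid);

/* A requester composes the two: allocate an ID, stamp it into the encoded
 * request, exchange the messages, then release the ID */
static int example_request(struct pldm_transport *t, struct pldm_iid_db *db,
                           uint8_t tid, uint8_t *req, size_t req_len,
                           void **resp, size_t *resp_len)
{
    uint8_t iid;
    int rc = pldm_iid_alloc(db, tid, &iid);
    if (rc)
        return rc;

    /* DSP0240 places the instance ID in the low 5 bits of header byte 0 */
    req[0] = (uint8_t)((req[0] & 0xe0) | (iid & 0x1f));

    rc = pldm_transport_send_recv(t, tid, req, req_len, resp, resp_len);

    pldm_iid_free(db, tid, iid);
    return rc;
}
```

Keeping the two concerns separate is what lets the transport swap from the
demux socket to AF_MCTP (or eventually RBT) without disturbing how instance
IDs are managed.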
As mentioned above, the instance IDs are used by the protocol for message correlation and to drive the timeout/retry state machine. That’s straight-forward enough, until we consider that:
- multiple applications across the BMC make use of libpldm for a variety of
  purposes
- libpldm doesn’t provide a true requester API, forcing its users to
  implement the behaviour themselves

A subtle point here is that as a result of its current API design libpldm
embeds the instance ID in the PLDM message at the point of serialisation,
which is a separate concern from exchanging serialised messages. As such,
allocation and deallocation of instance IDs is decoupled from any
timeout/retry semantics. Technically we cannot assume that the expiry of an
instance ID can be measured from the point at which it was allocated.
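For reference, the instance ID’s position in the wire format is fixed by
DSP0240: it occupies the low five bits of the first header byte. A bitfield
sketch for a little-endian build (libpldm’s own struct pldm_msg_hdr is the
authoritative definition) shows why any encode routine needs an instance ID
before it can produce a message at all.

```c
#include <stdint.h>

/* Sketch of the three-byte PLDM message header from DSP0240, little-endian
 * bitfield layout; see libpldm's struct pldm_msg_hdr for the real thing */
struct pldm_hdr_sketch {
    uint8_t instance_id : 5; /* correlates a response with its request */
    uint8_t reserved : 1;
    uint8_t datagram : 1;    /* D bit */
    uint8_t request : 1;     /* Rq bit: set for requests */

    uint8_t type : 6;        /* PLDM type (base, platform, FRU, ...) */
    uint8_t header_ver : 2;

    uint8_t command;         /* command code within the PLDM type */
};
```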
We also need to consider that applications can crash and leak any instance ID resources they had allocated but were yet to release. That leads us to the current architecture for instance ID management:
- pldmd exposes a DBus object implementing
  xyz.openbmc_project.PLDM.Requester to satisfy the requirement for a global
  instance ID allocator
- Applications other than pldmd request an instance ID through the DBus
  interface provided by pldmd

Like the existing requester API in libpldm, the implementation of the
xyz.openbmc_project.PLDM.Requester interface in pldmd is unfortunately tied
to implementation details of mctp-demux-daemon. We can observe from the
interface definition that no method to return an instance ID to the TID’s
pool is provided[1]. It turns out mctp-demux-daemon has
the (unfortunate?) behaviour of sending all messages for a given message type
to all connections on its AF_UNIX socket for that type when the traffic is
destined for the local EID. This includes responses from remote endpoints.
In this manner, pldmd snoops on the response traffic intended for another
application to reclaim the instance ID it had handed out. This snooping
behaviour also allows it to infer when a response hasn’t been received in a
timely fashion, and to expire the allocation accordingly.
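In effect, the snooping drives pldmd’s allocator state something like the
following (purely illustrative, not pldmd’s actual code): observing a response
frees the corresponding instance ID, and a periodic sweep expires allocations
that never saw one.

```c
#include <stdbool.h>
#include <stdint.h>
#include <time.h>

/* Illustrative model of the snooping-driven allocator, not pldmd's code */
struct iid_slot {
    bool allocated;
    time_t allocated_at;
};

/* Up to 32 instance IDs (5 bits) per endpoint */
static struct iid_slot pool[256][32];

/* Called for every PLDM message the demux socket delivers to pldmd */
static void on_snooped_message(uint8_t eid, uint8_t instance_id, bool request)
{
    if (!request)
        /* A response came back: the exchange is over, reclaim the ID */
        pool[eid][instance_id & 0x1f].allocated = false;
}

/* Called periodically: give up on exchanges that never saw a response */
static void expire_stale(time_t now, time_t timeout)
{
    for (unsigned eid = 0; eid < 256; eid++)
        for (unsigned iid = 0; iid < 32; iid++)
            if (pool[eid][iid].allocated &&
                now - pool[eid][iid].allocated_at > timeout)
                pool[eid][iid].allocated = false;
}
```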
Thus we have the properties that:

1. Instance IDs handed out by pldmd are returned to the pool once pldmd
   observes the corresponding response
2. Allocations whose responses never materialise are expired after a timeout,
   even if the allocating application has crashed

A problem we have is that AF_MCTP sockets do not allow pldmd to snoop the
traffic in this fashion, and so we cannot uphold property 1:
Sockets will only receive responses to requests they have sent (with TO=1) and may only respond (with TO=0) to requests they have received.
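Concretely, with AF_MCTP the requester’s socket owns the tag for the exchange
and is the only place the response can be delivered. A minimal sketch,
assuming the kernel’s <linux/mctp.h> UAPI; the EID and the GetPLDMTypes
request bytes are just examples.

```c
#include <linux/mctp.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

int main(void)
{
    /* AF_MCTP needs a recent kernel and libc (the address family is 45) */
    int sd = socket(AF_MCTP, SOCK_DGRAM, 0);
    if (sd < 0) {
        perror("socket");
        return 1;
    }

    struct sockaddr_mctp addr = {
        .smctp_family = AF_MCTP,
        .smctp_network = MCTP_NET_ANY,
        .smctp_addr.s_addr = 8,      /* example remote EID */
        .smctp_type = 1,             /* MCTP message type 0x01: PLDM */
        .smctp_tag = MCTP_TAG_OWNER, /* TO=1: kernel allocates us a tag */
    };

    /* PLDM GetPLDMTypes request: Rq=1, instance ID 0, type 0, command 0x04.
     * The MCTP message type byte comes from smctp_type, not the payload. */
    uint8_t req[] = { 0x80, 0x00, 0x04 };
    if (sendto(sd, req, sizeof(req), 0, (struct sockaddr *)&addr,
               sizeof(addr)) < 0) {
        perror("sendto");
        return 1;
    }

    /* Only this socket sees the response: it is matched to the tag we own */
    uint8_t rsp[256];
    ssize_t len = recv(sd, rsp, sizeof(rsp), 0);
    if (len < 0) {
        perror("recv");
        return 1;
    }
    printf("received %zd byte response\n", len);

    close(sd);
    return 0;
}
```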
As a result, it’s not enough to provide an instance ID allocation API in
libpldm that abstracts over DBus calls to GetInstanceId on pldmd’s
xyz.openbmc_project.PLDM.Requester interface. Instead, to migrate to AF_MCTP
as the PLDM transport implementation we have to either:

1. Expand pldmd’s DBus interface to cover the full instance ID lifecycle and
   implement the libpldm API in terms of it, or
2. Devise a new instance ID allocation scheme that doesn’t depend on the
   snooping behaviour of mctp-demux-daemon and is robust against
   application crashes

Option 1 requires that libpldm take a dependency on e.g. libsystemd for its
sd_bus_* APIs to handle the DBus traffic for the instance ID lifecycle.
Further, the need for IO to acquire the instance ID means we need to design the
API such that it can be used asynchronously. Finally, implementing the API in
terms of DBus also prevents pldmd from exploiting the API to implement the
DBus interface (pldmd would call into itself via the DBus interface, ending
either in recursion or deadlock).
None of these are particularly appealing.
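For a sense of what option 1 drags in, here is a minimal sketch of the DBus
round-trip for a single allocation using sd_bus. The service name, object
path and method signature are what pldmd is generally registered with in
OpenBMC, but treat them as assumptions here; note there is no corresponding
call to give the instance ID back.

```c
#include <stdint.h>
#include <stdio.h>
#include <systemd/sd-bus.h>

int main(void)
{
    sd_bus *bus = NULL;
    sd_bus_error error = SD_BUS_ERROR_NULL;
    sd_bus_message *reply = NULL;
    uint8_t eid = 8; /* example destination endpoint */
    uint8_t iid = 0;
    int rc;

    rc = sd_bus_default_system(&bus);
    if (rc < 0)
        goto out;

    /* One blocking DBus method call per allocation: this is the IO that
     * pushes the libpldm API towards an asynchronous design. The service
     * name and object path for pldmd are assumed here. */
    rc = sd_bus_call_method(bus, "xyz.openbmc_project.PLDM",
                            "/xyz/openbmc_project/pldm",
                            "xyz.openbmc_project.PLDM.Requester",
                            "GetInstanceId", &error, &reply, "y", eid);
    if (rc < 0)
        goto out;

    rc = sd_bus_message_read(reply, "y", &iid);
    if (rc < 0)
        goto out;

    printf("allocated instance ID %u for EID %u\n", (unsigned)iid,
           (unsigned)eid);

out:
    sd_bus_error_free(&error);
    sd_bus_message_unref(reply);
    sd_bus_unref(bus);
    return rc < 0;
}
```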
The question is then whether option 2 is feasible. Going that path, the work
for the instance ID allocation API becomes:

1. Design and implement the new instance ID allocation scheme in libpldm
2. Implement the libpldm instance ID allocation API in terms of the new
   scheme from 1
3. Convert pldmd’s implementation of the xyz.openbmc_project.PLDM.Requester
   DBus interface to be in terms of the libpldm API
4. Convert the applications using libpldm to call the new instance ID API
   directly instead of all independently implementing the calls to the
   current DBus interface

The task order is important. Implementing the instance ID allocation API
first in terms of the DBus interface is invalid: the conversion of pldmd
cannot happen before the API implementation has switched to the new scheme.
By contrast, if an application is reworked to use the API before the switch
to the new scheme, the act of switching the implementation to the new scheme
to enable the conversion of pldmd causes the application and pldmd to lose
coherency. Thus, the conversion of pldmd must come first, and the API need
never be implemented in terms of the DBus interface.
With that done, we can then progress the effort to switch over to using
AF_MCTP as the MCTP transport implementation.
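One way to get crash-robustness without relying on a broker process is to
lean on the kernel releasing file locks when their owner exits: take a
per-(endpoint, instance ID) byte lock on a shared database file for each
allocation. A sketch of that idea follows, with the database path and layout
purely assumed; it illustrates the property rather than prescribing the
scheme.

```c
#define _GNU_SOURCE /* for F_OFD_SETLK */
#include <fcntl.h>
#include <stdint.h>
#include <unistd.h>

#define IIDS_PER_EID 32 /* instance IDs are 5 bits */

/* Try to allocate an instance ID for `eid` by taking an exclusive byte lock
 * in a shared database file (one byte per (EID, IID) pair). If this process
 * crashes, the kernel drops the lock and the ID is free again. `dbfd` is an
 * O_RDWR descriptor for an assumed path such as /run/pldm/instance-db. */
static int iid_alloc(int dbfd, uint8_t eid, uint8_t *iid)
{
    for (uint8_t i = 0; i < IIDS_PER_EID; i++) {
        struct flock lock = {
            .l_type = F_WRLCK,
            .l_whence = SEEK_SET,
            .l_start = (off_t)eid * IIDS_PER_EID + i,
            .l_len = 1,
        };
        if (fcntl(dbfd, F_OFD_SETLK, &lock) == 0) {
            *iid = i;
            return 0;
        }
    }
    return -1; /* all 32 IDs for this EID are currently claimed */
}

/* Release is the inverse: unlock the byte that backs (eid, iid) */
static int iid_free(int dbfd, uint8_t eid, uint8_t iid)
{
    struct flock lock = {
        .l_type = F_UNLCK,
        .l_whence = SEEK_SET,
        .l_start = (off_t)eid * IIDS_PER_EID + iid,
        .l_len = 1,
    };
    return fcntl(dbfd, F_OFD_SETLK, &lock);
}
```

Because the lock lives with the open file description, deallocation happens
automatically when the process exits, which is exactly the property the DBus
scheme needed response snooping to approximate.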
[1] Eagle-eyed readers would also note that GetInstanceId is defined in terms
of the destination MCTP EID and not the destination TID as specified, putting
an architectural ding in the desire to support RBT as a PLDM transport.