Software development notes
by Andrew
Recently Rashmica has been doing some work to enable use of Linux’s AF_MCTP sockets in OpenBMC. Until now we’ve relied on a userspace implementation of MCTP through libmctp, but this rapidly hit limitations at the kernel/userspace interface boundary. To fix that, Code Construct did the work to move MCTP into the kernel.
A consequence of using libmctp as the MCTP implementation in OpenBMC is that other components in the distro had to make use of the AF_UNIX socket provided by the mctp-demux-daemon. These components include the requester API of libpldm. That isn’t so much of a concern in itself, as the mctp-demux-daemon design was (intentionally) socket-based. A further problem, though, was that the design of the requester API baked in the assumption that the underlying MCTP transport implementation was the AF_UNIX socket provided by mctp-demux-daemon.
So, a new requester-related API is needed to transition libpldm from using the AF_UNIX socket from mctp-demux-daemon to AF_MCTP sockets provided by the kernel. A side-goal of this new interface is to also allow the use of The Other PLDM Transport Binding, PLDM over RBT.
At this point a tangle of problems emerged:
- libpldm APIs to encode a PLDM message require that an instance ID is provided, conflating serialisation of the message structure with framing the serialised message for transport
So instead of redesigning a requester API for libpldm, the job became an effort to design two new lower-level APIs. The two live side-by-side and together can be used to compose an eventual requester API implementation:
As mentioned above, the instance IDs are used by the protocol for message correlation and to drive the timeout/retry state machine. That’s straightforward enough, until we consider that:
- libpldm for a variety of purposes
- libpldm doesn’t provide a true requester API, forcing its users to implement the behaviour themselves
A subtle point here is that, as a result of its current API design, libpldm embeds the instance ID in the PLDM message at the point of serialisation. That is a separate concern from exchanging serialised messages, so allocation and deallocation of instance IDs are decoupled from any timeout/retry semantics. Technically, we cannot assume that the expiry of an instance ID can be measured from the point at which it was allocated.
We also need to consider that applications can crash and leak any instance ID resources they had allocated but not yet released. That leads us to the current architecture for instance ID management:
- pldmd exposes a DBus object implementing xyz.openbmc_project.PLDM.Requester to satisfy the requirement for a global instance ID allocator
- pldmd request an instance ID through the DBus interface provided by pldmd
Like the existing requester API in libpldm, the implementation of the xyz.openbmc_project.PLDM.Requester interface in pldmd is unfortunately tied to implementation details of mctp-demux-daemon. We can observe from the interface definition that no method to return an instance ID to the TID’s pool is provided[1]. It turns out mctp-demux-daemon has the (unfortunate?) behaviour of sending all messages to all connections on its AF_UNIX socket for the given message type if the traffic is destined for the local EID. This includes responses from remote endpoints.
In this manner, pldmd snoops on the response traffic intended for another application to reclaim the instance ID it had handed out. This snooping behaviour also allows it to infer when a response hasn’t been received in a timely fashion, and to expire the allocation accordingly.
Thus we have the properties that:
A problem we have is that AF_MCTP sockets do not allow pldmd to snoop the traffic in this fashion, and so we cannot uphold property 1:
Sockets will only receive responses to requests they have sent (with TO=1) and may only respond (with TO=0) to requests they have received.
As a result, it’s not enough to provide an instance ID allocation API in libpldm that abstracts over DBus calls to GetInstanceId on pldmd’s xyz.openbmc_project.PLDM.Requester interface. Instead, to migrate to AF_MCTP as the PLDM transport implementation, we have to either:
- mctp-demux-daemon and is robust against application crashes
Option 1 requires that libpldm take a dependency on e.g. libsystemd for its sd_bus_* APIs to handle the DBus traffic for the instance ID lifecycle.
Further, the need for IO to acquire the instance ID means we need to design the API such that it can be used asynchronously. Finally, implementing the API in terms of DBus also prevents pldmd from using the API to implement the DBus interface: pldmd would call into itself via the DBus interface, ending in either recursion or deadlock.
None of these are particularly appealing.
The question is then whether option 2 is feasible. Going that path, the work for the instance ID allocation API becomes:
1. libpldm
2. libpldm instance ID allocation API in terms of the new scheme from 1
3. pldmd’s implementation of the xyz.openbmc_project.PLDM.Requester DBus interface to be in terms of the libpldm API
4. libpldm to call the new instance ID API directly instead of all independently implementing the calls to the current DBus interface
The task order is important. Implementing the instance ID allocation API first in terms of the DBus interface is invalid, because the conversion of pldmd must not happen before the API implementation has switched to the new scheme. Conversely, if an application is reworked to use the API before the switch to the new scheme, then switching the implementation to the new scheme (to enable the conversion of pldmd) causes the application and pldmd to lose coherency. Thus, the conversion of pldmd must come first, and the API need never be implemented in terms of the DBus interface.
With that done, we can then progress the effort to switch over to using AF_MCTP as the MCTP transport implementation.
[1] Eagle-eyed readers will also note that GetInstanceId is defined in terms of the destination MCTP EID and not the destination TID as specified, putting an architectural ding in the desire to support RBT as a PLDM transport.