Using RDMA CM
Years ago, I wrote a post on how to use libibverbs for RDMA communication.
When initializing a queue pair connection, we need some information about the destination:
bool changeQueuePairStateToRTR(struct ibv_qp* queue_pair, int ib_port, uint32_t destination_qp_number, uint16_t destination_local_id) {
    struct ibv_qp_attr rtr_attr;
    memset(&rtr_attr, 0, sizeof(rtr_attr));
    rtr_attr.qp_state = ibv_qp_state::IBV_QPS_RTR;
    rtr_attr.path_mtu = ibv_mtu::IBV_MTU_1024;
    rtr_attr.rq_psn = 0;
    rtr_attr.max_dest_rd_atomic = 1;
    rtr_attr.min_rnr_timer = 0x12;
    rtr_attr.ah_attr.is_global = 0;
    rtr_attr.ah_attr.sl = 0;
    rtr_attr.ah_attr.src_path_bits = 0;
    rtr_attr.ah_attr.port_num = ib_port;
    rtr_attr.dest_qp_num = destination_qp_number; // here: QP number of the remote side
    rtr_attr.ah_attr.dlid = destination_local_id; // and here: LID of the remote port
    // ibv_modify_qp returns 0 on success.
    return ibv_modify_qp(queue_pair, &rtr_attr,
                         IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU | IBV_QP_DEST_QPN |
                         IBV_QP_RQ_PSN | IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER) == 0;
}
That post did not explain how to get this destination information (the QP number and the local ID) to the remote side. There are two ways to do it: open a TCP/UDP socket and exchange the information over that channel, or use rdma-cm.
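As a rough illustration of the first option, here is a minimal sketch that sends the two values over an already connected stream socket. The `QpInfo` struct, the 6-byte wire layout, and the `sendQpInfo`/`recvQpInfo` helpers are hypothetical names made up for this example, not part of libibverbs.

```cpp
#include <arpa/inet.h>   // htonl, htons, ntohl, ntohs
#include <unistd.h>      // read, write
#include <cerrno>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical container for the values changeQueuePairStateToRTR needs.
struct QpInfo {
    uint32_t qp_number;  // the sender's ibv_qp::qp_num
    uint16_t local_id;   // the sender's port LID
};

// Assumed wire layout: 4-byte QP number, then 2-byte LID, network byte order.
constexpr std::size_t kWireSize = sizeof(uint32_t) + sizeof(uint16_t);

// Write exactly len bytes, retrying on short writes and EINTR.
static bool writeAll(int fd, const unsigned char* buf, std::size_t len) {
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

// Read exactly len bytes; a return of 0 from read means the peer closed early.
static bool readAll(int fd, unsigned char* buf, std::size_t len) {
    while (len > 0) {
        ssize_t n = read(fd, buf, len);
        if (n < 0 && errno == EINTR) continue;
        if (n <= 0) return false;
        buf += n;
        len -= static_cast<std::size_t>(n);
    }
    return true;
}

bool sendQpInfo(int fd, const QpInfo& info) {
    unsigned char wire[kWireSize];
    const uint32_t qpn = htonl(info.qp_number);
    const uint16_t lid = htons(info.local_id);
    std::memcpy(wire, &qpn, sizeof(qpn));
    std::memcpy(wire + sizeof(qpn), &lid, sizeof(lid));
    return writeAll(fd, wire, sizeof(wire));
}

bool recvQpInfo(int fd, QpInfo* out) {
    unsigned char wire[kWireSize];
    if (!readAll(fd, wire, sizeof(wire))) return false;
    uint32_t qpn = 0;
    uint16_t lid = 0;
    std::memcpy(&qpn, wire, sizeof(qpn));
    std::memcpy(&lid, wire + sizeof(qpn), sizeof(lid));
    out->qp_number = ntohl(qpn);
    out->local_id = ntohs(lid);
    return true;
}
```

Each side would call `sendQpInfo` with its own values and `recvQpInfo` to read the peer's, then pass the received `qp_number` and `local_id` to `changeQueuePairStateToRTR`. The fields are copied one by one instead of sending the struct directly, so compiler padding and host byte order do not leak onto the wire.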
RDMA-CM
CM stands for Communication Manager, which handles QP setup and connection management. Its interface is modeled on TCP/UDP sockets (peers are addressed by IP address and port), and the abstraction is useful: we don’t have to implement the QP state transitions manually, as illustrated here. If you need fine-grained control, you have to implement your own initialization mechanism; otherwise librdmacm is enough.
The code and implementation are here.
Interface
sudo apt install librdmacm-dev
#include <rdma/rdma_cma.h>
librdmacm provides three types of operations: RDMA verbs, client operations, and server operations. The RDMA verbs are wrappers around libibverbs, so I won’t cover them here.
The Debian librdmacm document and an RDMA example explain how to use librdmacm:
rdma_create_event_channel
: create a channel to receive events.
rdma_create_id
: allocate an rdma_cm_id, which is conceptually similar to a socket. The port space is RDMA_PS_[TCP|UDP].
rdma_resolve_addr
: obtain a local RDMA device to reach the remote address.
rdma_get_cm_event
: wait for an event. In the example, this call is wrapped with process_rdma_cm_event. The event expected here is RDMA_CM_EVENT_ADDR_RESOLVED.
rdma_ack_cm_event
: acknowledge the event returned by rdma_get_cm_event.
rdma_resolve_route
: determine the route to the remote address. Then get and ack another CM event, RDMA_CM_EVENT_ROUTE_RESOLVED.
rdma_create_qp
: allocate a queue pair for the communication. This call can be placed anywhere before rdma_connect (which requires rdma_cm_id.qp, assigned by rdma_create_qp).
rdma_connect
: connect to the remote server. Then get and ack another CM event, RDMA_CM_EVENT_ESTABLISHED.