Using RDMA CM

A few years ago, I posted about how to use libibverbs for RDMA communication.

When setting up a queue pair connection, we need some information about the destination, namely the remote QP number and LID:

#include <cstring>             // memset
#include <infiniband/verbs.h>  // libibverbs

// Move a QP from INIT to RTR (Ready to Receive).
// The remote QP number and LID must be obtained from the peer beforehand.
bool changeQueuePairStateToRTR(struct ibv_qp* queue_pair, int ib_port, uint32_t destination_qp_number, uint16_t destination_local_id) {
  struct ibv_qp_attr rtr_attr;
  memset(&rtr_attr, 0, sizeof(rtr_attr));
  rtr_attr.qp_state = ibv_qp_state::IBV_QPS_RTR;
  rtr_attr.path_mtu = ibv_mtu::IBV_MTU_1024;
  rtr_attr.rq_psn = 0;             // must match the remote side's send PSN
  rtr_attr.max_dest_rd_atomic = 1;
  rtr_attr.min_rnr_timer = 0x12;
  rtr_attr.ah_attr.is_global = 0;  // no GRH: LID-based (InfiniBand) addressing
  rtr_attr.ah_attr.sl = 0;
  rtr_attr.ah_attr.src_path_bits = 0;
  rtr_attr.ah_attr.port_num = ib_port;

  rtr_attr.dest_qp_num = destination_qp_number;  // remote information: QP number
  rtr_attr.ah_attr.dlid = destination_local_id;  // remote information: LID

  return ibv_modify_qp(queue_pair, &rtr_attr, IBV_QP_STATE | IBV_QP_AV | IBV_QP_PATH_MTU | IBV_QP_DEST_QPN | IBV_QP_RQ_PSN | IBV_QP_MAX_DEST_RD_ATOMIC | IBV_QP_MIN_RNR_TIMER) == 0;
}

However, I did not explain how to get this information from the remote side. There are two ways of doing it: one is to implement a TCP/UDP socket and exchange the data through that channel; the other is to use RDMA CM (librdmacm).
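For the first option, here is a minimal sketch, assuming a TCP connection between the two hosts already exists (connect/accept not shown). `QpConnectionInfo`, `sendQpInfo`, and `recvQpInfo` are hypothetical names for illustration; they carry the two values `changeQueuePairStateToRTR` needs, packed into six bytes in network byte order so hosts of different endianness agree on the layout.

```cpp
#include <arpa/inet.h>   // htonl, htons, ntohl, ntohs
#include <sys/socket.h>  // send, recv, ssize_t
#include <cerrno>
#include <cstdint>
#include <cstring>

// Hypothetical message: what the peer needs to move its QP to RTR.
struct QpConnectionInfo {
  uint32_t qp_number;  // -> rtr_attr.dest_qp_num on the peer
  uint16_t local_id;   // -> rtr_attr.ah_attr.dlid on the peer
};

// Write exactly len bytes to a connected stream socket, retrying short writes.
static bool sendAll(int fd, const void* buf, size_t len) {
  const char* p = static_cast<const char*>(buf);
  while (len > 0) {
    ssize_t n = send(fd, p, len, 0);
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) return false;
    p += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Read exactly len bytes; fails on error or if the peer closes early.
static bool recvAll(int fd, void* buf, size_t len) {
  char* p = static_cast<char*>(buf);
  while (len > 0) {
    ssize_t n = recv(fd, p, len, 0);
    if (n < 0 && errno == EINTR) continue;
    if (n <= 0) return false;
    p += n;
    len -= static_cast<size_t>(n);
  }
  return true;
}

// Serialize field by field (6 bytes), so struct padding never hits the wire.
bool sendQpInfo(int fd, const QpConnectionInfo& info) {
  unsigned char buf[6];
  uint32_t qpn = htonl(info.qp_number);
  uint16_t lid = htons(info.local_id);
  std::memcpy(buf, &qpn, sizeof(qpn));
  std::memcpy(buf + 4, &lid, sizeof(lid));
  return sendAll(fd, buf, sizeof(buf));
}

bool recvQpInfo(int fd, QpConnectionInfo& info) {
  unsigned char buf[6];
  if (!recvAll(fd, buf, sizeof(buf))) return false;
  uint32_t qpn;
  uint16_t lid;
  std::memcpy(&qpn, buf, sizeof(qpn));
  std::memcpy(&lid, buf + 4, sizeof(lid));
  info.qp_number = ntohl(qpn);
  info.local_id = ntohs(lid);
  return true;
}
```

Each side would send its own values (for example `queue_pair->qp_num` and the `lid` reported by `ibv_query_port`), receive the peer's, and pass the received fields to `changeQueuePairStateToRTR`. Because the message is tiny, both sides can send first and then receive without deadlocking.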

RDMA-CM

CM stands for Communication Manager, which handles connection setup and QP state management on behalf of the application. Its API is modeled on TCP/UDP sockets (peers are addressed by IP address and port), and the abstraction is useful: we don't have to implement the QP state transitions, illustrated here, by hand. If you need fine-grained control, you may need to implement your own initialization mechanism; otherwise, librdmacm is sufficient.

The code and implementation are here.

Interface

sudo apt install librdmacm-dev

#include <rdma/rdma_cma.h>

librdmacm provides three types of operations: RDMA verbs, client operations, and server operations. The RDMA verbs are wrappers around libibverbs, so I won't cover them here.

The Debian librdmacm documentation and an RDMA example explain how to use librdmacm. On the client side, a connection is established with the following calls:

  • rdma_create_event_channel: create a channel on which asynchronous CM events are reported.
  • rdma_create_id: allocate an rdma_cm_id, which is conceptually similar to a socket. Its port space argument, RDMA_PS_TCP or RDMA_PS_UDP, selects reliable connected or unreliable datagram communication, much like choosing a socket type.
  • rdma_resolve_addr: resolve the remote IP address to an RDMA address and bind the rdma_cm_id to a local RDMA device that can reach it.
  • rdma_get_cm_event: wait for the next event. In the example, this call is wrapped in process_rdma_cm_event. After rdma_resolve_addr, the expected event is RDMA_CM_EVENT_ADDR_RESOLVED.
  • rdma_ack_cm_event: acknowledge and release the event. Every event returned by rdma_get_cm_event must be acked exactly once.
  • rdma_resolve_route: determine the route to the remote address. Afterwards, get and ack another CM event, RDMA_CM_EVENT_ROUTE_RESOLVED.
  • rdma_create_qp: allocate a queue pair for the communication. rdma_connect uses rdma_cm_id.qp, which rdma_create_qp assigns, so this call must come before rdma_connect. Because the rdma_cm_id must already be bound to a device, it must also come after address resolution.
  • rdma_connect: connect to the remote server. Afterwards, get and ack another CM event, RDMA_CM_EVENT_ESTABLISHED.