
Testing Ceph RBD Performance with Virtualization

This post explains how I measured Ceph RBD performance with block/network virtualization technologies (virtio and vhost), and the results.

VM execution is done through qemu-system-x86_64, without using libvirt.

  • Host: Fedora 33 (Linux kernel 5.10.19)
  • Guest: Ubuntu 20.04 LTS Server (Linux kernel 5.4.0-67)
# Qemu general configuration
qemu-system-x86_64 -machine q35,accel=kvm -enable-kvm \
    -m 2048 -smp 2 -cpu host \
    -vnc :0 \
    -hda disk.qcow2 \
    <append configuration-specific options>

Performance is measured with fio running on the guest VM with the following configuration file:

# Latency measurement
[global]
bs=4K
iodepth=1
direct=1
group_reporting
time_based
runtime=120
numjobs=1
name=latencytest
rw=randrw
rwmixread=90

# Throughput measurement
[global]
bs=4K
iodepth=64
direct=1
group_reporting
time_based
runtime=120
numjobs=1
name=throughputtest
rw=randrw
rwmixread=90
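
Each [global] section above is combined with one of the job sections shown below into a single file. Assuming the file is saved as latency.fio (a name of my choosing), a run is then simply:

$ fio latency.fio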

1. Access through network #

A VM can access a Ceph RBD over the network. virtio-net and vhost-net can be used to virtualize the guest network.

In this case, fio running on the VM can test in the same two ways a normal Ceph client would:

  • via librbd: fio provides rbd ioengine, which uses librbd.
    [rbdtest]
    ioengine=rbd
    clientname=admin
    pool=rbd
    rbdname=test
    
  • via krbd: map an RBD image to a device /dev/rbd0 on the guest operating system (see the sketch after this list), and use libaio as fio’s ioengine.
    [krbdtest]
    ioengine=libaio
    filename=/dev/rbd0
    
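For the krbd case, the image must first be mapped on the guest; a minimal sketch, assuming ceph-common and a valid client keyring are installed on the guest:

$ sudo rbd map test -p rbd
/dev/rbd0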

1.1. virtio-net #

With virtio-net, a guest VM directly accesses a Ceph RBD image over the network.

<qemu general configuration>
-netdev type=user,id=net0 \
-device virtio-net-pci,netdev=net0,mac=<mac> 

Note that the -netdev option can be replaced with -nic: -nic user,model=virtio-net-pci,mac=<mac>. Refer to the QEMU networking documentation.

Just for reference, -netdev and -nic are more modern than the legacy -net command-line option.
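
Also note that user-mode networking puts the guest behind QEMU's internal NAT, so the host cannot reach it directly; if you want SSH access into the guest to run fio, a hostfwd rule helps (port 10022 is an arbitrary choice of mine):

-netdev type=user,id=net0,hostfwd=tcp::10022-:22 \
-device virtio-net-pci,netdev=net0,mac=<mac>

$ ssh -p 10022 <user>@localhost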

1.2. vhost-net #

To use vhost-net, the host kernel should be built with CONFIG_VHOST_NET enabled. Check that the vhost modules are loaded:

$ lsmod | grep vhost
vhost_net              32768  1
tun                    57344  3 vhost_net
tap                    28672  1 vhost_net
vhost                  57344  2 vhost_net

First, we create a bridge to connect the tap device to the network. I will use NAT networking to provide connectivity to the guest VM, as described below:

(Figure) Virtual network example: libvirt implements a virtual network switch with a DHCP feature using dnsmasq. [Source]

The bridge can be created with brctl; however, brctl has been deprecated, and it is recommended to use ip instead [1].

$ sudo ip link add br0 type bridge
$ sudo ip addr add 192.168.122.1/24 broadcast 192.168.122.255 dev br0
$ sudo ip link set br0 up
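
For the NAT part to actually work, IP forwarding and masquerading must also be enabled on the host; a minimal sketch with iptables (libvirt normally sets this up automatically for its default network):

$ sudo sysctl -w net.ipv4.ip_forward=1
$ sudo iptables -t nat -A POSTROUTING -s 192.168.122.0/24 ! -d 192.168.122.0/24 -j MASQUERADE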

Then, create a script named qemu-ifup that QEMU can use to attach the tap device it creates to the bridge:

#!/bin/sh
# QEMU passes the name of the tap device it has created as $1;
# the script only needs to bring it up and attach it to the bridge.
set -x

switch=br0

if [ -n "$1" ]; then
    ip link set "$1" up
    sleep 0.5s
    ip link set "$1" master "$switch"
    exit 0
else
    echo "Error: no interface specified"
    exit 1
fi
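
Make the script executable; otherwise QEMU fails to bring up the network backend:

$ chmod +x /path/to/qemu-ifup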

The VM is now ready to run:

<qemu general configuration>
-netdev type=tap,id=net0,vhost=on,script=/path/to/qemu-ifup \
-device virtio-net-pci,netdev=net0,mac=<mac>

You can check whether vhost-net is in use [2]. On the host, type:

$ ip -d tuntap
tap0: tap one_queue vnet_hdr
        Attached to processes:qemu-system-x86(6919)
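
You can also spot the vhost worker on the host: it is a kernel thread named after the owning QEMU process's PID (6919 above). The output below is trimmed and will differ on your machine:

$ ps -ef | grep vhost
root      6920     2  0 10:02 ?        00:00:00 [vhost-6919]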

At this point, the VM cannot access the network because the bridge network has no DHCP server. Either run dnsmasq on it, or assign a static IP manually.

Using dnsmasq is well explained in [3], and static IP assignment in [4]. I will follow the latter, with some additional work.

I wanted to make a virtual network with subnet 192.168.122.0/24, which is why I added the address 192.168.122.1 to the bridge. Next, change the guest VM’s network configuration. Note that Ubuntu 20.04 manages its network configuration with the netplan utility, so legacy approaches such as editing /etc/network/interfaces no longer work. Instead, modify the configuration file /etc/netplan/00-installer-config.yaml:

network:
  ethernets:
    enp0s2:
      dhcp4: true
  version: 2

to

network:
  ethernets:
    enp0s2:
      addresses: [192.168.122.100/24]
      gateway4: 192.168.122.1
      nameservers:
        addresses: [8.8.8.8]
  version: 2

which gives the guest the static IP 192.168.122.100 and uses the bridge on the host as its gateway. After modifying it, apply the change with:

$ sudo netplan apply
$ ip addr
...
2: enp0s2: ...
    link/ether ...
    inet 192.168.122.100/24 brd 192.168.122.255 scope global enp0s2
    ...
$ ip route
default via 192.168.122.1 dev enp0s2 proto static
192.168.122.0/24 dev enp0s2 proto kernel scope link src 192.168.122.100
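
A quick connectivity check from the guest (this assumes IP forwarding and masquerading were set up on the host as sketched earlier):

$ ping -c 3 8.8.8.8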

2. Access through block device #

Alternatively, QEMU can attach an RBD image to the VM as a block device, and the VM then accesses the Ceph RBD through that device. virtio-blk and vhost-scsi can be used.

In this case, fio running on the VM can only test through the guest block device with libaio (the guest sees a plain virtual disk such as /dev/vda or /dev/sda, not /dev/rbd0):

[blkdevtest]
ioengine=libaio
filename=/dev/vda

2.1. virtio-blk #

With virtio-blk, QEMU uses librbd to attach the RBD image to the VM as the device /dev/vda.

<qemu general configuration>
-drive format=rbd,id=rbd0,file=rbd:rbd/test,if=none,cache=directsync \
-device virtio-blk-pci,drive=rbd0,id=virtioblk0

Note that the cache option can vary; refer to the QEMU documentation for more options. Here I used directsync to measure latency correctly; otherwise writes are absorbed by the host page cache and write performance is reported incorrectly.
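
Inside the guest, the attached disk can be confirmed with lsblk; with the 20 GiB image used later in this post, the output looks roughly like:

$ lsblk /dev/vda
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
vda  252:0    0  20G  0 disk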

2.2. vhost-scsi #

To use vhost-scsi, the vhost_scsi kernel module must be loaded:

$ lsmod | grep vhost
vhost_scsi             45056  5
vhost                  57344  1 vhost_scsi
target_core_mod       417792  14 target_core_file,target_core_iblock,iscsi_target_mod,vhost_scsi,target_core_pscsi,target_core_user

Your QEMU configuration should look like the following [5] [6]:

<qemu general configuration>
-device vhost-scsi-pci,wwpn=naa.number,event_idx=off

vhost moves the data path from host user space into the host kernel, bypassing QEMU; since librbd lives in user space, the Ceph RBD image must instead be mapped into the host kernel via krbd:

$ rbd map test -p rbd
/dev/rbd0
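
You can confirm the mapping (the column layout varies slightly across Ceph releases):

$ rbd showmapped
id  pool  image  snap  device
0   rbd   test   -     /dev/rbd0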

Now create a SCSI LUN (Logical Unit Number):

$ sudo targetcli
targetcli shell version 2.1.53
Copyright 2011-2013 by Datera, Inc and others.
For help on commands, type 'help'.

/> backstores/block create name=rbd0 dev=/dev/rbd0
Created block storage object rbd0 using /dev/rbd0.
/> /vhost create
Created target naa.5001405b3bafbfbd.
Created TPG 1.
/> /vhost/naa.5001405b3bafbfbd/tpg1/luns create /backstores/block/rbd0
Created LUN 0.
/> ls
o- / ......................................................................................................... [...]
  o- backstores .............................................................................................. [...]
  | o- block .................................................................................. [Storage Objects: 1]
  | | o- rbd0 ........................................................... [/dev/rbd0 (20.0GiB) write-thru activated]
  | |   o- alua ................................................................................... [ALUA Groups: 1]
  | |     o- default_tg_pt_gp ....................................................... [ALUA state: Active/optimized]
  | o- fileio ................................................................................. [Storage Objects: 0]
  | o- pscsi .................................................................................. [Storage Objects: 0]
  | o- ramdisk ................................................................................ [Storage Objects: 0]
  o- iscsi ............................................................................................ [Targets: 0]
  o- loopback ......................................................................................... [Targets: 0]
  o- vhost ............................................................................................ [Targets: 1]
    o- naa.5001405b3bafbfbd .............................................................................. [TPGs: 1]
      o- tpg1 .................................................................. [naa.5001405f58d02fbc, no-gen-acls]
        o- acls .......................................................................................... [ACLs: 0]
        o- luns .......................................................................................... [LUNs: 1]
          o- lun0 ...................................................... [block/rbd0 (/dev/rbd0) (default_tg_pt_gp)]

In this case, naa.5001405b3bafbfbd is the WWN (World Wide Name). Put it in the QEMU configuration as the value of the wwpn (World Wide Port Name) key.
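
For reference, the same LUN setup can be scripted, since targetcli also accepts commands as arguments; naa.<wwn> below is a placeholder for the WWN generated by the /vhost create step:

$ sudo targetcli backstores/block create name=rbd0 dev=/dev/rbd0
$ sudo targetcli /vhost create
$ sudo targetcli /vhost/naa.<wwn>/tpg1/luns create /backstores/block/rbd0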

When you use vhost-scsi for your VM, the disk is shown in the guest as /dev/sda, not /dev/rbd0.


  1. KVM Configuring Guest Networking ↩︎

  2. Red Hat blog: Hands on vhost-net: Do. Or do not. There is no try ↩︎

  3. How to Setup a DNS/DHCP Server Using dnsmasq on CentOS/RHEL 8/7 ↩︎

  4. How to Assign Static IP Address on Ubuntu 20.04 LTS ↩︎

  5. pNFS SCSI Testing: Vhost Setup for KVM ↩︎

  6. libvirt Vhost-scsi Target ↩︎