PCI Express I/O System

I/O Hardware Overview

The basic I/O hardware elements, such as ports, buses, and device controllers, accommodate a wide variety of I/O devices.

To encapsulate the details and oddities of different devices, the kernel of an operating system is structured to use device-driver modules.

A device communicates with a computer system by sending signals over a cable or through the air.

How can the processor give commands and data to a controller to accomplish an I/O transfer? The short answer is that the controller has one or more registers for data and control signals.

The processor can communicate with these registers in two ways:

  1. Special I/O instructions that specify the transfer of a byte or word to an I/O port address. The I/O instruction triggers bus lines to select the proper device and to move bits into or out of a device register.
  2. Memory-mapped I/O. The device-control registers are mapped into the address space of the processor. The CPU executes I/O requests using the standard data-transfer instructions to read and write the device-control registers at their mapped locations in physical memory.
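As a rough illustration of the memory-mapped approach, the sketch below treats a mapped region as a bank of 32-bit registers and accesses it with ordinary reads and writes. It uses an anonymous `mmap` region as a stand-in for a real device, and the register offsets `REG_CONTROL` and `REG_STATUS` are hypothetical names invented for this example. On Linux, a real BAR can be mapped through a file such as `/sys/bus/pci/devices/<address>/resource0`, which requires root privileges and is not attempted here.

```python
import mmap
import struct

# Hypothetical register offsets, chosen only for illustration.
REG_CONTROL = 0x00
REG_STATUS = 0x04


class RegisterWindow:
    """Treats a mapped memory region as a bank of 32-bit little-endian registers."""

    def __init__(self, mapping):
        self.mapping = mapping

    def read32(self, offset):
        # An ordinary load from the mapped region.
        return struct.unpack_from("<I", self.mapping, offset)[0]

    def write32(self, offset, value):
        # An ordinary store into the mapped region.
        struct.pack_into("<I", self.mapping, offset, value & 0xFFFFFFFF)


# Anonymous, zero-filled mapping standing in for device registers (not real hardware).
window = RegisterWindow(mmap.mmap(-1, 4096))
window.write32(REG_CONTROL, 0x1)
control_value = window.read32(REG_CONTROL)
status_value = window.read32(REG_STATUS)
```

With real hardware, the width and ordering of each access matter, and reading a register can have side effects. This stand-in models none of that.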


The current form of the GPU is a PCI Express device. In addition to the PCI configuration space and VGA-compatible I/O ports, the NVIDIA GPU exposes the following regions to the system through its PCI base address registers (BARs):

Memory-mapped I/O (MMIO) registers
Device memory windows
Complementary space of BAR1
I/O port

BARs are standard registers, specified in the PCI Express configuration space header (type 0) as follows. Each BAR is a 32-bit register (BAR0 at offsets 10h-13h, BAR1 at 14h-17h, and so on up to BAR5 at 24h-27h).
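The BAR layout can be sketched as a small reader over the raw bytes of a type 0 header. This is a minimal sketch, not a driver: the function names `read_raw_bars` and `bar_offset_range` are invented for illustration, and the header is a synthetic 64-byte buffer rather than one read from hardware. On Linux, the real bytes could be read from `/sys/bus/pci/devices/<address>/config`.

```python
import struct

BAR0_OFFSET = 0x10  # offset of the first BAR in a type 0 header
BAR_COUNT = 6       # BAR0 through BAR5
BAR_SIZE = 4        # each BAR is one 32-bit register


def read_raw_bars(config_space):
    """Return the six raw 32-bit BAR values from a type 0 header."""
    return [
        struct.unpack_from("<I", config_space, BAR0_OFFSET + BAR_SIZE * index)[0]
        for index in range(BAR_COUNT)
    ]


def bar_offset_range(index):
    """Return the inclusive (first, last) byte offsets occupied by a BAR."""
    first = BAR0_OFFSET + BAR_SIZE * index
    return first, first + BAR_SIZE - 1


# Synthetic 64-byte header with only BAR0 filled in, for demonstration.
header = bytearray(64)
struct.pack_into("<I", header, BAR0_OFFSET, 0xDE000000)
raw_bars = read_raw_bars(header)
```

Configuration space is little-endian, which is why the format string is `"<I"`.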

Figure: PCIe configuration space header for type 0

The most important region is BAR0, which exposes the MMIO registers. This is the main control space of the GPU, through which all hardware engines are controlled.

Here is the information that the lspci command reports for the GPU I am using:

BAR0: 0xde000000 (MMIO registers)
BAR1 and BAR2: 0xd0000000 (device memory window? A 64-bit memory mapping uses two adjacent BARs)
BAR3 and BAR4: 0xd8000000 (device memory window? Also a 64-bit memory mapping)
BAR5: 0xe000 (I/O ports)
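The pairing of adjacent BARs follows from the low flag bits of each register: bit 0 distinguishes I/O space from memory space, and for memory BARs a type field of 10b in bits 2:1 marks a 64-bit BAR whose upper half lives in the next register. The sketch below decodes a list of raw values on that basis. The raw values are assumptions: only the base addresses come from the listing above, and the low flag bits (for example `0xC` on BAR1) are guesses chosen to be consistent with it. The name `decode_bars` is hypothetical.

```python
def decode_bars(raw_bars):
    """Decode raw BAR values into (index, kind, base address) tuples."""
    decoded = []
    index = 0
    while index < len(raw_bars):
        raw = raw_bars[index]
        if raw & 0x1:
            # Bit 0 set: I/O space BAR; bits 1:0 are not part of the address.
            decoded.append((index, "io", raw & 0xFFFFFFFC))
            index += 1
        elif (raw >> 1) & 0x3 == 0x2:
            # Type field 10b: 64-bit memory BAR; the next BAR holds the upper 32 bits.
            # Assumes well-formed input, so a following register exists.
            upper = raw_bars[index + 1]
            decoded.append((index, "mem64", (upper << 32) | (raw & 0xFFFFFFF0)))
            index += 2
        else:
            # Otherwise treat the register as a 32-bit memory BAR.
            decoded.append((index, "mem32", raw & 0xFFFFFFF0))
            index += 1
    return decoded


# Assumed raw values: base addresses from the listing, flag bits guessed.
raw_values = [0xDE000000, 0xD000000C, 0x0, 0xD800000C, 0x0, 0xE001]
bars = decode_bars(raw_values)
```

Under these assumptions, the six registers collapse into four regions, starting at BAR0, BAR1, BAR3, and BAR5.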

The PCI Expansion ROM address 0xdf000000 is stored in the Expansion ROM Base Address register at offsets 30h-33h of the configuration space header, not in BAR 6, as explained in ‘Implementing open-source CUDA runtime’.
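To make the distinction concrete, the sketch below reads the Expansion ROM Base Address register from offset 30h of a type 0 header instead of from the BAR array. The header is a synthetic buffer carrying the address quoted above, with the enable bit assumed to be clear, and `read_expansion_rom` is a hypothetical helper name.

```python
import struct

EXPANSION_ROM_OFFSET = 0x30  # Expansion ROM Base Address in a type 0 header


def read_expansion_rom(config_space):
    """Return (base address, enabled) from the Expansion ROM Base Address register."""
    raw = struct.unpack_from("<I", config_space, EXPANSION_ROM_OFFSET)[0]
    # Bits 31:11 hold the address; bit 0 is the ROM enable bit.
    return raw & 0xFFFFF800, bool(raw & 0x1)


# Synthetic header carrying the ROM address from the text, enable bit assumed clear.
header = bytearray(64)
struct.pack_into("<I", header, EXPANSION_ROM_OFFSET, 0xDF000000)
rom_base, rom_enabled = read_expansion_rom(header)
```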