


  • Systems for Distributed ML Training
    PyTorch DeepSpeed
    The recent trend of growing model sizes forces the use of distributed training across multiple computing devices. I am designing and implementing an efficient distributed training system based on DeepSpeed that fully utilizes all resources in data centers.

  • Reimplementing Hyperloop
    RDMA Infiniband
    As part of the LineFS research project, we had to measure Hyperloop's performance; however, its implementation was not open source. I built a simulated Hyperloop for comparison. I was not able to fully implement it due to missing RDMA functionality; later, RedN introduced the ENABLE verb, which makes a full Hyperloop implementation possible.

  • Implementing Heterogeneous Trusted Execution Environment
    Intel SGX GPU PCIe architecture
    HIX extends the protection scope of a hardware-based trusted execution environment (TEE) to heterogeneous computing devices. It builds on two insights: Intel SGX protects data through address translation (TLB entries are not inserted for unauthorized accesses), and modern high-performance device access is done through memory-mapped I/O (MMIO). We extended this protection mechanism to MMIO. Only a trusted process called the GPU enclave can access the GPU, and other trusted processes can use the GPU service only through the GPU enclave via encrypted communication.
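The HIX access-control idea above can be illustrated with a toy model: address translation to an MMIO region succeeds only for the designated GPU enclave, mirroring how no TLB entry is installed for unauthorized accesses. This is a conceptual sketch only, not the actual hardware or SGX logic; the address range and names are hypothetical.

```python
# Toy model of the HIX MMIO protection idea (illustrative, not real SGX code).
GPU_MMIO = range(0xF000, 0xF100)   # hypothetical GPU MMIO address range
GPU_ENCLAVE = "gpu-enclave"        # the single trusted owner of that range

def translate(requester, addr):
    """Return a mapping for addr, or None (no TLB entry) if access is denied."""
    if addr in GPU_MMIO and requester != GPU_ENCLAVE:
        return None                # unauthorized MMIO access: no TLB fill
    return addr                    # identity-mapped here for simplicity

print(translate("gpu-enclave", 0xF010) is not None)  # True: GPU enclave may access MMIO
print(translate("other-app", 0xF010) is None)        # True: other processes are blocked
print(translate("other-app", 0x1000) is not None)    # True: non-MMIO memory unaffected
```

In the real system the check is enforced during address translation rather than in software, but the policy is the same: one trusted owner per device MMIO region.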
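For the distributed-training project above, DeepSpeed is driven by a declarative configuration. A minimal sketch of a ZeRO stage-2 setup is shown below; all values are hypothetical placeholders, not the settings used in my system.

```python
# Minimal sketch of a DeepSpeed config enabling ZeRO stage 2
# (optimizer states and gradients partitioned across devices).
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "gradient_accumulation_steps": 4,
    "fp16": {"enabled": True},
    "zero_optimization": {"stage": 2},
}

# Effective global batch size = micro batch * accumulation steps * world size.
world_size = 16  # hypothetical number of GPUs
global_batch = (ds_config["train_micro_batch_size_per_gpu"]
                * ds_config["gradient_accumulation_steps"]
                * world_size)
print(global_batch)  # 512
```

Such a config would typically be passed to `deepspeed.initialize` alongside the model; the point here is just that parallelism and memory-partitioning choices live in configuration rather than training code.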


  1. [SOSP ‘21] LineFS: Efficient SmartNIC Offload of a Distributed File System with Pipeline Parallelism (54/348 = 15.5%) Best Paper Award!
    Jongyul Kim, Insu Jang, Waleed Reda, Jaeseong Im, Marco Canini, Dejan Kostić, Youngjin Kwon, Simon Peter, and Emmett Witchel
    Paper Bibtex
  2. [ASPLOS ‘19] Heterogeneous Isolated Execution for Commodity GPUs (74/350 = 21.1%)
    Insu Jang, Adrian Tang, Taehoon Kim, Simha Sethumadhavan, and Jaehyuk Huh
    Paper Slides Bibtex