Research

You can also find my articles on my Google Scholar profile.

Featured Recent Papers & Projects

  • METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
    Siddhant Ray, Rui Pan, Zhuohan Gu*, Kuntai Du, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang
    SOSP 2025

Other Papers

  • DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
    Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse
    NSDI 2026 (to appear)
  • LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
    Zhuohan Gu*, Jiayi Yao*, Kuntai Du, Junchen Jiang
    NeurIPS 2024 Workshop on Machine Learning for Systems
  • AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
    Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang
    SOSP 2025 Workshop on Big Memory (BigMem)
  • Transformer-based Predictions for Sudden Network Changes
    Siddhant Ray, Zhuohan Gu, Xi Jiang, Junchen Jiang, Nick Feamster
    NSDI 2024 Poster Session
  • An Introduction to Loewner Energy
    Zhuohan Gu, Dadu Chen
    UChicago Math REU, 2024
  • A Study in Markov Chains, Loop-Erased Random Walk, and Loop Soups
    Zhuohan Gu
    UChicago Math REU, 2023

Past Projects

  • LMCache: The first open-source Knowledge Delivery Network (KDN) for LLM applications
    Accelerates LLM applications by up to 8x, at 8x lower cost.
  • vLLM Production Stack
    Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code.
  • KV Cache Compression and Streaming for Multimodal Large Language Models (MLLMs)
  • Knowledge Streaming from LLMs to Environments