👋 Call me Joshua.


🎓 I’m a first-year CS Ph.D. student at CSAIL, MIT EECS. I graduated from the University of Chicago with a Bachelor of Science in Mathematics (with Specialization in Economics) and Computer Science.
Research
👨‍💻 My research interests lie broadly in computer systems and artificial intelligence. I build more efficient, reliable systems to improve ML/AI workloads. I’ve worked on ML/LLM inference and AI infrastructure, focusing on high-performance KV cache management in LLM serving, including KV cache compression, P/D disaggregation, and KV blending/editing. As a member of LMCache Lab, I was also a community builder for two open-source projects, LMCache and vLLM Production Stack.
✏️ During my undergrad years, I began research in math. Advised by Prof. Gregory Lawler and Jinwoo Sung, I worked on probability theory. Later, I was fortunate to work with Prof. Junchen Jiang and Prof. Kexin Pei at UChicago, Prof. Ravi Netravali at Princeton, and Dr. Ganesh Ananthanarayanan at Microsoft Research on MLSys.
I’m always open to collaborations and working with undergraduate students. Check out Collaborations for details.
Past Projects
🚀LMCache: The first open-source Knowledge Delivery Network (KDN), which makes LLM applications up to 8x faster at 8x lower cost
- LMCache Project is open-source! Check it out!
- Website: LMCache Website
🚀vLLM Production Stack: Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code
- vLLM Production Stack Project is open-source! Check it out!
🚀Resource Allocation for Multi-Tenant Retrieval-Augmented Generation (RAG) Systems
Check it out here!
🚀KV Cache Compression and Streaming for Multimodal Large Language Models (MLLMs)
🚀Knowledge Streaming from LLMs to Environments
Check it out here!
Selected Publications
*: Equal Contribution.
Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse.
DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
NSDI 2026 (to appear) [Paper]

Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang.
METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
SOSP 2025 [Paper]

Zhuohan Gu*, Jiayi Yao*, Kuntai Du, Junchen Jiang.
LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
NeurIPS 2024 workshop on Machine Learning for Systems [Paper / Poster]

Siddhant Ray, Zhuohan Gu, Xi Jiang, Junchen Jiang, Nick Feamster.
Transformer-based Predictions for Sudden Network Changes
NSDI 2024 Poster Session [Poster]
All Publications
- Shaoting Feng, Hanchen Li, Kuntai Du, Zhuohan Gu, Yuhan Liu, Jiayi Yao, Siddhant Ray, Samuel Shen, Yihua Cheng, Ganesh Ananthanarayanan, Junchen Jiang.
  AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving
  SOSP 2025 Workshop on Big Memory (BigMem) 2025 [Paper]
- Yuhan Liu, Yuyang Huang, Jiayi Yao, Shaoting Feng, Zhuohan Gu, Kuntai Du, Hanchen Li, Yihua Cheng, Junchen Jiang, Shan Lu, Madan Musuvathi, Esha Choukse.
  DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving
  NSDI 2026 (to appear) [Paper]
- Zhuohan Gu*, Jiayi Yao*, Kuntai Du, Junchen Jiang.
  LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts
  NeurIPS 2024 workshop on Machine Learning for Systems [Paper / Poster]
- Siddhant Ray, Rui Pan, Zhuohan Gu, Kuntai Du, Ganesh Ananthanarayanan, Ravi Netravali, Junchen Jiang.
  METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation
  SOSP 2025 [Paper]
- Siddhant Ray, Zhuohan Gu, Xi Jiang, Junchen Jiang, Nick Feamster.
  Transformer-based Predictions for Sudden Network Changes
  NSDI 2024 Poster Session [Poster]
- Zhuohan Gu, Dadu Chen.
  An Introduction to Loewner Energy
  UChicago Math REU 2024 [Paper]
- Zhuohan Gu.
  A Study in Markov Chains, Loop-Erased Random Walk, and Loop Soups
  UChicago Math REU 2023 [Paper]
Selected Awards
- Quad Undergraduate Research Conference Grant (supports faculty-mentored undergraduate participation in presenting papers at academic conferences), Chicago, IL, 10/2024
- Jeff Metcalf Fellowship Grant (supports students’ career goals and offsets living expenses during internships), Chicago, IL, 05/2024
- Honor Roll (maintained an overall academic average of 93% or above), Washington, D.C., US, 2018-20
- The Bijali Dutta Ghosh Book Award (awarded for commitment to and ability in the Natural Sciences), Washington, D.C., US, 2020
- Goldberg Science Award (for achievement in the sciences outside of school), Washington, D.C., US, 2020
- First Place Award, American Mathematics Competition (AMC) 12, Washington, D.C., US, 2019
Education
- B.S., University of Chicago, Mathematics and Computer Science, 2022-2025
More About Me
I grew up in Guangzhou (Canton) and Hong Kong before moving to Washington, D.C. for high school.
I speak English, Cantonese, and Mandarin fluently, and a little bit of Hakka and Spanish.
I love piano and classical music. A lot. For piano, I mainly play Beethoven and Chopin, and sometimes Saint-Saëns and Mozart.
I love sports. Also a lot. I played varsity soccer and basketball in high school, and I follow all kinds of sports, from soccer and basketball to tennis, golf, F1, etc. I’m a fan of Borussia Dortmund💛🖤.
I also love movies, astronomy, food, etc.
To be continued…
