Research
You can also find my articles on my Google Scholar profile.
Featured Recent Papers & Projects
- METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation. SOSP 2025
Other Papers
- DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving. NSDI 2026 (to appear) [paper]
- LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts. NeurIPS 2024 Workshop on Machine Learning for Systems
- AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving. SOSP 2025 Workshop on Big Memory (BigMem) [paper]
- Transformer-based Predictions for Sudden Network Changes. NSDI 2024 Poster Session [poster]
- An Introduction to Loewner Energy. UChicago Math REU, 2024 [paper]
- A Study in Markov Chains, Loop-Erased Random Walk, and Loop Soups. UChicago Math REU, 2023 [paper]
Past Projects
- LMCache: The first open-source Knowledge Delivery Network (KDN) for LLM applications. Makes LLM applications up to 8x faster at up to 8x lower cost.
- vLLM Production Stack: Scale from a single vLLM instance to a distributed vLLM deployment without changing any application code. [code]
- KV Cache Compression and Streaming for Multimodal Large Language Models (MLLMs)
- Knowledge Streaming from LLMs to Environments [website]
