Research
You can also find my publications on my Google Scholar profile.
Featured Recent Papers
- PEEK: Context Map as an Orientation Cache for Long-Context LLM Agents. arXiv preprint, 2026
- DroidSpeak: KV Cache Sharing for Cross-LLM Communication and Multi-LLM Serving. NSDI 2026 [paper]
- METIS: Fast Quality-Aware RAG Systems with Configuration Adaptation. SOSP 2025
Featured Recent Projects
- Carnot: Interpretable, Interactive, and Optimized Execution of Deep Research Queries. Enterprise-grade system that improves the quality, cost, and latency of execution plans generated by Deep Research agents over private data warehouses. [website]
Other Papers
- AdaptCache: KV Cache Native Storage Hierarchy for Low-Delay and High-Quality Language Model Serving. SOSP 2025 Workshop on Big Memory (BigMem) [paper]
- LLMSteer: Improving Long-Context LLM Inference by Steering Attention on Reused Contexts. NeurIPS 2024 Workshop on Machine Learning for Systems
- Transformer-based Predictions for Sudden Network Changes. NSDI 2024 Poster Session [poster]
- An Introduction to Loewner Energy. UChicago Math REU, 2024 [paper]
- A Study in Markov Chains, Loop-Erased Random Walk, and Loop Soups. UChicago Math REU, 2023 [paper]
Other Projects
- LMCache: The first open-source Knowledge Delivery Network (KDN) for LLM applications. Accelerates LLM applications by up to 8x, at 8x lower cost.
- vLLM Production Stack. Scales from a single vLLM instance to a distributed vLLM deployment without changing any application code. [code]
- KV Cache Compression and Streaming for Multimodal Large Language Models (MLLMs)
- Knowledge Streaming from LLMs to Environments[website]
