Publications

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads.
Fast Distributed Inference Serving for Large Language Models.
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning.