Publications

DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads.
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning.
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism.
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion.
Fast Distributed Inference Serving for Large Language Models.