Publications

RLHFuse: Efficient RLHF Training for Large Language Models with Inter- and Intra-Stage Fusion.
DistTrain: Addressing Model and Data Heterogeneity with Disaggregated Training for Multimodal Large Language Models.
DistServe: Disaggregating Prefill and Decoding for Goodput-optimized Large Language Model Serving.
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism.
Aquifer: Transparent Microsecond-scale Scheduling for vRAN Workloads.
MegaScale: Scaling Large Language Model Training to More Than 10,000 GPUs.
DistMind: Efficient Resource Disaggregation for Deep Learning Workloads.
Fast Distributed Inference Serving for Large Language Models.
AlpaServe: Statistical Multiplexing with Model Parallelism for Deep Learning Serving.
ElasticFlow: An Elastic Serverless Training Platform for Distributed Deep Learning.