Yinmin Zhong
Publications
Fast Distributed Inference Serving for Large Language Models
Large language models (LLMs) power a new generation of interactive AI applications exemplified by ChatGPT. The interactive nature of …
Bingyang Wu, Yinmin Zhong, Zili Zhang, Gang Huang, Xuanzhe Liu, Xin Jin
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Large deep learning models have demonstrated strong ability to solve many tasks across a wide range of applications. Those large models …
Liwen Chang, Wenlei Bao, Qi Hou, Chengquan Jiang, Ningxin Zheng, Yinmin Zhong, Xuanrun Zhang, Zuquan Song, Ziheng Jiang, Haibin Lin, Xin Jin, Xin Liu
LoongServe: Efficiently Serving Long-context Large Language Models with Elastic Sequence Parallelism
The context window of large language models (LLMs) is rapidly increasing, leading to a huge variance in resource usage between …
Bingyang Wu, Shengyu Liu, Yinmin Zhong, Peng Sun, Xuanzhe Liu, Xin Jin