Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving
Supporting applications with diverse service-level objective (SLO) requirements has become indispensable for production-scale LLM serving systems. However, existing systems rely on iteration-level scheduling, which enforces a single, uniform execution plan across multi-SLO workloads and significantly constrains serving efficiency.
In this paper, we introduce layer-level scheduling, a novel mechanism that advances beyond conventional iteration-level granularity. This mechanism decomposes per-iteration computation into fine-grained layer operations, enabling execution to be tailored to requests with differing requirements. However, the finer granularity introduces new challenges in both intra-instance request execution and cross-instance coordination, posing significant barriers to practical deployment. To address these challenges, we present Laser, a system designed for efficient multi-SLO LLM serving. Its key idea is to integrate inter-instance request dispatching with layer-level scheduling within each instance, delivering high serving throughput with SLO guarantees. Evaluations with real-world applications show that Laser improves throughput by over 1.67x while maintaining the same SLO attainment rate compared to state-of-the-art systems.
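As a rough illustration of why scheduling granularity matters, the following toy simulation contrasts the two decision points. It is a minimal sketch under strong assumptions and is not Laser's algorithm: a single executor runs one layer per time unit, there is no batching or KV cache, and an earliest-deadline-first rule (our choice, not taken from the abstract) picks the next request. The names `Request`, `simulate`, `arrival`, and `deadline` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    name: str
    arrival: int                  # time (in layer units) when the request arrives
    deadline: int                 # hypothetical SLO: latest acceptable finish time
    layers_done: int = 0
    finish: Optional[int] = None  # set once all layers have run


def simulate(specs, num_layers, layer_level):
    """Run one forward pass per request; each layer costs one time unit.

    layer_level=False: the scheduler decides only between full passes.
    layer_level=True:  the scheduler may switch requests at every layer boundary.
    Returns a dict mapping request name to finish time.
    """
    reqs = [Request(*s) for s in specs]
    t, current = 0, None
    while any(r.finish is None for r in reqs):
        ready = [r for r in reqs if r.arrival <= t and r.finish is None]
        if not ready:
            t += 1  # idle until the next arrival
            continue
        # Decision point: every layer (layer-level) or only when the
        # current request has completed its whole pass (iteration-level).
        if layer_level or current is None or current.finish is not None:
            current = min(ready, key=lambda r: r.deadline)  # assumed EDF policy
        current.layers_done += 1
        t += 1
        if current.layers_done == num_layers:
            current.finish = t
    return {r.name: r.finish for r in reqs}


# A relaxed request starts first; a tight-deadline request arrives one layer later.
specs = [("batch", 0, 20), ("chat", 1, 6)]
iteration_result = simulate(specs, num_layers=4, layer_level=False)
layer_result = simulate(specs, num_layers=4, layer_level=True)
print(iteration_result)  # chat finishes at 8, after its deadline of 6
print(layer_result)      # chat finishes at 5; batch still finishes well before 20
```

In this toy setting the coarse decision point forces the late-arriving, tight-deadline request to wait for the whole pass in progress, while the layer-boundary decision point lets it run immediately at the cost of delaying the relaxed request.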
Tue 3 Feb (displayed time zone: Hobart)
15:50 - 17:10
15:50 (20m, Talk) | BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing | Main Conference | Hanjing Shen (Shanghai Jiao Tong University), Fangxin Liu (Shanghai Jiao Tong University), Jian Liu (Beijing University of Aeronautics and Astronautics), Li Jiang (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)
16:10 (20m, Talk) | Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving | Main Conference | Jianxiong Liao (Sun Yat-sen University), Quanxing Dong (Sun Yat-sen University), Yunkai Liang (Sun Yat-sen University), Zhi Zhou (Sun Yat-sen University), Xu Chen (Sun Yat-sen University)
16:30 (20m, Talk) | MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models | Main Conference
16:50 (20m, Talk) | ChituDiffusion: A Data-Characteristic-Aware Serving System for Diffusion Models | Main Conference | Chengzhang Wu (Tsinghua University), Liyan Zheng (Tsinghua University), Haojie Wang (Tsinghua University), Kezhao Huang (Tsinghua University), Zixuan Ma (Tsinghua University), Dong Dong, Jidong Zhai (Tsinghua University)