PPoPP 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Tue 3 Feb 2026 16:10 - 16:30 at Balmoral - ML Inference Chair(s): Hailong Yang

Serving applications with diverse service-level objective (SLO) requirements has become essential for production-scale LLM serving systems. However, existing systems rely on iteration-level scheduling, which forces all requests in a multi-SLO workload through the same inflexible execution and significantly constrains serving efficiency.

In this paper, we introduce layer-level scheduling, a novel mechanism that advances beyond conventional iteration-level granularity. This mechanism decomposes per-iteration computation into fine-grained layer operations, enabling execution tailored to requests with differing requirements. However, the finer granularity introduces new challenges in both intra-instance request execution and cross-instance coordination, posing significant barriers to practical deployment. To address these challenges, we introduce Laser, a system designed for efficient multi-SLO LLM serving. Its key idea is to integrate inter-instance request dispatching with layer-level scheduling within instances, delivering high serving throughput with SLO guarantees. Evaluations with real-world applications show that Laser improves throughput by over 1.67x compared to state-of-the-art systems while maintaining the same SLO attainment rate.
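The difference between the two scheduling granularities can be illustrated with a toy simulation. This is a minimal sketch under stated assumptions, not Laser's scheduler: the abstract does not describe the actual policy, so the earliest-deadline-first rule, the names `Request` and `simulate`, the constants `NUM_LAYERS` and `LAYER_TIME`, and the example arrival times and deadlines are all hypothetical. It also ignores batching and runs one request per layer step.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

NUM_LAYERS = 4    # hypothetical model depth
LAYER_TIME = 1.0  # hypothetical cost of one layer, in arbitrary time units


@dataclass
class Request:
    name: str
    arrival: float   # time at which the request becomes runnable
    deadline: float  # absolute time by which the iteration should finish
    layers_done: int = 0
    finish: Optional[float] = None


def simulate(requests: List[Request], layer_level: bool) -> Dict[str, float]:
    """Run each request through one iteration of NUM_LAYERS layers.

    layer_level=False: a scheduled request runs all its layers uninterrupted.
    layer_level=True:  the scheduler re-decides after every single layer.
    Both modes pick the ready request with the earliest deadline (assumed policy).
    """
    now = 0.0
    pending = list(requests)
    while pending:
        ready = [r for r in pending if r.arrival <= now]
        if not ready:
            # Nothing runnable yet: jump ahead to the next arrival.
            now = min(r.arrival for r in pending)
            continue
        req = min(ready, key=lambda r: r.deadline)
        steps = 1 if layer_level else NUM_LAYERS - req.layers_done
        req.layers_done += steps
        now += steps * LAYER_TIME
        if req.layers_done == NUM_LAYERS:
            req.finish = now
            pending.remove(req)
    return {r.name: r.finish for r in requests}


def make_requests() -> List[Request]:
    # "A" has a loose deadline; "B" arrives mid-iteration with a tight one.
    return [Request("A", arrival=0.0, deadline=10.0),
            Request("B", arrival=1.0, deadline=6.0)]


iteration_result = simulate(make_requests(), layer_level=False)
layer_result = simulate(make_requests(), layer_level=True)
print(iteration_result)  # {'A': 4.0, 'B': 8.0}: B finishes after its deadline of 6
print(layer_result)      # {'A': 8.0, 'B': 5.0}: both finish within their deadlines
```

In this toy setting, switching at layer boundaries lets the tight-deadline request overtake the loose one partway through its iteration, whereas iteration-level switching makes it wait for the whole iteration to finish.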

Tue 3 Feb

Displayed time zone: Hobart

15:50 - 17:10
ML Inference (Main Conference) at Balmoral
Chair(s): Hailong Yang Beihang University
15:50
20m
Talk
BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing
Main Conference
Hanjing Shen Shanghai Jiao Tong University, Fangxin Liu Shanghai Jiao Tong University, Jian Liu Beijing University of Aeronautics and Astronautics, Li Jiang Shanghai Jiao Tong University, Haibing Guan Shanghai Jiao Tong University
16:10
20m
Talk
Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving
Main Conference
Jianxiong Liao Sun Yat-sen University, Quanxing Dong Sun Yat-sen University, Yunkai Liang Sun Yat-sen University, Zhi Zhou Sun Yat-sen University, Xu Chen Sun Yat-sen University
16:30
20m
Talk
MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models
Main Conference
Desen Sun University of Waterloo, Zepeng Zhao Carnegie Mellon University, Yuke Wang Rice University
16:50
20m
Talk
ChituDiffusion: A Data-Characteristic-Aware Serving System for Diffusion Models
Main Conference
Chengzhang Wu Tsinghua University, Liyan Zheng Tsinghua University, Haojie Wang Tsinghua University, Kezhao Huang Tsinghua University, Zixuan Ma Tsinghua University, Dong Dong, Jidong Zhai Tsinghua University