PPoPP 2026
Sat 31 January - Wed 4 February 2026, Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Tue 3 Feb 2026, 15:50 - 16:10, at Balmoral (ML Inference). Chair(s): Hailong Yang


With rapid advances in deep learning-based computer vision (CV), digital images are increasingly processed not by humans but by downstream CV algorithms. In particular, the growing popularity of vision foundation models has heightened interest in deploying these models on edge devices. However, limited memory remains a key bottleneck, making memory footprint reduction essential. Mainstream model customization methods often require intensive deployment effort and can severely degrade accuracy. Moreover, general-purpose deep learning frameworks rarely prioritize memory optimization, and existing memory management schemes face practical limitations, including layer-wise memory imbalance, high management overhead, and volatile memory budgets.

To tackle these issues, this work focuses on compilation-level optimizations that are explicitly memory-aware. We observe that memory usage during vision foundation model inference varies significantly over time (up to a 10x difference), with extended periods of low memory demand. Based on this observation, we propose BEEMS, a dual-objective compiler that optimizes both memory and latency by smoothing memory usage across the computational graph. BEEMS analyzes the vision foundation model's computational graph to identify peak and trough operators in terms of memory demand. It then builds an efficient optimization search space, offering a flexible interface that applies different strategies based on operator characteristics. Specifically, peak operators are optimized with techniques such as operator partitioning, kernel substitution, swapping, and rematerialization to reduce memory pressure, while trough operators receive subgraph substitutions to improve latency (a simplified sketch of this peak/trough dispatch appears below). Experiments on six diverse models show that BEEMS reduces peak memory by up to 90% and improves latency by 10%, demonstrating its effectiveness in jointly optimizing memory and performance.
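To make the peak/trough dispatch concrete, here is a minimal Python sketch. It is hypothetical and not the BEEMS implementation: the graph representation (Operator, live_mem_mb), the thresholds (hi_frac, lo_frac), and the function smooth_memory are all illustrative names, and the real system chooses among rewrites via its optimization search space rather than fixed rules.

    # Hypothetical sketch of memory smoothing over a computational graph
    # (not the actual BEEMS API): classify operators by live-memory demand,
    # then dispatch peak operators to memory-reducing rewrites and trough
    # operators to latency-oriented subgraph substitutions.
    from dataclasses import dataclass

    @dataclass
    class Operator:
        name: str
        live_mem_mb: float  # memory live while this operator executes

    def smooth_memory(graph: list[Operator], hi_frac=0.8, lo_frac=0.3):
        peak = max(op.live_mem_mb for op in graph)
        plan = []
        for op in graph:
            if op.live_mem_mb >= hi_frac * peak:
                # Peak operator: reduce memory pressure (partitioning,
                # kernel substitution, swapping, or rematerialization).
                plan.append((op.name, "partition/substitute/swap/remat"))
            elif op.live_mem_mb <= lo_frac * peak:
                # Trough operator: memory headroom available, so trade
                # some of it for latency via subgraph substitution.
                plan.append((op.name, "subgraph-substitution"))
            else:
                plan.append((op.name, "keep"))
        return plan

    # Toy graph: the attention block dominates live memory, the tail is a trough.
    ops = [Operator("patch_embed", 120), Operator("attn_qk_matmul", 1800),
           Operator("softmax", 1750), Operator("mlp_fc1", 400),
           Operator("head", 90)]
    for name, action in smooth_memory(ops):
        print(f"{name}: {action}")

In practice, the choice among the peak-side rewrites would be driven by a cost model over the search space rather than applied uniformly as in this toy dispatcher.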

Tue 3 Feb

Displayed time zone: Hobart

15:50 - 17:10
ML Inference (Main Conference) at Balmoral
Chair(s): Hailong Yang Beihang University
15:50
20m
Talk
BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing
Main Conference
Hanjing Shen Shanghai Jiao Tong University, Fangxin Liu Shanghai Jiao Tong University, Jian Liu Beijing University of Aeronautics and Astronautics, Li Jiang Shanghai Jiao Tong University, Haibing Guan Shanghai Jiao Tong University
16:10
20m
Talk
Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving
Main Conference
Jianxiong Liao Sun Yat-sen University, Quanxing Dong Sun Yat-sen University, Yunkai Liang Sun Yat-sen University, Zhi Zhou Sun Yat-sen University, Xu Chen Sun Yat-sen University
16:30
20m
Talk
MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models
Main Conference
Desen Sun University of Waterloo, Zepeng Zhao Carnegie Mellon University, Yuke Wang Rice University
16:50
20m
Talk
ChituDiffusion: A Data-Characteristic-Aware Serving System for Diffusion Models
Main Conference
Chengzhang Wu Tsinghua University, Liyan Zheng Tsinghua University, Haojie Wang Tsinghua University, Kezhao Huang Tsinghua University, Zixuan Ma Tsinghua University, Dong Dong, Jidong Zhai Tsinghua University