BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing
With the rapid advance of deep learning-based computer vision (CV), digital images are increasingly processed not by humans but by downstream CV algorithms. In particular, the growing popularity of vision foundation models has heightened interest in deploying them on edge devices, where limited memory remains a key bottleneck and reducing memory footprint is essential. Mainstream model customization methods often require intensive deployment effort and can severely degrade accuracy, while existing deep learning frameworks generally do not prioritize memory optimization. Current memory management schemes also face practical limitations, including layer-wise memory imbalance, high management overhead, and volatile memory budgets.
To tackle these issues, this work focuses on compilation-level optimizations that are explicitly memory-aware. We observe that memory usage during vision foundation model inference varies significantly over time (up to a $10\times$ difference), with extended periods of low memory demand. Motivated by this observation, we propose BEEMS, a dual-objective compiler that optimizes both memory and latency by smoothing memory usage across the computation graph. BEEMS analyzes the vision foundation model's computation graph to identify peak and trough operators in terms of memory demand. It then builds an efficient optimization search space, offering a flexible interface that applies different strategies based on operator characteristics. Specifically, peak operators are optimized with techniques such as operator partitioning, kernel substitution, swapping, and rematerialization to reduce memory pressure, while trough operators receive subgraph substitutions to improve latency. Experiments on six diverse models show that BEEMS reduces peak memory by up to 90% and improves latency by 10%, demonstrating its effectiveness in jointly optimizing memory and performance.
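The peak/trough classification the abstract describes can be illustrated with a minimal sketch: walk the operators of a computation graph in topological order, track live memory, and label operators relative to the profile's maximum. All names here (`Operator`, the threshold fractions, the byte accounting) are illustrative assumptions, not the paper's actual interface.

```python
from dataclasses import dataclass

@dataclass
class Operator:
    name: str
    out_bytes: int     # size of the output tensor this op produces
    freed_bytes: int   # bytes of inputs that become dead after this op runs

def live_memory_profile(ops):
    """Running live-memory total after each operator in topological order."""
    live, profile = 0, []
    for op in ops:
        live += op.out_bytes
        live -= op.freed_bytes
        profile.append(live)
    return profile

def classify(ops, hi_frac=0.8, lo_frac=0.3):
    """Label each op 'peak' (candidate for partitioning / swapping /
    rematerialization), 'trough' (candidate for latency-oriented subgraph
    substitution), or 'mid', relative to the profile's maximum."""
    prof = live_memory_profile(ops)
    peak = max(prof)
    labels = []
    for m in prof:
        if m >= hi_frac * peak:
            labels.append("peak")
        elif m <= lo_frac * peak:
            labels.append("trough")
        else:
            labels.append("mid")
    return labels

# Toy graph: attention dominates the profile; the rest sit in a trough.
ops = [
    Operator("conv", out_bytes=100, freed_bytes=0),
    Operator("attn", out_bytes=900, freed_bytes=0),
    Operator("proj", out_bytes=0,   freed_bytes=950),
    Operator("head", out_bytes=10,  freed_bytes=0),
]
print(classify(ops))  # ['trough', 'peak', 'trough', 'trough']
```

In a real compiler the profile would come from tensor shapes and liveness analysis on the graph IR; the point of the sketch is only the dual treatment, with memory-reducing rewrites applied at the peaks and latency-reducing rewrites at the troughs.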
Tue 3 Feb (time zone: Hobart)
Session: 15:50 - 17:10
15:50 (20m, Talk) | BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing (Main Conference). Hanjing Shen (Shanghai Jiao Tong University), Fangxin Liu (Shanghai Jiao Tong University), Jian Liu (Beijing University of Aeronautics and Astronautics), Li Jiang (Shanghai Jiao Tong University), Haibing Guan (Shanghai Jiao Tong University)