Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters
The growing demand for GPU resources has led to widespread shortages in data centers, prompting the exploration of CPUs as an alternative for executing GPU programs. While prior research supports executing GPU programs on single CPUs, these approaches struggle to achieve competitive performance due to the computational capacity gap between GPUs and CPUs.
To further improve performance, we introduce CuCC, a framework that scales GPU-to-CPU migration to CPU clusters and utilizes distributed CPU nodes to execute GPU programs. Compared to single-CPU execution, CPU cluster execution requires cross-node communication to maintain data consistency. We present the CuCC execution workflow and communication optimizations, which aim to reduce network overhead. Evaluations demonstrate that CuCC achieves high scalability on large-scale CPU clusters and delivers runtimes approaching those of GPUs. In terms of cluster-wide throughput, CuCC enables CPUs to achieve an average of 2.59x higher throughput than GPUs.
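The abstract does not spell out the CuCC workflow, so the following is only a minimal sketch of the general idea it describes: a GPU-style grid of thread blocks is divided among CPU nodes, each node runs its share, and the nodes then exchange what they wrote so every copy of the data stays consistent. All names here (`partition_blocks`, `run_on_cluster`) are hypothetical and are not CuCC APIs. The nodes are simulated in a single process, and the merge loop stands in for real network communication.

```python
from typing import Callable, List


def partition_blocks(num_blocks: int, num_nodes: int) -> List[range]:
    """Split block indices 0..num_blocks-1 into contiguous, near-equal ranges."""
    base, extra = divmod(num_blocks, num_nodes)
    ranges, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)
        ranges.append(range(start, start + size))
        start += size
    return ranges


def run_on_cluster(
    kernel: Callable[[float], float],
    num_blocks: int,
    block_dim: int,
    data: List[float],
    num_nodes: int,
) -> List[float]:
    """Simulate an element-wise kernel launch spread over several CPU nodes."""
    n = len(data)
    merged = list(data)
    for blocks in partition_blocks(num_blocks, num_nodes):
        local = list(data)  # assumption: each node holds a replica of the input
        written = set()
        for block_idx in blocks:
            for thread_idx in range(block_dim):
                i = block_idx * block_dim + thread_idx  # global thread index
                if i < n:  # bounds guard, as in a typical GPU kernel
                    local[i] = kernel(data[i])
                    written.add(i)
        # Stand-in for cross-node communication: only the elements this
        # node actually wrote are propagated to the shared result.
        for i in written:
            merged[i] = local[i]
    return merged
```

In this sketch only the written elements are exchanged, which illustrates why limiting what crosses the network matters once execution is spread over many nodes; an actual system would use a message-passing layer rather than an in-process loop.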
Tue 3 Feb
11:30 - 12:50 | Cluster and Cloud Computing | Main Conference at Pyrmont | Chair(s): Ruslan Nikolaev (Pennsylvania State University)
11:30 | 20m | Talk | Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds | Main Conference | Xiaokang Hu (Alibaba Cloud Computing), Yuchao Cao (Alibaba Cloud Computing), Naixuan Guan (Alibaba Cloud Computing), Yifan Wu (Alibaba Cloud Computing), Xishi Qiu (Alibaba Cloud Computing), Shengdong Dai (Alibaba Cloud Computing), Ben Luo (Alibaba Cloud Computing), Sanchuan Cheng (Alibaba Cloud Computing), Fudong Qiu (Alibaba Cloud Computing), Yibin Shen (Alibaba Cloud), Jiesheng Wu (Alibaba Cloud Computing)
11:50 | 20m | Talk | zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection | Main Conference | Xiangyu Liu (Xiamen University), Huiba Li (Alibaba), Shun Gai (Alibaba), Youmin Chen (Shanghai Jiao Tong University), Yiming Zhang (Xiamen University)
12:10 | 20m | Talk | Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters | Main Conference
12:30 | 20m | Talk | Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters (Best Paper Nominee) | Main Conference | Yida Li (China University of Petroleum-Beijing), Siwei Zhang (China University of Petroleum-Beijing), Yiduo Niu (China University of Petroleum-Beijing), Yang Du (China University of Petroleum-Beijing), Qingxiao Sun (China University of Petroleum-Beijing), Zhou Jin (China University of Petroleum-Beijing), Weifeng Liu (China University of Petroleum-Beijing)