PPoPP 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Tue 3 Feb 2026 12:30 - 12:50 at Pyrmont - Cluster and Cloud Computing Chair(s): Ruslan Nikolaev

Sparse direct solvers are critical building blocks in a range of scientific applications on heterogeneous supercomputers. However, existing sparse direct solvers cannot fully exploit the high memory bandwidth and floating-point performance of modern GPUs. The primary challenges are twofold: (1) the absence of a mechanism for aggregating small tasks to saturate the GPU, and (2) the lack of a mechanism for executing a diverse set of small tasks in batch mode on a single GPU.

In this paper, we propose a strategy called Trojan Horse, which significantly enhances the execution efficiency of sparse direct solvers on GPU clusters. This mechanism divides each process's work into two stages: Aggregate (with two modules, Prioritizer and Container) and Batch (with two modules, Collector and Executor). In the Aggregate stage, a process first assesses the urgency of the input tasks through the Prioritizer module and, based on their priority, sends them to either the Collector module or the Container module. In the Batch stage, the Collector module receives high-priority heterogeneous tasks from the Prioritizer module, retrieves additional tasks from the Container module to fill out a batch, and sends them to the Executor module for batch execution on the GPU. In addition, our strategy is independent of solver libraries and has been integrated into SuperLU_DIST and PanguLU.

In the scale-up evaluation on a single NVIDIA A100 GPU, the Trojan Horse strategy delivers speedups of up to 418.79x (5.47x on average) for SuperLU_DIST and up to 5.59x (2.84x on average) for PanguLU. In the scale-out evaluation on two 16-GPU clusters from NVIDIA and AMD, respectively, Trojan Horse continues to deliver strong performance gains for both SuperLU_DIST and PanguLU across different GPU counts.
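The Aggregate-and-Batch pipeline described in the abstract can be illustrated with a minimal, CPU-only sketch. This is not the paper's implementation: the module names follow the abstract, but the urgency heuristic, the `threshold`, the `BATCH_SIZE`, and the task representation are all hypothetical, and the Executor here simply runs tasks in a loop where the real system would launch a batched GPU kernel.

```python
from collections import deque

BATCH_SIZE = 4  # hypothetical batch size; the real system would tune this per GPU


class Prioritizer:
    """Assigns an urgency score to each task (illustrative heuristic)."""

    def priority(self, task):
        return task["urgency"]


class Container:
    """Buffers low-priority tasks until a batch needs backfill."""

    def __init__(self):
        self._tasks = deque()

    def put(self, task):
        self._tasks.append(task)

    def take(self, n):
        out = []
        while self._tasks and len(out) < n:
            out.append(self._tasks.popleft())
        return out


class Collector:
    """Gathers high-priority tasks and backfills from the Container."""

    def __init__(self, container):
        self.container = container
        self.high = []

    def put_high(self, task):
        self.high.append(task)

    def form_batch(self):
        batch = self.high[:BATCH_SIZE]
        self.high = self.high[BATCH_SIZE:]
        # Top up the batch with lower-priority tasks so the "GPU" stays busy.
        batch += self.container.take(BATCH_SIZE - len(batch))
        return batch


class Executor:
    """Stand-in for batched GPU execution: runs every task in the batch."""

    def run(self, batch):
        return [t["work"]() for t in batch]


def aggregate_and_batch(tasks, threshold=5):
    """Route tasks by urgency, then drain them in fixed-size batches."""
    prio, container = Prioritizer(), Container()
    collector, executor = Collector(container), Executor()
    for t in tasks:  # Aggregate stage
        if prio.priority(t) >= threshold:
            collector.put_high(t)
        else:
            container.put(t)
    results = []
    while True:  # Batch stage
        batch = collector.form_batch()
        if not batch:
            break
        results.extend(executor.run(batch))
    return results


tasks = [{"urgency": u, "work": (lambda u=u: u * 10)} for u in range(8)]
print(aggregate_and_batch(tasks))
```

Note how the first batch is led by the three high-urgency tasks and backfilled with one low-urgency task, mirroring the abstract's point that the Collector pulls from the Container to keep each batch full.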

Tue 3 Feb

Displayed time zone: Hobart

11:30 - 12:50
Cluster and Cloud Computing (Main Conference) at Pyrmont
Chair(s): Ruslan Nikolaev (Pennsylvania State University)
11:30
20m
Talk
Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds
Main Conference
Xiaokang Hu (Alibaba Cloud Computing), Yuchao Cao (Alibaba Cloud Computing), Naixuan Guan (Alibaba Cloud Computing), Yifan Wu (Alibaba Cloud Computing), Xishi Qiu (Alibaba Cloud Computing), Shengdong Dai (Alibaba Cloud Computing), Ben Luo (Alibaba Cloud Computing), Sanchuan Cheng (Alibaba Cloud Computing), Fudong Qiu (Alibaba Cloud Computing), Yibin Shen (Alibaba Cloud), Jiesheng Wu (Alibaba Cloud Computing)
11:50
20m
Talk
zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection
Main Conference
Xiangyu Liu (Xiamen University), Huiba Li (Alibaba), Shun Gai (Alibaba), Youmin Chen (Shanghai Jiao Tong University), Yiming Zhang (Xiamen University)
12:10
20m
Talk
Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters
Main Conference
Ruobing Han (Georgia Institute of Technology), Hyesoon Kim (Georgia Institute of Technology)
12:30
20m
Talk
Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters (Best Paper Nominee)
Main Conference
Yida Li (China University of Petroleum-Beijing), Siwei Zhang (China University of Petroleum-Beijing), Yiduo Niu (China University of Petroleum-Beijing), Yang Du (China University of Petroleum-Beijing), Qingxiao Sun (China University of Petroleum-Beijing), Zhou Jin (China University of Petroleum-Beijing), Weifeng Liu (China University of Petroleum-Beijing)