PPoPP 2026
Sat 31 January - Wed 4 February 2026, Sydney, Australia
co-located with HPCA/CGO/CC 2026

Mon 2 Feb

Displayed time zone: Hobart

09:50 - 11:10
Concurrency Control (Main Conference at Pyrmont)
Chair(s): Madan Musuvathi Microsoft Research
09:50
20m
Talk
Binary Compatible Critical Section Delegation (Best Paper Award)
Main Conference
Junyao Zhang, Zhuo Wang Alibaba Group, Zhe Zhou Fudan University
DOI
10:10
20m
Talk
Hapax Locks: Scalable Value-Based Mutual Exclusion
Main Conference
Dave Dice Independent, Alex Kogan Oracle Labs
DOI
10:30
20m
Talk
Fixing Non-blocking Data Structures for Better Compatibility with Memory Reclamation Schemes
Main Conference
Md Amit Hasan Arovi Pennsylvania State University, Ruslan Nikolaev Pennsylvania State University
DOI
10:50
20m
Talk
Multiverse: Transactional Memory with Dynamic Multiversioning
Main Conference
Gaetano Coccimiglio University of Waterloo, Trevor Brown University of Waterloo, Srivatsan Ravi University of Southern California
DOI
11:10 - 11:30
11:10
20m
Coffee break
Break
HPCA/CGO/PPoPP/CC Catering

11:30 - 12:50
Scheduling and Load Balancing (Main Conference at Pyrmont)
Chair(s): V Krishna Nandivada IIT Madras
11:30
20m
Talk
Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process Workloads (Best Paper Nominee)
Main Conference
Aleix Roca Barcelona Supercomputing Center, Vicenç Beltran Barcelona Supercomputing Center
DOI
11:50
20m
Talk
Waste-Efficient Work Stealing
Main Conference
Kyle Singer Massachusetts Institute of Technology, Kunal Agrawal Washington University in St. Louis, TB Schardl Massachusetts Institute of Technology
DOI
12:10
20m
Talk
DiggerBees: Depth First Search Leveraging Hierarchical Block-Level Stealing on GPUs
Main Conference
Yuyao Niu Barcelona Supercomputing Center, Yuechen Lu China University of Petroleum-Beijing, Weifeng Liu China University of Petroleum-Beijing, Marc Casas Barcelona Supercomputing Center
DOI
12:30
20m
Talk
PANA: A Fine-Grained Runtime-Adaptive Load Balancing for Parallel SpMV on Multicore CPUs
Main Conference
Haodong Bian Tsinghua University, Youhui Zhang Tsinghua University, Xiang Fei Tsinghua University, Jianqiang Huang Qinghai University, Xiaoying Wang Qinghai University
DOI
12:50 - 14:10
12:50
80m
Lunch
Lunch
HPCA/CGO/PPoPP/CC Catering

14:10 - 15:30
Concurrent Data Structures (Main Conference at Pyrmont)
Chair(s): Calin Cascaval Google DeepMind
14:10
20m
Talk
UFO Trees: Practical and Provably-Efficient Parallel Batch-Dynamic Trees (Best Paper Nominee)
Main Conference
Quinten De Man University of Maryland, Atharva Sharma University of Maryland, Kishen N Gowda University of Maryland, Laxman Dhulipala University of Maryland, College Park
DOI
14:30
20m
Talk
Sharded Elimination and Combining for Highly-Efficient Concurrent Stacks
Main Conference
Ajay Singh FORTH ICS, Nikos Metaxakis, Panagiota Fatourou FORTH ICS and University of Crete, Greece
DOI
14:50
20m
Talk
Concurrent Balanced Augmented Trees
Main Conference
Evan Wrench University of British Columbia, Ajay Singh FORTH ICS, Younghun Roh Massachusetts Institute of Technology, Panagiota Fatourou University of Crete & FORTH, Siddhartha Jayanti Google Research, Eric Ruppert York University, Yuanhao Wei University of British Columbia
DOI
15:10
20m
Talk
Parallel Dynamic Spatial Indexes
Main Conference
Ziyang Men University of California, Riverside, Bo Huang University of California, Riverside, Yan Gu University of California, Riverside, Yihan Sun University of California, Riverside
DOI
15:30 - 15:50
15:30
20m
Coffee break
Break
HPCA/CGO/PPoPP/CC Catering

15:50 - 17:10
GPU and Heterogeneous Computing (Main Conference at Pyrmont)
Chair(s): Frank Mueller North Carolina State University, USA
15:50
20m
Talk
PRISM: An Efficient GPU-Based Lossy Compression Framework for Progressive Data Retrieval with Multi-Level Interpolation (Best Paper Nominee)
Main Conference
Bing Lu Institute of Computing Technology of Chinese Academy of Sciences, Zedong Liu University of Chinese Academy of Sciences, Hairui Zhao Jilin University, Dejun Luo University of Chinese Academy of Sciences, Wenjing Huang University of Chinese Academy of Sciences, Yida Gu University of Chinese Academy of Sciences, Jinyang Liu University of Houston, Guangming Tan University of Chinese Academy of Sciences, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences
DOI
16:10
20m
Talk
Dynamic Detection of Inefficient Data Mapping Patterns in Heterogeneous OpenMP Applications
Main Conference
Luke Marzen Iowa State University, Junhyung Shim Iowa State University, Ali Jannesari Iowa State University
DOI
16:30
20m
Talk
Root-Down Exposure for Maximal Clique Enumeration on GPUs
Main Conference
Zhe Pan Tsinghua University, Peng Qu Tsinghua University, Youhui Zhang Tsinghua University
DOI
16:50
20m
Talk
ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities
Main Conference
Weile Luo The Hong Kong University of Science and Technology, Guangzhou, Yuhan Chen The Hong Kong University of Science and Technology, Guangzhou, Xiangrui Yu The Hong Kong University of Science and Technology, Guangzhou, Qiang Wang Harbin Institute of Technology, Shenzhen, Ruibo Fan The Hong Kong University of Science and Technology, Guangzhou, Hongyuan Liu Stevens Institute of Technology, Xiaowen Chu The Hong Kong University of Science and Technology, Guangzhou
DOI
17:30 - 19:00
Business Meeting (Main Conference at Cronulla)
Chair(s): Tony Hosking Australian National University, Madan Musuvathi Microsoft Research, Kenjiro Taura The University of Tokyo
17:30
90m
Meeting
PPoPP Business Meeting
Main Conference

Tue 3 Feb


09:50 - 11:10
Stencil and Sparse Matrix Computation (Main Conference at Pyrmont)
Chair(s): Shoaib Kamil Adobe Research
09:50
20m
Talk
SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
Main Conference
Qiqi Gu Shanghai Jiao Tong University, Chenpeng Wu Shanghai Jiao Tong University, Heng Shi, Jianguo Yao Shanghai Jiao Tong University; Shanghai Enflame Technology
DOI
10:10
20m
Talk
ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration
Main Conference
Jiazhi Jiang Sun Yat-sen University, Xijia Yao Sun Yat-sen University, Jiayu Chen Sun Yat-sen University, Jinhui Wei Sun Yat-sen University, Dan Huang, Yutong Lu Sun Yat-sen University
DOI
10:30
20m
Talk
Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores
Main Conference
Kaige Zhang Beihang University, Hailong Yang Beihang University, Xin You Beihang University, Tianyu Feng Beihang University, Yufan Xu Independent Researcher, Zhongzhi Luan Beihang University, Yi Liu Beihang University, Depei Qian Beihang University
DOI
10:50
20m
Talk
VDHA: Vector-Driven Hash Aggregation for Sparse Matrix-Sparse Vector Multiplication on GPUs
Main Conference
Yuchen Li Tsinghua University, Zhe Pan Tsinghua University, Peng Qu Tsinghua University, Youhui Zhang Tsinghua University
DOI
11:10 - 11:30
11:10
20m
Coffee break
Break
HPCA/CGO/PPoPP/CC Catering

11:30 - 12:50
Mixed Precision and Quantization (Main Conference at Balmoral)
Chair(s): Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences
11:30
20m
Talk
RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization
Main Conference
Qihao Zhang Tsinghua University, MingLiang Tang Tsinghua University, Mingshu Zhai Tsinghua University, Kinman Lei Tsinghua University, Jidong Zhai Tsinghua University
DOI
11:50
20m
Talk
High-Throughput Non-Uniformly Quantized 3-bit LLM Inference
Main Conference
YuAng Chen Chinese University of Hong Kong, Wenqi Zeng Hong Kong University of Science and Technology, Jeffrey Xu Yu Chinese University of Hong Kong
DOI
12:10
20m
Talk
JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-Context Inference
Main Conference
Chengyu Sun Wuhan University, Yaqi Xia Wuhan University, Hulin Wang, Donglin Yang Nvidia Corporation, Xiaobo Zhou University of Macau, Dazhao Cheng Wuhan University
DOI
12:30
20m
Talk
HierCut: Enabling 16-bit Format Mixed Precision for Molecular Dynamics through Hierarchical Cutoff (Best Artifact Award)
Main Conference
Zeyu Song Tsinghua University, Lin Gan Tsinghua University, Xiaohui Duan Shandong University, Jiayu Fu Tsinghua University, Zhengrui Li Tsinghua University, Yinuo Wang Tsinghua University, Guangzhao Li Chinese Academy of Sciences, Guangwen Yang Tsinghua University
DOI
11:30 - 12:50
Cluster and Cloud Computing (Main Conference at Pyrmont)
Chair(s): Ruslan Nikolaev Pennsylvania State University
11:30
20m
Talk
Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds
Main Conference
Xiaokang Hu Alibaba Cloud Computing, Yuchao Cao Alibaba Cloud Computing, Naixuan Guan Alibaba Cloud Computing, Yifan Wu Alibaba Cloud Computing, Xishi Qiu Alibaba Cloud Computing, Shengdong Dai Alibaba Cloud Computing, Ben Luo Alibaba Cloud Computing, Sanchuan Cheng Alibaba Cloud Computing, Fudong Qiu Alibaba Cloud Computing, Yibin Shen Alibaba Cloud, Jiesheng Wu Alibaba Cloud Computing
DOI
11:50
20m
Talk
zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection
Main Conference
Xiangyu Liu Xiamen University, Huiba Li Alibaba, Shun Gai Alibaba, Youmin Chen Shanghai Jiao Tong University, Yiming Zhang Xiamen University
DOI
12:10
20m
Talk
Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters
Main Conference
Ruobing Han Georgia Institute of Technology, Hyesoon Kim Georgia Institute of Technology
DOI
12:30
20m
Talk
Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters (Best Paper Nominee)
Main Conference
Yida Li China University of Petroleum-Beijing, Siwei Zhang China University of Petroleum-Beijing, Yiduo Niu China University of Petroleum-Beijing, Yang Du China University of Petroleum-Beijing, Qingxiao Sun China University of Petroleum-Beijing, Zhou Jin China University of Petroleum-Beijing, Weifeng Liu China University of Petroleum-Beijing
DOI
12:50 - 14:10
12:50
80m
Awards
HPCA Awards Lunch
HPCA/CGO/PPoPP/CC Catering

12:50 - 14:10
12:50
80m
Lunch
Lunch
HPCA/CGO/PPoPP/CC Catering

14:10 - 15:30
Distributed Training (Main Conference at Balmoral)
Chair(s): Bo Fang University of Texas at Arlington
14:10
20m
Talk
COCCL: A Collective Communication Library Supporting Easy Integration and Configuration of Customized Compression for Scalable LLM Training
Main Conference
Xingchen Liu University of Chinese Academy of Sciences, Haoran Kong Chinese University of Hong Kong, Shenzhen, Hairui Zhao Jilin University, Shengkai Lyu University of Chinese Academy of Sciences, Zheng Wei University of Chinese Academy of Sciences, Man Liu University of Chinese Academy of Sciences, Xingjian Tian University of Chinese Academy of Sciences, Liyang Zhao University of Chinese Academy of Sciences, Zhuohan Chen University of Chinese Academy of Sciences, Fakang Wang Ant Group, Zizhong Chen Chinese University of Hong Kong, Shenzhen, Zhan Wang University of Chinese Academy of Sciences, Guangming Tan University of Chinese Academy of Sciences, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences
DOI
14:30
20m
Talk
Elastor: Elastic and Efficient Model Partitioning and Checkpointing for Fault-Tolerant Distributed Training
Main Conference
Xuanyu Wang Peking University, Fangcheng Fu Shanghai Jiao Tong University, Haoyang Li Peking University, Hao Ge Peking University, Sheng Lin Peking University, Jiawen Niu Peking University, Bin Cui Peking University
DOI
14:50
20m
Talk
HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism
Main Conference
Geng Zhang National University of Singapore, Shenggan Cheng National University of Singapore, Xuanlei Zhao National University of Singapore, Ziming Liu, Yang You National University of Singapore
DOI
15:10
20m
Talk
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training (Best Paper Nominee)
Main Conference
Yida Gu University of Chinese Academy of Sciences, Fakang Wang Ant Group, Jianhao Fu Ant Group, Zhenhang Sun Ant Group, Qianyu Zhang Ant Group, Hairui Zhao Jilin University, Xingchen Liu University of Chinese Academy of Sciences, Yang Tian Ant Group, Wenjing Huang University of Chinese Academy of Sciences, Zedong Liu University of Chinese Academy of Sciences, Yifan Chen Ant Group, Jinwu Yang University of Chinese Academy of Sciences, Yueyuan Zhou University of Chinese Academy of Sciences, Qian Zhao Ant Group, Haoxu Li University of Chinese Academy of Sciences, Tao Wang Ant Group, Feng Yu Ant Group, Zhan Wang University of Chinese Academy of Sciences, Guangming Tan University of Chinese Academy of Sciences, Dingwen Tao Institute of Computing Technology, Chinese Academy of Sciences
DOI
14:10 - 15:30
Parallel Algorithms (Main Conference at Pyrmont)
Chair(s): Kenjiro Taura The University of Tokyo
14:10
20m
Talk
Pipelonk: Accelerating End-to-End Zero-Knowledge Proof Generation on GPUs for PLONK-Based Protocols
Main Conference
Zhiyuan Zhang Shandong University, Yanxin Cai Shandong University, Wenhao Yin Shandong University, Xueyu Wu The University of Hong Kong, Yi Wang Shenzhen University, Lei Ju Shandong University, Zhuoran Ji Shandong University
DOI
14:30
20m
Talk
ParDiff: Efficiently Parallelizing Reverse-Mode Automatic Differentiation with Direct Indexing
Main Conference
Shuhong Huang Tsinghua University, Shizhi Tang Qingcheng.AI, Yuan Wen University of Aberdeen, Huanqi Cao Tsinghua University, Ruibai Tang Tsinghua University, Yidong Chen, Jiping Yu Tsinghua University, Yang Li Lenovo Research, Chao Jiang Lenovo Research, Limin Xiao Lenovo Research, Jidong Zhai Tsinghua University
DOI
14:50
20m
Talk
Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs
Main Conference
Zhonghai Zhang Institute of Computing Technology, Chinese Academy of Sciences / University of Chinese Academy of Sciences, Yewen Li The Hong Kong University of Science and Technology, Ke Meng Chinese Academy of Sciences, Chunming Zhang Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan University of Chinese Academy of Sciences
DOI
15:10
20m
Talk
PIM-zd-tree: A Fast Space-Partitioning Index Leveraging Processing-in-Memory
Main Conference
Yiwei Zhao Carnegie Mellon University, Hongbo Kang Tsinghua University, Ziyang Men University of California, Riverside, Yan Gu University of California, Riverside, Guy E. Blelloch Carnegie Mellon University, Laxman Dhulipala University of Maryland, College Park, Charles McGuffey Reed College, Phil Gibbons Carnegie Mellon University
DOI
15:30 - 15:50
15:30
20m
Coffee break
Break
HPCA/CGO/PPoPP/CC Catering

15:50 - 17:10
ML Inference (Main Conference at Balmoral)
Chair(s): Hailong Yang Beihang University
15:50
20m
Talk
BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing
Main Conference
Hanjing Shen Shanghai Jiao Tong University, Fangxin Liu Shanghai Jiao Tong University, Jian Liu Beijing University of Aeronautics and Astronautics, Li Jiang Shanghai Jiao Tong University, Haibing Guan Shanghai Jiao Tong University
DOI
16:10
20m
Talk
Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving
Main Conference
Jianxiong Liao Sun Yat-sen University, Quanxing Dong Sun Yat-sen University, Yunkai Liang Sun Yat-sen University, Zhi Zhou Sun Yat-sen University, Xu Chen Sun Yat-sen University
DOI
16:30
20m
Talk
MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models
Main Conference
Desen Sun University of Waterloo, Zepeng Zhao Carnegie Mellon University, Yuke Wang Rice University
DOI
16:50
20m
Talk
ChituDiffusion: A Data-Characteristic-Aware Serving System for Diffusion Models
Main Conference
Chengzhang Wu Tsinghua University, Liyan Zheng Tsinghua University, Haojie Wang Tsinghua University, Kezhao Huang Tsinghua University, Zixuan Ma Tsinghua University, Dong Dong, Jidong Zhai Tsinghua University
DOI
15:50 - 17:10
Graphs and Graph Neural Networks (Main Conference at Pyrmont)
Chair(s): Ali Jannesari Iowa State University
15:50
20m
Talk
ElasGNN: An Elastic Training Framework for Distributed GNN Training
Main Conference
Siqi Wang Beihang University, Hailong Yang Beihang University, Pengbo Wang Beihang University, Hongliang Cao Beihang University, Yufan Xu Independent Researcher, Xuezhu Wang Beihang University, Zhongzhi Luan Beihang University, Yi Liu Beihang University, Depei Qian Beihang University
DOI
16:10
20m
Talk
APERTURE: Algorithm-System Co-optimization for Temporal Graph Network Inference
Main Conference
Yiqing Wang Beihang University, Hailong Yang Beihang University, Enze Yu Beihang University, Qingxiao Sun Beihang University, Kejie Ma Beihang University, Kaige Zhang Beihang University, Chenhao Xie Beihang University, Depei Qian Beihang University
DOI
16:30
20m
Talk
TAC: Cache-Based System for Accelerating Billion-Scale GNN Training on Multi-GPU Platform
Main Conference
Zhiqiang Liang, Hongyu Gao, Fang Liu Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Jue Wang Computer Network Information Center, Chinese Academy of Sciences; University of Chinese Academy of Sciences, Xingguo Shi University of Chinese Academy of Sciences, Juyu Gu University of Chinese Academy of Sciences, Peng Di Ant Group & UNSW, San Li University of Chinese Academy of Sciences, Lei Tang University of Chinese Academy of Sciences, Chunbao Zhou University of Chinese Academy of Sciences, Lian Zhao University of Chinese Academy of Sciences, Yangang Wang University of Chinese Academy of Sciences, Xuebin Chi University of Chinese Academy of Sciences
DOI
16:50
20m
Talk
DTMiner: A Data-Centric System for Efficient Temporal Motif Mining
Main Conference
Hou Yinbo Huazhong University of Science and Technology, Hao Qi Huazhong University of Science and Technology, Ligang He University of Warwick, Jin Zhao Huazhong University of Science and Technology, Yu Zhang School of Computer Science and Technology, Huazhong University of Science and Technology, Hui Yu Hong Kong University of Science and Technology, Longlong Lin Southwest University, Lin Gu Huazhong University of Science and Technology, Wenbin Jiang Huazhong University of Science and Technology, Xiaofei Liao Huazhong University of Science and Technology, Hai Jin Huazhong University of Science and Technology
DOI
17:15 - 18:15
Optimizing Transformers (Main Conference at Pyrmont)
Chair(s): Shaoshuai Zhang University of Electronic Science and Technology of China
17:15
20m
Talk
FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism
Main Conference
Jianxing Xu University of Science and Technology of China, Yuanbo Wen, Jun Bi Chinese Academy of Sciences, Ruibai Xu University of Science and Technology of China, Guanglin Xu Chinese Academy of Sciences, Rui Zhang Chinese Academy of Sciences, Wei Li Chinese Academy of Sciences, Ling Li Institute of Software, Chinese Academy of Sciences, Tianshi Chen Cambricon Technologies, Qi Guo Chinese Academy of Sciences, Yunji Chen Chinese Academy of Sciences
DOI
17:35
20m
Talk
Accelerating Sparse Transformer Inference on GPU
Main Conference
Wenhao Dai China University of Petroleum-Beijing, Haodong Deng China University of Petroleum, Mengfei Rong China University of Petroleum, Xinyu Yang Beihang University, Hongyu Liu Baidu Inc., Fangxin Liu Shanghai Jiao Tong University, Hailong Yang Beihang University, Qianwen Cao China University of Petroleum, Qingxiao Sun Beihang University
DOI
17:55
20m
Talk
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends
Main Conference
Feiyang Chen Shanghai Jiao Tong University, Yu Cheng Peking University, Lei Wang Peking University, Yuqing Xia Microsoft Research, Ziming Miao Microsoft Research, Lingxiao Ma Microsoft Research, Fan Yang Microsoft Research Asia, Jilong Xue Microsoft Research, Zhi Yang Peking University, Mao Yang Microsoft Research, Xingda Wei Shanghai Jiao Tong University, Haibo Chen Shanghai Jiao Tong University
DOI
18:30 - 21:30
18:30
3h
Social Event
Excursion
HPCA/CGO/PPoPP/CC Catering

Wed 4 Feb


09:50 - 11:10
Matrix and Linear Algebra Algorithms (Main Conference at Pyrmont)
Chair(s): William S. Moses University of Illinois Urbana-Champaign
09:50
20m
Talk
Towards Singular Value Decomposition for Rank-Deficient Matrices: An Efficient and Accurate Algorithm on GPU Architectures
Main Conference
Lu Shi University of Electronic Science and Technology of China, WeiWei Xu Nanjing University of Information Science and Technology, Shaoshuai Zhang University of Electronic Science and Technology of China
DOI
10:10
20m
Talk
A Diagonal Block Memory-Aware Polynomial Preconditioner for Linear and Eigenvalue Solvers
Main Conference
Xiaojian Yang National University of Defense Technology, Yuhui Ni National University of Defense Technology, Fan Yuan Xiangtan University, Shengguo Li National University of Defense Technology, Dezun Dong National University of Defense Technology, Chuanfu Xu National University of Defense Technology, Haipeng Jia, Jie Liu National University of Defense Technology
DOI
10:30
20m
Talk
A Distributed Matrix-Block-Vector Multiplication in Presence of System Performance Variability
Main Conference
Yuchen Ma College of William & Mary, Bin Ren College of William & Mary, Andreas Stathopoulos College of William & Mary
DOI
10:50
20m
Talk
Characterizing Matrix Multiplication Units across General Parallel Patterns in Scientific Computing
Main Conference
Yuechen Lu China University of Petroleum-Beijing, Hongwei Zeng, Marc Casas Barcelona Supercomputing Center, Weifeng Liu China University of Petroleum-Beijing
DOI
11:10 - 11:30
11:10
20m
Coffee break
Break
HPCA/CGO/PPoPP/CC Catering

Accepted Papers

Title
Accelerating Sparse Transformer Inference on GPU
Main Conference
DOI
A Diagonal Block Memory-Aware Polynomial Preconditioner for Linear and Eigenvalue Solvers
Main Conference
DOI
A Distributed Matrix-Block-Vector Multiplication in Presence of System Performance Variability
Main Conference
DOI
APERTURE: Algorithm-System Co-optimization for Temporal Graph Network Inference
Main Conference
DOI
ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration
Main Conference
DOI
BEEMS: Boosting Machine Vision Efficiency via Computation Graph-Based Memory Smoothing
Main Conference
DOI
Binary Compatible Critical Section Delegation (Best Paper Award)
Main Conference
DOI
Cacheman: A Comprehensive Last-Level Cache Management System for Multi-tenant Clouds
Main Conference
DOI
CCL-D: A High-Precision Diagnostic System for Slow and Hang Anomalies in Large-Scale Model Training (Best Paper Nominee)
Main Conference
DOI
Characterizing Matrix Multiplication Units across General Parallel Patterns in Scientific Computing
Main Conference
DOI
ChituDiffusion: A Data-Characteristic-Aware Serving System for Diffusion Models
Main Conference
DOI
COCCL: A Collective Communication Library Supporting Easy Integration and Configuration of Customized Compression for Scalable LLM Training
Main Conference
DOI
Concurrent Balanced Augmented Trees
Main Conference
DOI
DiggerBees: Depth First Search Leveraging Hierarchical Block-Level Stealing on GPUs
Main Conference
DOI
DTMiner: A Data-Centric System for Efficient Temporal Motif Mining
Main Conference
DOI
Dynamic Detection of Inefficient Data Mapping Patterns in Heterogeneous OpenMP Applications
Main Conference
DOI
ElasGNN: An Elastic Training Framework for Distributed GNN Training
Main Conference
DOI
Elastor: Elastic and Efficient Model Partitioning and Checkpointing for Fault-Tolerant Distributed Training
Main Conference
DOI
Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores
Main Conference
DOI
Faster and Cheaper: Pushing the Sequence Alignment Throughput with Commercial CPUs
Main Conference
DOI
Fixing Non-blocking Data Structures for Better Compatibility with Memory Reclamation Schemes
Main Conference
DOI
FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism
Main Conference
DOI
Hapax Locks: Scalable Value-Based Mutual Exclusion
Main Conference
DOI
HelixPipe: Efficient Distributed Training of Long Sequence Transformers with Attention Parallel Pipeline Parallelism
Main Conference
DOI
HierCut: Enabling 16-bit Format Mixed Precision for Molecular Dynamics through Hierarchical Cutoff (Best Artifact Award)
Main Conference
DOI
High-Throughput Non-Uniformly Quantized 3-bit LLM Inference
Main Conference
DOI
JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-Context Inference
Main Conference
DOI
Laser: Unlocking Layer-Level Scheduling for Efficient Multi-SLO LLM Serving
Main Conference
DOI
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends
Main Conference
DOI
MixFusion: A Patch-Level Parallel Serving System for Mixed-Resolution Diffusion Models
Main Conference
DOI
Multiverse: Transactional Memory with Dynamic Multiversioning
Main Conference
DOI
PANA: A Fine-Grained Runtime-Adaptive Load Balancing for Parallel SpMV on Multicore CPUs
Main Conference
DOI
Parallel Dynamic Spatial Indexes
Main Conference
DOI
ParDiff: Efficiently Parallelizing Reverse-Mode Automatic Differentiation with Direct Indexing
Main Conference
DOI
PIM-zd-tree: A Fast Space-Partitioning Index Leveraging Processing-in-Memory
Main Conference
DOI
Pipelonk: Accelerating End-to-End Zero-Knowledge Proof Generation on GPUs for PLONK-Based Protocols
Main Conference
DOI
PRISM: An Efficient GPU-Based Lossy Compression Framework for Progressive Data Retrieval with Multi-Level Interpolation (Best Paper Nominee)
Main Conference
DOI
Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process Workloads (Best Paper Nominee)
Main Conference
DOI
ROME: Maximizing GPU Efficiency for All-Pairs Shortest Path via Taming Fine-Grained Irregularities
Main Conference
DOI
RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization
Main Conference
DOI
Root-Down Exposure for Maximal Clique Enumeration on GPUs
Main Conference
DOI
Scaling GPU-to-CPU Migration for Efficient Distributed Execution on CPU Clusters
Main Conference
DOI
Sharded Elimination and Combining for Highly-Efficient Concurrent Stacks
Main Conference
DOI
SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
Main Conference
DOI
TAC: Cache-Based System for Accelerating Billion-Scale GNN Training on Multi-GPU Platform
Main Conference
DOI
Towards Singular Value Decomposition for Rank-Deficient Matrices: An Efficient and Accurate Algorithm on GPU Architectures
Main Conference
DOI
Trojan Horse: Aggregate-and-Batch for Scaling Up Sparse Direct Solvers on GPU Clusters (Best Paper Nominee)
Main Conference
DOI
UFO Trees: Practical and Provably-Efficient Parallel Batch-Dynamic Trees (Best Paper Nominee)
Main Conference
DOI
VDHA: Vector-Driven Hash Aggregation for Sparse Matrix-Sparse Vector Multiplication on GPUs
Main Conference
DOI
Waste-Efficient Work Stealing
Main Conference
DOI
zBuffer: Zero-Copy and Metadata-Free Serialization for Fast RPC with Scatter-Gather Reflection
Main Conference
DOI

Call for Papers

31st ACM SIGPLAN Annual Symposium on Principles and Practice of Parallel Programming

Co-located with HPCA, CGO, and CC

Sydney, Australia

PPoPP is the premier forum for leading work on all aspects of parallel and performance programming, including theoretical foundations, techniques, languages, compilers, runtime systems, tools, applications, and practical experience. This symposium focuses on improving the programming productivity and performance engineering of all concurrent and parallel systems—multicore, multi-threaded, heterogeneous, clustered, and distributed systems, grids, accelerators such as ASICs, GPUs, FPGAs, data centers, clouds, large scale machines, and quantum computers. PPoPP is also interested in new and emerging parallel workloads and applications, such as artificial intelligence and large-scale scientific/enterprise workloads.

Important dates

  • Full paper submission: Monday, September 1st, 2025
  • Author response period: November 2 - 4, 2025 (Sun - Tue) (updated from October 27 - 29, 2025)
  • Author notification: Monday, November 10th, 2025
  • Artifact submission to AE committee: Monday, November 24th, 2025 (updated from November 17th, 2025)
  • Artifact notification by AE committee: Monday, January 5th, 2026
  • Final paper due: Friday, January 9, 2026 (TBC)

The submissions website is now live.

Scope

Specific topics of interest include (but are not limited to):

  • Languages, compilers, and runtime systems for parallel programs
  • Parallel programming frameworks and domain-specific languages
  • Parallel programming for emerging hardware, including AI accelerators, processor-in-memory, programmable logic, non-volatile memory technologies, and quantum computers
  • High-performance libraries
  • Parallel programming for deep memory hierarchies including nonvolatile memory
  • Parallel algorithms
  • Parallel applications including scientific computing and enterprise workloads
  • Artificial intelligence and machine learning for parallel systems, including their use in system design, optimization, and runtime decisions
  • Development, analysis, or management tools
  • Performance analysis, debugging and optimization
  • Productivity tools for parallel systems
  • Software engineering for parallel programs
  • Parallel programming theory and models
  • Formal analysis and verification
  • Concurrent data structures
  • Synchronization and concurrency control
  • Fault tolerance for parallel systems
  • Middleware for parallel systems

Papers should report on original research relevant to parallel programming and should contain enough background materials to make them accessible to the entire parallel programming research community. Papers describing experience should indicate how they illustrate general principles or lead to new insights; papers about parallel programming foundations should indicate how they relate to practice. PPoPP submissions will be evaluated based on their technical merit and accessibility. Submissions should clearly motivate the importance of the problem being addressed, compare to the existing body of work on the topic, and explicitly and precisely state the paper’s key contributions and results towards addressing the problem. Submissions should strive to be accessible both to a broad audience and to experts in the area.

Paper Submission

All submissions must be made electronically through the conference website and must include an abstract (100–400 words), author contact information, and the full list of authors and their affiliations. Full paper submissions must be in PDF format, printable on both A4 and US letter-size paper.

All papers must be prepared in two-column ACM Conference Format, specifically the acmart document class (available here) with the sigplan option and 10-point font size. We recommend preparing your submission in LaTeX starting with the following template: https://ppopp26.sigplan.org/getImage/orig/ppopp-acmart-sigplanproc-template.tex. This template ensures that there are no line numbers or headers in the margins that would trigger HotCRP’s format checker. Refrain from squeezing out additional space by tweaking the template, e.g., by manipulating vertical space, margins, line spacing, heading space, or column separation. Ensure that caption fonts are at least 9 pt and all fonts in figures and tables are at least 8 pt. The program chairs will inspect submissions using format-checking tools and reserve the right to reject submissions that violate the formatting rules. If you would like to use Word or another LaTeX template, please ensure that your submission follows all the formatting guidelines. You may want to consult the official ACM information on the Master Article Template and related tools.
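As a minimal sketch of a preamble that matches these requirements (the acmart class options shown are standard, but the official template linked above remains authoritative):

```latex
% Two-column SIGPLAN format at 10 pt, as required by the call for papers.
% 'anonymous' suppresses author identities for double-blind review;
% 'review' marks the PDF as a submission version.
\documentclass[sigplan,10pt,anonymous,review]{acmart}

\begin{document}
\title{Your Title Here}

\begin{abstract}
An abstract of 100--400 words, per the submission requirements.
\end{abstract}

\maketitle

% Body: at most 10 pages of text and figures; references excluded.
% Caption fonts must be at least 9 pt; fonts in figures and tables at least 8 pt.

\end{document}
```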

Papers may contain a maximum of 10 pages of text and figures, not including references. There is no page limit for references, and each reference must list the names of all authors (not “et al.”) and spell out the publication year and venue (rather than simply pointing to a DOI link). Appendices are not allowed, but authors may submit supplementary material, such as proofs or source code; all supplementary material must be in PDF or ZIP format. Consulting supplementary material is at the discretion of the reviewers.
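As a hypothetical example (authors and title invented for illustration), a bibliography entry satisfying these rules spells out every author, the venue, and the year rather than pointing only to a DOI:

```latex
% Hypothetical entry: all authors named, venue and year spelled out.
\bibitem{lovelace2025example}
Ada Lovelace, Charles Babbage, and Grace Hopper.
An Example Parallel Algorithm.
In \emph{Proceedings of the ACM SIGPLAN Symposium on Principles and
Practice of Parallel Programming (PPoPP)}, 2025.
```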

Submission is double-blind, and authors will need to identify any potential conflicts of interest with PC and Extended Review Committee members, as defined by the ACM SIGPLAN review policy.

To facilitate this process, submissions should not reveal the identity of the authors in any way. Authors should omit their names and affiliations from the body of the submission and should ensure that any references to their own related work are in the third person (e.g., not “We build on our previous work …” but rather “We build on the work of …”). The purpose of this process is to help the PC and external reviewers come to an initial judgment about the paper without bias, not to make it impossible for them to discover the authors if they were to try. Nothing should be done in the name of anonymity that weakens the submission or makes the job of reviewing the paper more difficult. In particular, important background references should not be omitted or anonymized. In addition, authors should feel free to disseminate their ideas or draft versions of their papers as they normally would. For instance, authors may post drafts of their papers on the web or give talks on their research ideas. Authors with further questions on double-blind reviewing are encouraged to contact the Program Chairs by email.

To facilitate fair and unbiased reviews for all submissions, PPoPP 2026 may utilize the Toronto Paper Matching System (TPMS) to assign papers to reviewers. From the authors’ perspective, this means that submissions may be uploaded to TPMS.

Papers may be resubmitted to the submission site multiple times up until the deadline; the last version submitted before the deadline will be the version reviewed. Papers that exceed the length requirement, that deviate from the expected format, or that are submitted late will be rejected.

All submissions that are not accepted for regular presentations will be automatically considered for posters. Two-page summaries of accepted posters will be included in the conference proceedings.

To allow reproducibility, we encourage authors of accepted papers to submit their papers for Artifact Evaluation (AE). The AE process begins after the acceptance notification and is run by a separate committee whose task is to assess how the artifacts support the work described in the papers. Artifact evaluation is voluntary and will not affect paper acceptance but will be taken into consideration when selecting papers for awards. Papers that go through the AE process successfully will receive at least one ACM reproducibility badge, printed on the published version. More information will be posted on the AE website.

Deadlines expire at midnight anywhere on Earth (AoE).

Publication Date

The titles of all accepted papers are typically announced shortly after the author notification date (late November 2025). Note, however, that this is not the official publication date. The official publication date is the date the proceedings are made available in the ACM Digital Library. ACM will make the proceedings available via the Digital Library up to 2 weeks prior to the first day of the conference. The official publication date affects the deadline for any patent filings related to the published work.

ACM Publications Policies

By submitting your article to an ACM Publication, you are hereby acknowledging that you and your co-authors are subject to all ACM Publications Policies, including ACM’s new Publications Policy on Research Involving Human Participants and Subjects. Alleged violations of this policy or any ACM Publications Policy will be investigated by ACM and may result in a full retraction of your paper, in addition to other potential penalties, as per ACM Publications Policy.

ACM is transitioning in 2026 to 100% Open Access for all ACM publications, including those from ACM-sponsored conferences. You can find an FAQ here: Open Access Model for ACM and SIG Sponsored Conferences: Frequently Asked Questions, and more information here: Open Access Publication & ACM. Authors should be aware that the Article Processing Charges listed on that page apply to all accepted papers. For any questions regarding Open Access, please contact dl-info@hq.acm.org.

Please ensure that you and your co-authors obtain an ORCID ID so you can complete the publishing process for your accepted paper. ACM has been involved in ORCID from the start and has recently committed to collecting ORCID IDs from all of its published authors. We are committed to improving author discoverability, ensuring proper attribution, and contributing to ongoing community efforts around name normalization; your ORCID ID will help in these efforts.

Important update on ACM’s new open access publishing model for 2026 ACM Conferences!

Starting January 1, 2026, ACM will fully transition to Open Access. All ACM publications, including those from ACM-sponsored conferences, will be 100% Open Access. Authors will have two primary options for publishing Open Access articles with ACM: the ACM Open institutional model or paying Article Processing Charges (APCs). With over 1,800 institutions already part of ACM Open, the majority of ACM-sponsored conference papers (currently around 70–75%) will not require APCs from authors or conferences.

Authors from institutions not participating in ACM Open will need to pay an APC to publish their papers, unless they qualify for a financial or discretionary waiver. To find out whether an APC applies to your article, please consult the list of participating institutions in ACM Open and review the APC Waivers and Discounts Policy. Keep in mind that waivers are rare and are granted based on specific criteria set by ACM.

Understanding that this change could present financial challenges, ACM has approved a temporary subsidy for 2026 to ease the transition and allow more time for institutions to join ACM Open. The subsidy will offer:

  • $250 APC for ACM/SIG members
  • $350 APC for non-members

This represents a 65% discount, funded directly by ACM. Authors are encouraged to help advocate for their institutions to join ACM Open during this transition period.

This temporary subsidized pricing will apply to all conferences scheduled for 2026.