PPoPP 2026
Sat 31 January - Wed 4 February 2026 Sydney, Australia
co-located with HPCA/CGO/PPoPP/CC 2026
Tue 3 Feb 2026 17:55 - 18:15 at Pyrmont - Optimizing Transformers Chair(s): Shaoshuai Zhang

Computing attention is the backbone of transformer-based models such as large language models. However, the growing diversity of attention algorithms makes it difficult to fully exploit hardware performance: state-of-the-art implementations like FlashAttention target a specific attention algorithm or hardware platform and fail to generalize to other algorithms and platforms.

We present MetaAttention, a framework that automatically derives the optimal implementation of an attention algorithm for a given hardware platform.
Our key insight is that attention variants can be abstracted into two operations,
relevance scoring and aggregation, complemented by customizable functions and configurations such as the input shape.
Based on this abstraction, we design a cross-backend attention runtime around these operations that generalizes to attention variants with customizable operators.
To fully exploit hardware performance, we further propose an IntermediateTensor-based search method that finds the optimal tiling strategy and parallelism scheme according to the attention customization and hardware features.
MetaAttention delivers up to a 10.4× speedup for configurations previously unsupported by state-of-the-art systems.
Additionally, MetaAttention matches the performance of manually optimized libraries such as FlashMLA while requiring significantly less code.
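The two-operation view described in the abstract can be illustrated with a minimal sketch: standard softmax attention expressed as a pluggable relevance-scoring step followed by an aggregation step. The function names below are illustrative assumptions, not MetaAttention's actual API.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def generic_attention(q, k, v, score_fn, aggregate_fn):
    """Attention as two customizable operations:
    relevance scoring (q, k -> scores) and aggregation (scores, v -> output)."""
    scores = score_fn(q, k)
    return aggregate_fn(scores, v)

# One customization: classic scaled dot-product scoring with softmax.
def dot_product_score(q, k):
    d = q.shape[-1]
    return softmax(q @ k.T / np.sqrt(d))

# One customization: aggregation as a weighted sum of values.
def weighted_sum(scores, v):
    return scores @ v

rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((6, 8))
v = rng.standard_normal((6, 8))
out = generic_attention(q, k, v, dot_product_score, weighted_sum)
```

Swapping `score_fn` or `aggregate_fn` is where variants (e.g., different masking or normalization schemes) would plug in under this abstraction.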

Tue 3 Feb

Displayed time zone: Hobart

17:15 - 18:15
Optimizing Transformers (Main Conference) at Pyrmont
Chair(s): Shaoshuai Zhang University of Electronic Science and Technology of China
17:15
20m
Talk
FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism
Main Conference
Jianxing Xu University of Science and Technology of China, Yuanbo Wen, Jun Bi Chinese Academy of Sciences, Ruibai Xu University of Science and Technology of China, Guanglin Xu Chinese Academy of Sciences, Rui Zhang Chinese Academy of Sciences, Wei Li Chinese Academy of Sciences, Ling Li Institute of Software, Chinese Academy of Sciences, Tianshi Chen Cambricon Technologies, Qi Guo Chinese Academy of Sciences, Yunji Chen Chinese Academy of Sciences
DOI
17:35
20m
Talk
Accelerating Sparse Transformer Inference on GPU
Main Conference
Wenhao Dai China University of Petroleum-Beijing, Haodong Deng China University of Petroleum, Mengfei Rong China University of Petroleum, Xinyu Yang Beihang University, Hongyu Liu Baidu Inc., Fangxin Liu Shanghai Jiao Tong University, Hailong Yang Beihang University, Qianwen Cao China University of Petroleum, Qingxiao Sun Beihang University
DOI
17:55
20m
Talk
MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends
Main Conference
Feiyang Chen Shanghai Jiao Tong University, Yu Cheng Peking University, Lei Wang Peking University, Yuqing Xia Microsoft Research, Ziming Miao Microsoft Research, Lingxiao Ma Microsoft Research, Fan Yang Microsoft Research Asia, Jilong Xue Microsoft Research, Zhi Yang Peking University, Mao Yang Microsoft Research, Xingda Wei Shanghai Jiao Tong University, Haibo Chen Shanghai Jiao Tong University
DOI