MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends
Attention is the computational backbone of transformer-based models such as large language models. However, the growing diversity of attention algorithms makes it increasingly difficult to fully exploit hardware performance. State-of-the-art implementations such as FlashAttention target a specific attention algorithm or hardware platform, and fail to generalize to other algorithms and platforms.
We present MetaAttention, a framework that automatically derives the optimal implementation of an attention algorithm given a hardware platform.
Our key insight is that attention variants can be abstracted into two operations,
relevance scoring and aggregation, complemented by customizable functions and configurations such as the input shape.
Based on this abstraction, we design a cross-backend attention runtime around these two operations that generalizes to attention variants through customizable operators.
To fully exploit hardware performance, we further propose an IntermediateTensor-based search method that finds the optimal tiling strategy and parallelism scheme for a given attention customization and hardware's features.
MetaAttention delivers up to a 10.4$\times$ speedup for configurations previously unsupported by state-of-the-art systems.
Additionally, MetaAttention achieves performance comparable to manually-optimized libraries such as FlashMLA while significantly reducing the amount of code required.
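To make the two-operation abstraction concrete, here is a minimal sketch in NumPy of attention decomposed into relevance scoring and aggregation, with both operations customizable. This is an illustrative sketch only, not MetaAttention's actual API; the names `score_fn` and `aggregate_fn` are hypothetical.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax: subtract the row max before exponentiating.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v, score_fn=None, aggregate_fn=None):
    """Attention as two customizable operations:
    relevance scoring (q, k -> weights) and aggregation (weights, v -> output)."""
    if score_fn is None:
        # Default: scaled-dot-product scoring with softmax normalization.
        score_fn = lambda q, k: softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if aggregate_fn is None:
        # Default: weighted sum over the value vectors.
        aggregate_fn = lambda w, v: w @ v
    weights = score_fn(q, k)          # relevance scoring
    return aggregate_fn(weights, v)   # aggregation

# Standard softmax attention: 4 queries, 6 keys/values, head dim 8.
rng = np.random.default_rng(0)
q = rng.standard_normal((4, 8))
k = rng.standard_normal((6, 8))
v = rng.standard_normal((6, 8))
out = attention(q, k, v)
```

Swapping `score_fn` (e.g. for a sparse or linear scoring rule) while keeping `aggregate_fn` fixed is the kind of customization the abstraction is meant to capture.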
Tue 3 Feb (Hobart time zone)

17:15 - 18:15 | Optimizing Transformers (Main Conference, at Pyrmont). Chair: Shaoshuai Zhang (University of Electronic Science and Technology of China)

- 17:15 (20 min, Talk): FlashAttention-T: Towards Fully Tensorized Attention by Exploiting Tensor-Vector Parallelism. Jianxing Xu (University of Science and Technology of China), Yuanbo Wen, Jun Bi (Chinese Academy of Sciences), Ruibai Xu (University of Science and Technology of China), Guanglin Xu (Chinese Academy of Sciences), Rui Zhang (Chinese Academy of Sciences), Wei Li (Chinese Academy of Sciences), Ling Li (Institute of Software, Chinese Academy of Sciences), Tianshi Chen (Cambricon Technologies), Qi Guo (Chinese Academy of Sciences), Yunji Chen (Chinese Academy of Sciences)
- 17:35 (20 min, Talk): Accelerating Sparse Transformer Inference on GPU. Wenhao Dai (China University of Petroleum-Beijing), Haodong Deng (China University of Petroleum), Mengfei Rong (China University of Petroleum), Xinyu Yang (Beihang University), Hongyu Liu (Baidu Inc.), Fangxin Liu (Shanghai Jiao Tong University), Hailong Yang (Beihang University), Qianwen Cao (China University of Petroleum), Qingxiao Sun (Beihang University)
- 17:55 (20 min, Talk): MetaAttention: A Unified and Performant Attention Framework Across Hardware Backends. Feiyang Chen (Shanghai Jiao Tong University), Yu Cheng (Peking University), Lei Wang (Peking University), Yuqing Xia (Microsoft Research), Ziming Miao (Microsoft Research), Lingxiao Ma (Microsoft Research), Fan Yang (Microsoft Research Asia), Jilong Xue (Microsoft Research), Zhi Yang (Peking University), Mao Yang (Microsoft Research), Xingda Wei (Shanghai Jiao Tong University), Haibo Chen (Shanghai Jiao Tong University)