RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization (PPoPP 2026 - Main Conference)

Sat 31 January - Wed 4 February 2026 Sydney, Australia

co-located with HPCA/CGO/PPoPP/CC 2026

Who

Qihao Zhang, MingLiang Tang, Mingshu Zhai, Kinman Lei, Jidong Zhai

Track

PPoPP 2026 Main Conference

Time Zone

The program is currently displayed in (GMT+11:00) Hobart.

When

Tue 3 Feb 2026 11:30 - 11:50 at Balmoral - Mixed Precision and Quantization Chair(s): Dingwen Tao

Abstract

Mixed precision quantization has been adopted to accelerate large language model (LLM) serving: it exploits the high-throughput low-precision compute units of GPUs while keeping outliers in higher precision to preserve model accuracy. However, existing methods focus on mitigating single-dimensional, channel-wise outliers, and model accuracy degrades when they are scaled down to 4-bit precision.

In this paper, we present an algorithm-system co-design that effectively handles dual-dimensional outliers in LLMs, across both the channel and token dimensions. We introduce a novel rotation-based mixed precision quantization algorithm that suppresses channel-wise outliers and migrates them to the token dimension. Based on this algorithm, we propose RoMeo, an efficient LLM serving system designed to overcome the unique system challenges posed by the sparse computation pattern and dynamic outlier detection inherent in token-wise outlier handling. Extensive evaluations across various LLMs demonstrate that RoMeo improves quantized model accuracy by up to $5.17\%$ compared to the state-of-the-art methods QuaRot and MixQ, while maintaining efficiency comparable to uniform-precision quantization and achieving up to $2.10\times$ end-to-end speedup over the half-precision baseline. RoMeo is available at https://github.com/thu-pacman/RoMeo.
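The abstract does not spell out the algorithm, so the following is only a minimal sketch of the general idea under our own assumptions, not RoMeo's implementation. It assumes an orthonormal Walsh-Hadamard rotation (the kind of rotation QuaRot applies) as the rotation: applied to each token, it spreads a channel-wise outlier across all channels and lowers the token's peak, while a token whose overall magnitude is large remains an outlier after rotation because the rotation preserves its norm. Such tokens are kept in higher precision and the rest are quantized to 4 bits. The helper names `hadamard_rotate`, `quantize_int4` and `mixed_precision_tokens`, the median-based threshold `ratio`, and the toy data are all hypothetical. In a real model the matching weights would be counter-rotated so that the layer output is unchanged; the sketch shows only the activation side.

```python
import math
import statistics


def hadamard_rotate(x):
    """Orthonormal Walsh-Hadamard transform of one token vector (length must be a power of two)."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    h = 1
    while h < n:  # in-place butterfly passes
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    s = 1.0 / math.sqrt(n)  # normalize so the rotation preserves the vector norm
    return [v * s for v in y]


def quantize_int4(row):
    """Symmetric per-token 4-bit fake quantization: map the peak to 7, round, clip to [-8, 7], dequantize."""
    amax = max(abs(v) for v in row)
    if amax == 0:
        return [0.0] * len(row)
    scale = amax / 7.0
    return [max(-8, min(7, round(v / scale))) * scale for v in row]


def mixed_precision_tokens(x, ratio=4.0):
    """Rotate every token, keep high-magnitude tokens unquantized, quantize the rest to 4 bits.

    The rule "peak > ratio * median peak" is a hypothetical stand-in for dynamic outlier detection.
    """
    rotated = [hadamard_rotate(t) for t in x]
    peaks = [max(abs(v) for v in t) for t in rotated]
    threshold = ratio * statistics.median(peaks)
    outliers = [i for i, p in enumerate(peaks) if p > threshold]
    keep = set(outliers)
    out = [t if i in keep else quantize_int4(t) for i, t in enumerate(rotated)]
    return out, outliers


# Toy activations: 4 tokens x 8 channels (hypothetical data).
base = [0.5, -0.3, 0.2, 50.0, -0.4, 0.1, 0.3, -0.2]  # channel 3 is a channel-wise outlier
X = [list(base), list(base), [100 * v for v in base], list(base)]  # token 2 is a token-wise outlier

out, outliers = mixed_precision_tokens(X)
print("peak before rotation:", max(abs(v) for v in base))
print("peak after rotation:", round(max(abs(v) for v in hadamard_rotate(base)), 2))
print("tokens kept in higher precision:", outliers)
```

On this toy input, rotation lowers the peak of the ordinary tokens from 50 to below 19, so their 4-bit grid is much finer, and only token 2 is flagged for higher precision. Note that the kept tokens form a sparse, input-dependent subset, which is the kind of sparse and dynamic computation pattern the abstract identifies as a system challenge.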

DOI

https://doi.org/10.1145/3774934.3786419

Qihao Zhang, Tsinghua University
MingLiang Tang, Tsinghua University
Mingshu Zhai, Tsinghua University
Kinman Lei, Tsinghua University
Jidong Zhai, Tsinghua University

Session Program

Tue 3 Feb
Displayed time zone: Hobart

11:30 - 12:50	Mixed Precision and Quantization (Main Conference) at Balmoral. Chair(s): Dingwen Tao, Institute of Computing Technology, Chinese Academy of Sciences

11:30	20m Talk	RoMeo: Mitigating Dual-dimensional Outliers with Rotated Mixed Precision Quantization (Main Conference). Qihao Zhang (Tsinghua University), MingLiang Tang (Tsinghua University), Mingshu Zhai (Tsinghua University), Kinman Lei (Tsinghua University), Jidong Zhai (Tsinghua University)
11:50	20m Talk	High-Throughput Non-Uniformly Quantized 3-bit LLM Inference (Main Conference). YuAng Chen (Chinese University of Hong Kong), Wenqi Zeng (Hong Kong University of Science and Technology), Jeffrey Xu Yu (Chinese University of Hong Kong)
12:10	20m Talk	JanusQuant: Accurate and Efficient 2-bit KV Cache Quantization for Long-Context Inference (Main Conference). Chengyu Sun (Wuhan University), Yaqi Xia (Wuhan University), Hulin Wang, Donglin Yang (Nvidia Corporation), Xiaobo Zhou (University of Macau), Dazhao Cheng (Wuhan University)
12:30	20m Talk	HierCut: Enabling 16-bit Format Mixed Precision for Molecular Dynamics through Hierarchical Cutoff (Main Conference, Best Artifact Award). Zeyu Song (Tsinghua University), Lin Gan (Tsinghua University), Xiaohui Duan (Shandong University), Jiayu Fu (Tsinghua University), Zhengrui Li (Tsinghua University), Yinuo Wang (Tsinghua University), Guangzhao Li (Chinese Academy of Sciences), Guangwen Yang (Tsinghua University)