SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping
Recent research has focused on accelerating stencil computations by exploiting emerging hardware such as Tensor Cores. To leverage these accelerators, the stencil operation must be transformed into matrix multiplication. However, this transformation introduces undesired sparsity into the kernel matrix, leading to significant redundant computation.
In this paper, we present SPIDER, the first system to turn this unresolved sparsity into an optimization opportunity by exploiting the potential of Sparse Tensor Cores (SpTCs) for stencil acceleration. Specifically, SPIDER introduces an efficient and elegant transformation method that integrates two cooperative techniques: an ahead-of-time strided-swapping transformation for kernel matrices and an on-the-fly row-swapping mechanism for inputs. This rule-based approach effectively transforms stencil computation into operations compatible with SpTCs, introducing only slight compile-time overhead and zero runtime overhead. Additionally, SPIDER incorporates multiple optimizations to maximize computational efficiency. Experimental evaluations demonstrate that SPIDER outperforms the vendor library cuDNN by 6.23× and state-of-the-art (SOTA) Tensor Core-based approaches (ConvStencil, FlashFFTStencil, etc.) by 1.98× on average.
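To make the abstract's premise concrete, the sketch below (a simplified illustration, not SPIDER's actual transformation) shows why casting a stencil as a matrix multiplication introduces sparsity: a 1D 3-point stencil becomes a banded kernel matrix with only three nonzeros per row, and most of the matmul's work is multiplication by zero. SpTCs instead require 2:4 structured sparsity (at most two nonzeros in every group of four consecutive elements), which a banded stencil matrix does not naturally satisfy; reordering it to meet that pattern is the role the abstract assigns to strided swapping. The function name `stencil_as_matmul` is ours, not from the paper.

```python
import numpy as np

def stencil_as_matmul(u, w):
    """Apply a 3-point stencil with weights w to the interior points of u
    via an explicit kernel matrix, as a Tensor Core formulation would."""
    n = len(u)
    K = np.zeros((n - 2, n))      # kernel matrix: only 3 nonzeros per row
    for i in range(n - 2):
        K[i, i:i + 3] = w         # stencil weights land on a diagonal band
    return K @ u                  # the (mostly-zero) matmul

u = np.arange(8, dtype=np.float64)
w = np.array([1.0, -2.0, 1.0])    # second-difference stencil

out = stencil_as_matmul(u, w)

# Cross-check against the direct stencil loop.
ref = np.array([w @ u[i:i + 3] for i in range(len(u) - 2)])
assert np.allclose(out, ref)

# Quantify the redundancy the abstract mentions: row density is 3/8 here,
# and it shrinks further as the problem size grows.
density = 3 / len(u)
```

For this toy size, 5 of every 8 multiply-accumulates per output row are wasted on zeros; exploiting that structure, rather than densifying it away, is the opportunity the paper targets.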
Tue 3 Feb (times shown in the Hobart time zone)
09:50 - 11:10 | Stencil and Sparse Matrix Computation (Main Conference) at Pyrmont. Chair(s): Shoaib Kamil (Adobe Research)

09:50 (20m) Talk | SPIDER: Unleashing Sparse Tensor Cores for Stencil Computation via Strided Swapping. Main Conference. Qiqi Gu (Shanghai Jiao Tong University), Chenpeng Wu (Shanghai Jiao Tong University), Heng Shi, Jianguo Yao (Shanghai Jiao Tong University; Shanghai Enflame Technology). DOI

10:10 (20m) Talk | ASM-SpMM: Unleashing the Potential of Arm SME for Sparse Matrix Multiplication Acceleration. Main Conference. Jiazhi Jiang (Sun Yat-sen University), Xijia Yao (Sun Yat-sen University), Jiayu Chen (Sun Yat-sen University), Jinhui Wei (Sun Yat-sen University), Dan Huang, Yutong Lu (Sun Yat-sen University). DOI

10:30 (20m) Talk | Exploiting Efficient Mapping and Pipelined Execution for Accelerating SpMV on Tensor Cores. Main Conference. Kaige Zhang (Beihang University), Hailong Yang (Beihang University), Xin You (Beihang University), Tianyu Feng (Beihang University), Yufan Xu (Independent Researcher), Zhongzhi Luan (Beihang University), Yi Liu (Beihang University), Depei Qian (Beihang University). DOI

10:50 (20m) Talk | VDHA: Vector-Driven Hash Aggregation for Sparse Matrix-Sparse Vector Multiplication on GPUs. Main Conference. Yuchen Li (Tsinghua University), Zhe Pan (Tsinghua University), Peng Qu (Tsinghua University), Youhui Zhang (Tsinghua University). DOI