A Diagonal Block Memory-Aware Polynomial Preconditioner for Linear and Eigenvalue Solvers
Krylov subspace methods are widely used in scientific computing to solve large sparse linear systems and eigenvalue problems. Their performance is often dominated by high-order matrix-power kernels (MPK), especially in polynomial preconditioners that must scale to millions or billions of variables. We present Diagonal Block MPK (DBMPK), a lightweight and parallel-friendly optimization that partitions the input matrix into diagonal blocks and off-diagonal regions. This design enables efficient intra-block data reuse and eliminates inter-block dependencies. Compared to existing techniques, it improves cache locality and parallelism and reduces preprocessing overhead. Our evaluation on x86 and Arm HPC platforms shows that DBMPK improves MPK performance by 26.6%-38.4%. When applied to polynomial preconditioners for linear systems and eigenvalue problems, it achieves consistent end-to-end speedups of 18.6%-34.0%, including in weak scaling tests on 128 nodes, demonstrating strong scalability and practical impact.
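To make the partitioning idea concrete, the following is a minimal sketch in pure Python, not the authors' DBMPK implementation. It splits a sparse matrix into a block-diagonal part and an off-diagonal remainder, then uses the split to compute the vectors x, Ax, ..., A^k x that a matrix-power kernel produces and a polynomial preconditioner combines. The row-dictionary storage format, the fixed `block_size`, and all function names (`laplacian_1d`, `split_diagonal_blocks`, `spmv`, `matrix_powers`, `apply_polynomial`) are assumptions made for illustration. The cache-aware data reuse and parallel scheduling described in the abstract are not modeled here.

```python
def laplacian_1d(n):
    """Hypothetical test matrix: 1D Laplacian stored as one {col: value} dict per row."""
    rows = []
    for i in range(n):
        row = {i: 2}
        if i > 0:
            row[i - 1] = -1
        if i < n - 1:
            row[i + 1] = -1
        rows.append(row)
    return rows


def split_diagonal_blocks(rows, block_size):
    """Split A into (diag, off) so that A = diag + off.

    diag keeps entries whose column lies in the same block as the row;
    off keeps every other entry.
    """
    diag, off = [], []
    for i, row in enumerate(rows):
        lo = (i // block_size) * block_size
        hi = lo + block_size
        diag.append({j: v for j, v in row.items() if lo <= j < hi})
        off.append({j: v for j, v in row.items() if not (lo <= j < hi)})
    return diag, off


def spmv(rows, x):
    """Sparse matrix-vector product for the row-dictionary format."""
    return [sum(v * x[j] for j, v in row.items()) for row in rows]


def matrix_powers(rows, x, k, block_size):
    """Return [x, A x, ..., A^k x], applying the two parts of A separately."""
    diag, off = split_diagonal_blocks(rows, block_size)
    vecs = [list(x)]
    for _ in range(k):
        prev = vecs[-1]
        d = spmv(diag, prev)  # touches only entries inside each diagonal block
        o = spmv(off, prev)   # coupling between different blocks
        vecs.append([a + b for a, b in zip(d, o)])
    return vecs


def apply_polynomial(coeffs, vecs):
    """Combine the power vectors: p(A) x = sum_i coeffs[i] * A^i x."""
    n = len(vecs[0])
    return [sum(c * v[i] for c, v in zip(coeffs, vecs)) for i in range(n)]
```

A call such as `matrix_powers(laplacian_1d(6), [1, 2, 3, 4, 5, 6], 2, 3)` returns the same vectors as repeated unsplit products, since the two parts sum to the original matrix. The split changes only how the work is organized, not the result.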
Wed 4 Feb (displayed time zone: Hobart)
09:50 - 11:10 | Matrix and Linear Algebra Algorithms | Main Conference at Pyrmont | Chair(s): William S. Moses University of Illinois Urbana-Champaign
09:50 20m Talk | Towards Singular Value Decomposition for Rank-Deficient Matrices: An Efficient and Accurate Algorithm on GPU Architectures | Main Conference | Lu Shi University of Electronic Science and Technology of China, WeiWei Xu Nanjing University of Information Science and Technology, Shaoshuai Zhang University of Electronic Science and Technology of China
10:10 20m Talk | A Diagonal Block Memory-Aware Polynomial Preconditioner for Linear and Eigenvalue Solvers | Main Conference | Xiaojian Yang National University of Defense Technology, Yuhui Ni National University of Defense Technology, Fan Yuan Xiangtan University, Shengguo Li National University of Defense Technology, Dezun Dong NUDT, xuchuanfu National University of Defense Technology, Haipeng Jia Jia, Jie Liu National University of Defense Technology
10:30 20m Talk | A Distributed Matrix-Block-Vector Multiplication in Presence of System Performance Variability | Main Conference | Yuchen Ma College of William & Mary, Bin Ren College of William & Mary, Andreas Stathopoulos College of William & Mary
10:50 20m Talk | Characterizing Matrix Multiplication Units across General Parallel Patterns in Scientific Computing | Main Conference | Yuechen Lu China University of Petroleum-Beijing, Hongwei Zeng, Marc Casas Barcelona Supercomputing Center, Weifeng Liu China University of Petroleum-Beijing