DiggerBees: Depth First Search Leveraging Hierarchical Block-Level Stealing on GPUs
Depth First Search (DFS) is a fundamental graph traversal algorithm with broad applications. While existing work-stealing DFS approaches achieve strong performance on CPUs, mapping them to modern GPUs faces three major challenges: (1) limited shared memory cannot accommodate deep stacks, (2) frequent stack operations hinder efficient intra-block execution, and (3) irregular workloads complicate scalable inter-block execution.
In this paper, we propose DiggerBees, a GPU-optimized parallel DFS algorithm with hierarchical block-level stealing, consisting of three components. First, we introduce a two-level stack structure to mitigate shared memory limitations. Second, we employ warp-level DFS with intra-block work stealing to enable efficient execution within a block. Third, we implement inter-block work stealing to achieve scalable execution across blocks and sustain high parallelism. Experimental results on the latest NVIDIA GPUs show that DiggerBees outperforms existing DFS approaches, CKL-PDFS, ACR-PDFS, and NVG-DFS, achieving average speedups of $1.37\times$, $1.83\times$, and $30.18\times$, respectively. Moreover, DiggerBees even surpasses high-performance GPU BFS implementations on graphs with deep and narrow traversal paths, and scales efficiently across GPU generations.
Mon 2 FebDisplayed time zone: Hobart change
11:30 - 12:50 | |||
11:30 20mTalk | Rethinking Thread Scheduling under Oversubscription: A User-Space Framework for Coordinating Multi-runtime and Multi-process WorkloadsBest Paper Nominee Main Conference DOI | ||
11:50 20mTalk | Waste-Efficient Work Stealing Main Conference Kyle Singer Massachusetts Institute of Technology, Kunal Agrawal Washington University in St. Louis, TB Schardl Massachusetts Institute of Technology DOI | ||
12:10 20mTalk | DiggerBees: Depth First Search Leveraging Hierarchical Block-Level Stealing on GPUs Main Conference Yuyao Niu Barcelona Supercomputing Center, Yuechen Lu China University of Petroleum-Beijing, Weifeng Liu China University of Petroleum-Beijing, Marc Casas Barcelona Supercomputing Center DOI | ||
12:30 20mTalk | PANA: A Fine-Grained Runtime-Adaptive Load Balancing for Parallel SpMV on Multicore CPUs Main Conference Haodong Bian Tsinghua University, Youhui Zhang Tsinghua University, Xiang Fei Tsinghua University, Jianqiang Huang Qinghai University, Xiaoying Wang Qinghai University DOI | ||