POSTER: A W-cycle Algorithm for Efficient Batched SVD on GPUs (PPoPP 2022 - Main Conference)

Who

Junmin Xiao, Qing Xue, Hui Ma, Xiaoyang Zhang, Guangming Tan

Track

PPoPP 2022 Main Conference

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 5 Apr 2022 15:20 - 15:25, Poster Session. Chair(s): Yan Gu

Abstract

As a fundamental factorization, the singular value decomposition (SVD) plays a central role in a broad range of domains such as scientific computing and machine learning. Because factorizing large numbers of small matrices is a computational bottleneck in real-world applications, many GPU-accelerated batched SVD algorithms have been investigated recently. However, these algorithms fail to balance data locality and parallelism because their workflows depend on the size of each matrix. In this work, we propose a matrix-size-independent W-cycle algorithm that accelerates the batched one-sided Jacobi SVD on GPUs and strikes this balance between data locality and parallelism. The experimental evaluation shows that the proposed algorithm achieves an average speedup of 4.5× over the state-of-the-art cuSOLVER.
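The abstract does not describe the W-cycle scheme itself. The sketch below therefore only illustrates the underlying computation it accelerates: a batched one-sided (Hestenes) Jacobi SVD, in which column pairs of each matrix are repeatedly rotated until they are mutually orthogonal. The singular values are then the column norms. This is a minimal CPU reference under stated assumptions, not the authors' W-cycle algorithm and not cuSOLVER's API:

- each matrix is m × n with m ≥ n;
- only singular values are returned;
- a cyclic pair ordering is used;
- the names `one_sided_jacobi_svd` and `batched_svd_values` are hypothetical.

```python
import math


def one_sided_jacobi_svd(a, tol=1e-12, max_sweeps=30):
    """Hypothetical helper: singular values (descending) of an m x n matrix
    given as a list of rows, assuming m >= n. Uses cyclic one-sided
    (Hestenes) Jacobi rotations on column pairs; a reference sketch only."""
    m, n = len(a), len(a[0])
    # u[j] holds column j of a working copy of the input matrix.
    u = [[a[i][j] for i in range(m)] for j in range(n)]
    for _ in range(max_sweeps):
        converged = True
        # One sweep visits every column pair (p, q) once.
        for p in range(n - 1):
            for q in range(p + 1, n):
                alpha = sum(x * x for x in u[p])
                beta = sum(x * x for x in u[q])
                gamma = sum(x * y for x, y in zip(u[p], u[q]))
                # Skip pairs already orthogonal to working precision
                # (this also covers zero columns, where gamma == 0).
                if abs(gamma) <= tol * math.sqrt(alpha * beta):
                    continue
                converged = False
                # Rotation that makes columns p and q orthogonal.
                zeta = (beta - alpha) / (2.0 * gamma)
                t = math.copysign(1.0, zeta) / (abs(zeta) + math.sqrt(1.0 + zeta * zeta))
                c = 1.0 / math.sqrt(1.0 + t * t)
                s = c * t
                up, uq = u[p], u[q]
                u[p] = [c * x - s * y for x, y in zip(up, uq)]
                u[q] = [s * x + c * y for x, y in zip(up, uq)]
        if converged:
            break
    # Singular values are the norms of the orthogonalized columns.
    return sorted((math.sqrt(sum(x * x for x in col)) for col in u), reverse=True)


def batched_svd_values(batch):
    """Hypothetical batched driver: the matrices are independent, so a GPU
    implementation could process them concurrently; here they run in a loop."""
    return [one_sided_jacobi_svd(a) for a in batch]
```

For example, `one_sided_jacobi_svd([[3.0, 0.0], [0.0, 4.0]])` returns `[4.0, 3.0]`. The columns are already orthogonal, so no rotation is applied. How the rotations within and across matrices are scheduled is exactly the design space where, per the abstract, data locality and parallelism have to be traded off.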

Junmin Xiao

Institute of Computing Technology of Chinese Academy of Sciences

Qing Xue

Institute of Computing Technology, Chinese Academy of Sciences

Hui Ma

Institute of Computing Technology, Chinese Academy of Sciences

Xiaoyang Zhang

Institute of Computing Technology, Chinese Academy of Sciences

Guangming Tan

Chinese Academy of Sciences (CAS)

Session Program

Tue 5 Apr
Displayed time zone: Eastern Time (US & Canada)

14:00 - 15:25	Poster Session (Main Conference). Chair(s): Yan Gu, UC Riverside

14:00 5m Talk	POSTER: Automatic Synthesis of Parallel Unix Commands and Pipelines with KumQuat. Jiasi Shen, Massachusetts Institute of Technology; Martin C. Rinard, Massachusetts Institute of Technology; Nikos Vasilakis, Massachusetts Institute of Technology
14:05 5m Talk	POSTER: Towards OmpSs-2 and OpenACC Interoperation. Orestis Korakitis, Barcelona Supercomputing Center (BSC); Simon Garcia De Gonzalo, Barcelona Supercomputing Center (BSC); Nicolas Guidotti, INESC-ID, Instituto Superior Técnico, University of Lisbon; João Barreto, INESC-ID; José C. Monteiro, INESC-ID, Instituto Superior Técnico, University of Lisbon; Antonio J. Peña, Barcelona Supercomputing Center (BSC)
14:10 5m Talk	POSTER: LB-HM: Load Balance-Aware Data Placement on Heterogeneous Memory for Task-Parallel HPC Applications. Zhen Xie, University of California, Merced; Jie Liu; Sam Ma, College of William & Mary; Jiajia Li, William & Mary, Pacific Northwest National Laboratory; Dong Li, University of California, Merced
14:15 5m Talk	POSTER: Hardening Selective Protection across Multiple Program Inputs for HPC Applications. Yafan Huang, University of Iowa; Shengjian Guo, Baidu USA; Sheng Di, Argonne National Laboratory; Guanpeng Li, University of Iowa; Franck Cappello, Argonne National Laboratory
14:20 5m Talk	POSTER: A Parallel Branch-and-Bound Algorithm with History-Based Domination. Taspon Gonggiatgul, California State University, Sacramento; Ghassan Shobaki, California State University, Sacramento; Pınar Muyan-Özçelik, California State University, Sacramento
14:25 5m Talk	POSTER: Remote OpenMP Offloading. Atmn Patel, University of Waterloo; Johannes Doerfert, Argonne National Laboratory
14:30 5m Talk	POSTER: High Performance GPU Concurrent B+tree. Weihua Zhang, Fudan University; Chuanlei Zhao, Fudan University; Lu Peng, Louisiana State University; Yuzhe Lin, Fudan University; Fengzhe Zhang, Fudan University; Jinhu Jiang, Fudan University
14:35 5m Talk	POSTER: The Problem-Based Benchmark Suite (PBBS), V2. Daniel Anderson, Carnegie Mellon University; Guy E. Blelloch, Carnegie Mellon University, USA; Laxman Dhulipala, University of Maryland, College Park; Magdalen Dobson, Carnegie Mellon University; Yihan Sun, University of California, Riverside
14:40 5m Talk	POSTER: An LLVM-based Open-Source Compiler for NVIDIA GPUs. Da Yan, Hong Kong University of Science and Technology; Wei Wang, Hong Kong University of Science and Technology; Xiaowen Chu, Data Science and Analytics Thrust, HKUST(GZ)
14:45 5m Talk	POSTER: ParGeo: A Library for Parallel Computational Geometry. Yiqiu Wang, Massachusetts Institute of Technology; Shangdi Yu, Massachusetts Institute of Technology; Laxman Dhulipala, University of Maryland, College Park; Yan Gu, UC Riverside; Julian Shun, MIT
14:50 5m Talk	POSTER: Parallel Algorithms for Masked Sparse Matrix-Matrix Products. Srđan Milaković, Rice University; Oguz Selvitopi, Lawrence Berkeley National Laboratory; Israt Nisa, AWS AI; Zoran Budimlić, Rice University; Aydin Buluc, Lawrence Berkeley National Laboratory
14:55 5m Talk	POSTER: Rethinking Graph Data Placement for Graph Neural Network Training on Multiple GPUs. Shihui Song, The University of Iowa; Peng Jiang, The University of Iowa
15:00 5m Talk	POSTER: Optimizing Consistency for Partially Replicated Data Stores. Ivan Kuraj, MIT CSAIL, USA; Armando Solar-Lezama, Massachusetts Institute of Technology; Nadia Polikarpova, University of California at San Diego
15:05 5m Talk	POSTER: Optimizing Sparse Computations Jointly. Kazem Cheshmi, University of Toronto; Michelle Strout, University of Arizona; Maryam Mehri Dehnavi, University of Toronto
15:10 5m Talk	POSTER: wCQ: A Fast Wait-Free Queue with Bounded Memory Usage. Ruslan Nikolaev, The Pennsylvania State University; Binoy Ravindran, Virginia Tech
15:15 5m Talk	POSTER: Automatic Differentiation of Parallel Loops with Formal Methods. Jan Hueckelheim, Argonne National Laboratory; Laurent Hascoet, Inria
15:20 5m Talk	POSTER: A W-cycle Algorithm for Efficient Batched SVD on GPUs. Junmin Xiao, Institute of Computing Technology of Chinese Academy of Sciences; Qing Xue, Institute of Computing Technology, Chinese Academy of Sciences; Hui Ma, Institute of Computing Technology, Chinese Academy of Sciences; Xiaoyang Zhang, Institute of Computing Technology, Chinese Academy of Sciences; Guangming Tan, Chinese Academy of Sciences (CAS)