Parallel Block-Delayed Sequences (PPoPP 2022 - Main Conference)

Who

Sam Westrick, Mike Rainey, Daniel Anderson, Guy E. Blelloch

Track

PPoPP 2022 Main Conference

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 4 Apr 2022 11:40 - 11:55 - Session 2 Chair(s): Ang Li

Abstract

Programming languages using functions on collections of values, such as map, reduce, scan and filter, have been used for over fifty years. Such collections have proven to be particularly useful in the context of parallelism because such functions are naturally parallel. However, if implemented naively they lead to the generation of temporary intermediate collections that can significantly increase memory usage and runtime. To avoid this pitfall, many approaches use "fusion" to combine operations and avoid temporary results. However, most of these approaches involve significant changes to a compiler and are limited to a small set of functions, such as maps and reduces.
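
To make the cost of temporary collections concrete, here is a minimal C++ sketch of a map followed by a reduce, written once with a materialized intermediate and once fused by hand. The function names are hypothetical and chosen for illustration; they are not taken from the paper or its libraries.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Unfused: the map step materializes a temporary vector of n squares,
// which the reduce step then reads back once and discards.
std::int64_t sum_of_squares_unfused(const std::vector<std::int64_t>& xs) {
    std::vector<std::int64_t> squares;  // temporary intermediate collection
    squares.reserve(xs.size());
    for (std::int64_t x : xs) squares.push_back(x * x);

    std::int64_t total = 0;
    for (std::int64_t s : squares) total += s;
    return total;
}

// Fused by hand: each square is consumed as soon as it is produced,
// so no intermediate collection is allocated.
std::int64_t sum_of_squares_fused(const std::vector<std::int64_t>& xs) {
    std::int64_t total = 0;
    for (std::int64_t x : xs) total += x * x;
    return total;
}
```

Both functions return the same value; the fused version simply skips the O(n) temporary. Fusion techniques aim to obtain the second form automatically from code written in the style of the first.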

In this paper we present a library-based approach that fuses widely used operations such as scans, filters, and flattens. In conjunction with existing techniques, this covers most of the common operations on collections. Our approach is based on a novel technique which parallelizes over blocks, with streams within each block. We demonstrate the approach by implementing libraries targeting multicore parallelism in two languages: Parallel ML and C++, which have very different semantics and compilers. To help users understand when to use the approach, we define a cost semantics that indicates when fusion occurs and how it reduces memory allocations. We present experimental results for a dozen benchmarks that demonstrate significant reductions in both time and space. In most cases the approach generates code that is near optimal for the machines it is running on.
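
The idea of parallelizing over blocks while streaming within each block can be sketched roughly as follows. This is an illustrative assumption about the general shape of such a computation, not the paper's library or API: the function name `max_prefix_sum_blocked` and the `block_size` parameter are hypothetical. The example fuses a scan (prefix sums) with a reduce (maximum) so that the full scan result is never stored. The loops over blocks in phases 1 and 3 have independent iterations, which is where a parallel-for would be applied; they run sequentially here to keep the sketch dependency-free.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <limits>
#include <vector>

// Hypothetical helper: maximum inclusive prefix sum of xs, computed block by
// block. Only O(number of blocks) temporary memory is used; the n-element
// scan result is never materialized. For empty input (or block_size == 0)
// it returns the smallest int64_t as a sentinel.
std::int64_t max_prefix_sum_blocked(const std::vector<std::int64_t>& xs,
                                    std::size_t block_size) {
    const std::int64_t kMin = std::numeric_limits<std::int64_t>::min();
    const std::size_t n = xs.size();
    if (n == 0 || block_size == 0) return kMin;
    const std::size_t num_blocks = (n + block_size - 1) / block_size;

    // Phase 1: one sum per block. Iterations are independent of each other.
    std::vector<std::int64_t> block_sum(num_blocks, 0);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t lo = b * block_size;
        const std::size_t hi = std::min(n, lo + block_size);
        for (std::size_t i = lo; i < hi; ++i) block_sum[b] += xs[i];
    }

    // Phase 2: exclusive scan over the block sums gives each block the
    // prefix sum of everything that precedes it.
    std::vector<std::int64_t> offset(num_blocks, 0);
    for (std::size_t b = 1; b < num_blocks; ++b) {
        offset[b] = offset[b - 1] + block_sum[b - 1];
    }

    // Phase 3: stream through each block, starting from its offset, and feed
    // every prefix sum straight into the consumer (a running maximum).
    // Iterations over b are again independent of each other.
    std::vector<std::int64_t> block_max(num_blocks, kMin);
    for (std::size_t b = 0; b < num_blocks; ++b) {
        const std::size_t lo = b * block_size;
        const std::size_t hi = std::min(n, lo + block_size);
        std::int64_t running = offset[b];
        for (std::size_t i = lo; i < hi; ++i) {
            running += xs[i];
            block_max[b] = std::max(block_max[b], running);
        }
    }

    // Final reduce over the per-block results.
    return *std::max_element(block_max.begin(), block_max.end());
}
```

For the input `{3, -1, 4, -10, 5, 2}` the prefix sums are 3, 2, 6, -4, 1, 3, so the function returns 6 for any positive block size. The block size trades off the amount of parallelism against the size of the per-block temporaries.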

Sam Westrick, Carnegie Mellon University, United States

Mike Rainey, Carnegie Mellon University, United States

Daniel Anderson, Carnegie Mellon University, United States

Guy E. Blelloch, Carnegie Mellon University, United States

Session Program

Mon 4 Apr
Displayed time zone: Eastern Time (US & Canada)

11:40 - 12:25	Session 2 (Main Conference). Chair(s): Ang Li, Pacific Northwest National Laboratory

11:40 15m Talk	Parallel Block-Delayed Sequences (Main Conference). Sam Westrick (Carnegie Mellon University), Mike Rainey (Carnegie Mellon University), Daniel Anderson (Carnegie Mellon University), Guy E. Blelloch (Carnegie Mellon University, USA)
11:55 15m Talk	RTNN: Accelerating Neighbor Search Using Hardware Ray Tracing (Main Conference). Yuhao Zhu (University of Rochester)
12:10 15m Talk	TileSpGEMM: A Tiled Algorithm for Parallel Sparse General Matrix-Matrix Multiplication on GPUs (Main Conference). Yuyao Niu, Zhengyang Lu, Haonan Ji, Shuhui Song, Zhou Jin, Weifeng Liu (all China University of Petroleum-Beijing)