BAGUALU: Targeting Brain Scale Pretrained Models with over 37 Million Cores (PPoPP 2022 - Main Conference)

Who

Zixuan Ma, Jiaao He, Jiezhong Qiu, Huanqi Cao, Yuanwei Wang, Zhenbo Sun, Liyan Zheng, Haojie Wang, Shizhi Tang, Tianyu Zheng, Junyang Lin, Guanyu Feng, Zeqiang Huang, Jie Gao, Aohan Zeng, Jianwei Zhang, Runxin Zhong, Tianhui Shi, Sha Liu, Weimin Zheng, Jie Tang, Hongxia Yang, Xin Liu, Jidong Zhai, Wenguang Chen

Track

PPoPP 2022 Main Conference

Time Zone

The program is currently displayed in (GMT-04:00) Eastern Time (US & Canada).

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 5 Apr 2022, 10:20 - 10:35, Session 4. Chair(s): Kenjiro Taura

Abstract

Large-scale pretrained AI models have shown state-of-the-art accuracy in a series of important applications. As the size of pretrained AI models grows dramatically each year in an effort to achieve higher accuracy, training such models requires massive computing and memory capabilities, which accelerates the convergence of AI and HPC. However, there are still gaps in deploying AI applications on HPC systems, which need application and system co-design based on specific hardware features.

To this end, this paper proposes BaGuaLu, the first work targeting training brain scale models on an entire exascale supercomputer, the New Generation Sunway Supercomputer. By combining hardware-specific intra-node optimization and hybrid parallel strategies, BaGuaLu enables decent performance and scalability on unprecedentedly large models. The evaluation shows that BaGuaLu can train 14.5-trillion-parameter models with a performance of over 1 EFLOPS using mixed precision and has the capability to train 174-trillion-parameter models, a parameter count that rivals the number of synapses in a human brain.
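To give a rough sense of the memory scale behind the parameter counts quoted in the abstract, the sketch below computes the storage needed for one copy of the weights alone. The byte widths (2 bytes for fp16, 4 bytes for fp32) and the helper names are assumptions made for illustration; the abstract does not state how BaGuaLu stores parameters, and the estimate ignores optimizer state, gradients, and activations.

```python
# Back-of-envelope storage for one copy of the model weights only.
# Assumption: standard widths of 2 bytes (fp16) and 4 bytes (fp32) per parameter.
BYTES_PER_PARAM = {"fp16": 2, "fp32": 4}


def weight_bytes(num_params: int, precision: str) -> int:
    """Bytes needed to hold one copy of the weights at the given precision."""
    return num_params * BYTES_PER_PARAM[precision]


def to_terabytes(num_bytes: int) -> float:
    """Convert bytes to decimal terabytes (10**12 bytes)."""
    return num_bytes / 10**12


if __name__ == "__main__":
    # Parameter counts taken from the abstract.
    models = [
        ("14.5 trillion", 14_500_000_000_000),
        ("174 trillion", 174_000_000_000_000),
    ]
    for label, params in models:
        for precision in BYTES_PER_PARAM:
            tb = to_terabytes(weight_bytes(params, precision))
            print(f"{label} parameters at {precision}: {tb:.0f} TB")
```

Even under these simplified assumptions the weights alone run to tens or hundreds of terabytes, which is consistent with the abstract's point that training at this scale requires massive memory capacity.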

Zixuan Ma

Tsinghua University

Jiaao He

Tsinghua University, China

Jiezhong Qiu

Tsinghua University and Beijing Academy of Artificial Intelligence

Huanqi Cao

Tsinghua University

Yuanwei Wang

Tsinghua University

Zhenbo Sun

Tsinghua University

Liyan Zheng

Tsinghua University

Haojie Wang

Tsinghua University

Shizhi Tang

Tsinghua University

Tianyu Zheng

Zhejiang Lab

Junyang Lin

DAMO Academy, Alibaba Group

Guanyu Feng

Tsinghua University

Zeqiang Huang

Zhejiang Lab

Jie Gao

Zhejiang Lab

Aohan Zeng

Tsinghua University and Beijing Academy of Artificial Intelligence

Jianwei Zhang

DAMO Academy, Alibaba Group

Runxin Zhong

Tsinghua University

Tianhui Shi

Tsinghua University

Sha Liu

Zhejiang Lab

Weimin Zheng

Tsinghua University

Jie Tang

Tsinghua University and Beijing Academy of Artificial Intelligence

Hongxia Yang

DAMO Academy, Alibaba Group

Xin Liu

Zhejiang Lab

Jidong Zhai

Tsinghua University, China

Wenguang Chen

Tsinghua University

Session Program

Tue 5 Apr
Displayed time zone: Eastern Time (US & Canada)

10:20 - 11:20	Session 4, Main Conference. Chair(s): Kenjiro Taura (The University of Tokyo)

10:20 15m Talk	BAGUALU: Targeting Brain Scale Pretrained Models with over 37 Million Cores. Main Conference. Zixuan Ma (Tsinghua University); Jiaao He (Tsinghua University, China); Jiezhong Qiu (Tsinghua University and Beijing Academy of Artificial Intelligence); Huanqi Cao (Tsinghua University); Yuanwei Wang (Tsinghua University); Zhenbo Sun (Tsinghua University); Liyan Zheng (Tsinghua University); Haojie Wang (Tsinghua University); Shizhi Tang (Tsinghua University); Tianyu Zheng (Zhejiang Lab); Junyang Lin (DAMO Academy, Alibaba Group); Guanyu Feng (Tsinghua University); Zeqiang Huang (Zhejiang Lab); Jie Gao (Zhejiang Lab); Aohan Zeng (Tsinghua University and Beijing Academy of Artificial Intelligence); Jianwei Zhang (DAMO Academy, Alibaba Group); Runxin Zhong (Tsinghua University); Tianhui Shi (Tsinghua University); Sha Liu (Zhejiang Lab); Weimin Zheng (Tsinghua University); Jie Tang (Tsinghua University and Beijing Academy of Artificial Intelligence); Hongxia Yang (DAMO Academy, Alibaba Group); Xin Liu (Zhejiang Lab); Jidong Zhai (Tsinghua University); Wenguang Chen (Tsinghua University)
10:35 15m Talk	Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms. Main Conference. Zhuoqiang Guo (Institute of Computing Technology, Chinese Academy of Sciences); Denghui Lu (HEDPS, CAPT, College of Engineering, Peking University); Yujin Yan (Institute of Computing Technology, Chinese Academy of Sciences); Siyu Hu (Institute of Computing Technology, Chinese Academy of Sciences); Rongrong Liu (Institute of Computing Technology, Chinese Academy of Sciences); Guangming Tan (Chinese Academy of Sciences (CAS)); Ninghui Sun (State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences); Wanrun Jiang (AI for Science Institute); Lijun Liu (Osaka University); Yixiao Chen (Princeton University); Linfeng Zhang (DP Technology); Mohan Chen (HEDPS, CAPT, College of Engineering, Peking University); Han Wang (Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics); Weile Jia (Institute of Computing Technology, Chinese Academy of Sciences)
10:50 15m Talk	LOTUS: Locality Optimizing Triangle Counting. Main Conference. Mohsen Koohi Esfahani (Queen's University Belfast); Peter Kilpatrick (Queen's University Belfast); Hans Vandierendonck (Queen's University Belfast)
11:05 15m Talk	Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores. Main Conference. Huanqi Cao (Tsinghua University); Yuanwei Wang (Tsinghua University); Haojie Wang (Tsinghua University); Heng Lin (Peking University); Zixuan Ma (Tsinghua University); Wanwang Yin (National Supercomputing Center in Wuxi); Wenguang Chen (Tsinghua University)