Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms
High-performance computing, combined with neural network models trained on data generated by first-principles methods, has greatly extended the spatial and temporal scales that ab initio molecular dynamics can reach on modern supercomputers. The previous state of the art achieves 1-2 nanoseconds of molecular dynamics simulation per day for 100 million atoms on the entire Summit supercomputer. In this paper, we significantly reduce the memory footprint and computational time through a comprehensive approach that combines algorithmic and system innovations. The neural network model is compressed by model tabulation, kernel fusion, and redundancy removal. Optimizations such as acceleration of customized kernels, tabulation of the activation function, and MPI+OpenMP parallelization are then implemented on GPU and ARM architectures. Tests on a copper system show that the optimized code scales to the entire machine on both Fugaku and Summit, and that the system size can be extended by a factor of 134 to an unprecedented 17 billion atoms. Strong scaling of a 13.5-million-atom copper system shows that the time-to-solution can be 7 times faster, reaching 11.2 nanoseconds per day. This work opens the door to unprecedentedly large-scale molecular dynamics simulations with ab initio accuracy and can potentially be used to study more realistic applications such as the mechanical properties of metals, semiconductor devices, and batteries. The optimization techniques detailed in this paper also provide insight for related high-performance computing applications.
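As a rough illustration of what tabulating an activation function can look like, the sketch below precomputes tanh on a uniform grid and answers queries by linear interpolation. The abstract does not describe the scheme the authors actually use, so the choice of tanh, the range [-8, 8], the grid size, the interpolation order, and all function names here are assumptions made only for this example.

```python
import math

def build_table(f, lo, hi, n):
    """Sample f at n + 1 evenly spaced points on [lo, hi]."""
    step = (hi - lo) / n
    values = [f(lo + i * step) for i in range(n + 1)]
    return values, step

def lookup(values, step, lo, hi, x):
    """Approximate f(x) by linear interpolation between table entries."""
    # Clamp to the tabulated range (assumption: f is nearly flat outside it).
    x = min(max(x, lo), hi)
    pos = (x - lo) / step
    i = min(int(pos), len(values) - 2)  # index of the left grid point
    frac = pos - i                      # fractional distance to the right point
    return values[i] + frac * (values[i + 1] - values[i])

# Hypothetical parameters chosen for illustration only.
LO, HI, N = -8.0, 8.0, 4096
TABLE, STEP = build_table(math.tanh, LO, HI, N)

def tanh_tab(x):
    """Table-based stand-in for math.tanh."""
    return lookup(TABLE, STEP, LO, HI, x)

def max_error(samples=10001):
    """Largest absolute deviation from math.tanh over an even sample of [LO, HI]."""
    xs = [LO + (HI - LO) * k / (samples - 1) for k in range(samples)]
    return max(abs(tanh_tab(x) - math.tanh(x)) for x in xs)
```

The trade-off in this sketch is memory for arithmetic: a larger table lowers the interpolation error but costs more storage, and a higher-order interpolant would reach the same accuracy with fewer entries.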
Tue 5 Apr
Displayed time zone: Eastern Time (US & Canada)
10:20 - 11:20
10:20 15m Talk | BAGUALU: Targeting Brain Scale Pretrained Models with over 37 Million Cores | Main Conference | Zixuan Ma Tsinghua University, Jiaao He Tsinghua University, China, Jiezhong Qiu Tsinghua University and Beijing Academy of Artificial Intelligence, Huanqi Cao Tsinghua University, Yuanwei Wang Tsinghua University, Zhenbo Sun Tsinghua University, Liyan Zheng Tsinghua University, Haojie Wang Tsinghua University, Shizhi Tang Tsinghua University, Tianyu Zheng Zhejiang Lab, Junyang Lin DAMO Academy, Alibaba Group, Guanyu Feng Tsinghua University, Zeqiang Huang Zhejiang Lab, Jie Gao Zhejiang Lab, Aohan Zeng Tsinghua University and Beijing Academy of Artificial Intelligence, Jianwei Zhang DAMO Academy, Alibaba Group, Runxin Zhong Tsinghua University, Tianhui Shi Tsinghua University, Sha Liu Zhejiang Lab, Weimin Zheng Tsinghua University, Jie Tang Tsinghua University and Beijing Academy of Artificial Intelligence, Hongxia Yang DAMO Academy, Alibaba Group, Xin Liu Zhejiang Lab, Jidong Zhai Tsinghua University, Wenguang Chen Tsinghua University
10:35 15m Talk | Extending the limit of molecular dynamics with ab initio accuracy to 10 billion atoms | Main Conference | Zhuoqiang Guo Institute of Computing Technology, Chinese Academy of Sciences, Denghui Lu HEDPS, CAPT, College of Engineering, Peking University, Yujin Yan Institute of Computing Technology, Chinese Academy of Sciences, Siyu Hu Institute of Computing Technology, Chinese Academy of Sciences, Rongrong Liu Institute of Computing Technology, Chinese Academy of Sciences, Guangming Tan Chinese Academy of Sciences (CAS), Ninghui Sun State Key Laboratory of Computer Architecture, Institute of Computing Technology, Chinese Academy of Sciences, University of Chinese Academy of Sciences, Wanrun Jiang AI for Science Institute, Lijun Liu Osaka University, Yixiao Chen Princeton University, Linfeng Zhang DP Technology, Mohan Chen HEDPS, CAPT, College of Engineering, Peking University, Han Wang Laboratory of Computational Physics, Institute of Applied Physics and Computational Mathematics, Weile Jia Institute of Computing Technology, Chinese Academy of Sciences
10:50 15m Talk | LOTUS: Locality Optimizing Triangle Counting | Main Conference | Mohsen Koohi Esfahani Queen's University Belfast, Peter Kilpatrick Queen's University Belfast, Hans Vandierendonck Queen's University Belfast
11:05 15m Talk | Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores | Main Conference | Huanqi Cao Tsinghua University, Yuanwei Wang Tsinghua University, Haojie Wang Tsinghua University, Heng Lin Peking University, Zixuan Ma Tsinghua University, Wanwang Yin National Supercomputing Center in Wuxi, Wenguang Chen Tsinghua University