Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications
Performance variance is a serious problem for parallel applications: it can degrade performance and make application behavior hard to understand. Detecting and diagnosing performance variance is therefore crucially important for users and application developers. However, previous detection approaches either incur excessive overhead that hurts application performance or rely on nontrivial source code analysis that is impractical for production-run parallel applications.
In this work, we propose Vapro, a performance variance detection and diagnosis framework for production-run parallel applications. Our approach is based on an important observation: most parallel applications contain code snippets that are repeatedly executed with a fixed workload, and these snippets can be used for performance variance detection. To identify these snippets effectively at runtime, even without program source code, we introduce the State Transition Graph (STG) to track program execution and then conduct lightweight workload analysis on the STG to locate variance. To diagnose the detected variance, Vapro uses a progressive diagnosis method based on a hybrid model that draws on variance breakdown and statistical analysis. Results show that the performance overhead of Vapro is only 1.38% on average. Vapro can detect variance in real applications caused by hardware bugs, memory, and I/O. After fixing the detected variance, the standard deviation of the execution time is reduced by up to 73.5%. Compared with the state-of-the-art variance detection tool based on source code analysis, Vapro achieves 30.0% higher detection coverage.
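The core observation in the abstract (repeated snippets with a fixed workload should take similar time) can be illustrated with a minimal sketch. This is not Vapro's actual algorithm, which the abstract does not specify. The function name `detect_variance`, the record layout `(edge, workload, seconds)`, and both thresholds are assumptions made for illustration. Here an "edge" stands in for a transition between two states in an STG-like graph, and "workload" for some measured quantity such as an instruction count.

```python
from collections import defaultdict


def detect_variance(records, workload_tol=0.05, slowdown_threshold=1.2):
    """Flag executions that are slow relative to peers with similar workload.

    records: iterable of (edge, workload, seconds) tuples (hypothetical layout).
    workload_tol: relative tolerance for treating two workloads as "fixed".
    slowdown_threshold: ratio to the fastest peer above which a run is flagged.
    Returns a list of (edge, base_workload, slowdown_ratio) tuples.
    """
    # Group executions by the edge (code snippet) they belong to.
    by_edge = defaultdict(list)
    for edge, workload, seconds in records:
        by_edge[edge].append((workload, seconds))

    flagged = []
    for edge, runs in by_edge.items():
        # Sort by workload, then greedily cluster runs whose workload lies
        # within the tolerance of the cluster's smallest workload.
        runs.sort()
        clusters = []
        for workload, seconds in runs:
            if clusters and workload <= clusters[-1][0] * (1 + workload_tol):
                clusters[-1][1].append(seconds)
            else:
                clusters.append((workload, [seconds]))

        # Within a cluster the workload is roughly fixed, so large time
        # differences indicate performance variance.
        for base_workload, times in clusters:
            if len(times) < 2:
                continue  # nothing to compare against
            fastest = min(times)
            for t in times:
                if t > fastest * slowdown_threshold:
                    flagged.append((edge, base_workload, t / fastest))
    return flagged
```

Clustering by workload before comparing times is the design choice that matters here: without it, a run that is slow only because it did more work would be misreported as variance.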
Tue 5 Apr (displayed time zone: Eastern Time, US & Canada)
11:40 - 12:25
11:40 | 15m | Talk | Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications | Main Conference | Liyan Zheng (Tsinghua University), Jidong Zhai (Tsinghua University), Xiongchao Tang (Sangfor Technologies Inc. and Tsinghua University), Haojie Wang (Tsinghua University), Teng Yu (Tsinghua University), Yuyang Jin (Tsinghua University), Shuaiwen Leon Song (University of Sydney), Wenguang Chen (Tsinghua University)
11:55 | 15m | Talk | Interference Relation-Guided SMT Solving for Multi-Threaded Program Verification | Main Conference
12:10 | 15m | Talk | PerFlow: A Domain Specific Framework for Automatic Performance Analysis of Parallel Applications | Main Conference | Yuyang Jin (Tsinghua University), Haojie Wang (Tsinghua University), Runxin Zhong (Tsinghua University), Chen Zhang (Tsinghua University), Jidong Zhai (Tsinghua University)