Conference Papers
- FlashTensor: Optimizing Tensor Programs by Leveraging Fine-grained Tensor Property. PPoPP. 2025.
- WiseGraph: Optimizing GNN with Joint Workload Partition of Graph and Operations. EuroSys. 2024.
- BoostN: Optimizing Imbalanced Neighborhood Communication on Homogeneous Many-Core System. ICPP. 2024.
- PUZZLE: Efficiently Aligning Large Language Models through Light-Weight Context Switch. USENIX ATC. 2024.
- PerFlow: A Domain Specific Framework for Automatic Performance Analysis of Parallel Applications. PPoPP. 2022.
- Vapro: Performance Variance Detection and Diagnosis for Production-Run Parallel Applications. PPoPP. 2022.
- ScalAna: Automating Scaling Loss Detection with Graph Analysis. SC. 2020.
Journal Papers
- Leveraging Graph Analysis to Pinpoint Root Causes of Scalability Issues for Parallel Applications. IEEE TPDS. 2025.
- Graph-Centric Performance Analysis for Large-Scale Parallel Applications. IEEE TPDS. 2024.
- Efficient Inference for Pruned CNN Models on Mobile Devices With Holistic Sparsity Alignment. IEEE TPDS. 2024.
- Unified Programming Models for Heterogeneous High-Performance Computers. JCST. 2023.
- Detecting Performance Variance for Parallel Applications Without Source Code. IEEE TPDS. 2022.
Monographs
- Research on Key Technologies of Performance Analysis and Optimization for Large-Scale Parallel Applications. Tsinghua University Press (清华大学出版社). 2024.
- Performance Analysis of Parallel Applications for HPC. Springer Nature. 2023.