Study on Optimization of Parallel Efficiency of CPU-GPU Heterogeneous Parallelization for MOC Neutron Transport Calculation

  • Abstract: CPU-GPU heterogeneous systems offer a way to accelerate fine-mesh whole-core neutron transport calculations with the method of characteristics (MOC). Building on an implementation of a 2D MOC heterogeneous parallel algorithm for CPU-GPU systems, a performance analysis model was proposed and used to identify the main factors that limit the algorithm's parallel efficiency. To address these factors, the transport sweep was overlapped with data movement, which improved the overall parallel efficiency of the heterogeneous algorithm. The numerical results show that the code maintains good accuracy. Data movement, which comprises MPI communication and data copies between CPU and GPU, is the main factor affecting the parallel efficiency of the heterogeneous parallel algorithm. Overlapping the transport sweep with data movement improves both the overall performance and the strong scaling efficiency: with 5 heterogeneous nodes (20 GPUs in total), the overall performance improves by about 8% and the strong scaling efficiency rises from 87% to 95%. Compared with CPU-only parallel computation, 4 CPU-GPU heterogeneous nodes outperform 20 CPU nodes in overall performance.
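The two headline figures (about 8% better overall performance, and strong scaling efficiency rising from 87% to 95%) can be related with a short calculation. The sketch below is illustrative only: it uses the common definition of strong scaling efficiency and assumes the reference run time is unchanged by the optimization, which the abstract does not state. The function names are hypothetical and not taken from the authors' code.

```python
def strong_scaling_efficiency(t_ref, n_ref, t_n, n):
    """Strong scaling efficiency of an n-node run relative to a reference
    run on n_ref nodes, for a fixed total problem size.

    E = (t_ref * n_ref) / (t_n * n); E = 1.0 means ideal scaling.
    """
    return (t_ref * n_ref) / (t_n * n)


def runtime_ratio(eff_before, eff_after):
    """Ratio T_after / T_before at a fixed node count.

    Assumption (not stated in the abstract): the reference run time is the
    same before and after the optimization, so run time scales as 1 / E.
    """
    return eff_before / eff_after


if __name__ == "__main__":
    # Efficiencies reported in the abstract for 5 heterogeneous nodes.
    ratio = runtime_ratio(0.87, 0.95)
    print(f"run time after / before: {ratio:.3f}")
    print(f"run time reduction:      {1.0 - ratio:.1%}")
```

Under that assumption, going from 87% to 95% efficiency corresponds to a run time reduction of roughly 8%, which is consistent with the reported gain in overall performance.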

     
