Abstract
Solving the neutron diffusion equation is a fundamental and critical problem in nuclear reactor physics calculations. In recent years, physics-informed neural networks (PINNs), as a novel paradigm that requires neither traditional meshing nor analytical solutions, have been introduced to solve the reactor neutron diffusion equation and are gradually gaining mainstream acceptance. However, conventional PINNs suffer from a core limitation: the training loss is typically composed of multiple sub-objectives, such as the partial differential equation (PDE) residual term, the initial condition error term, and the boundary condition error term. During optimization, the gradients of these sub-terms frequently exhibit severe directional conflicts and scale differences, causing the training to become trapped in local optima and leading to low optimization efficiency, slow convergence, and unstable predictions. This gradient conflict is particularly pronounced in complex coupled scenarios such as multi-group, multi-material nuclear reactors, and has become a critical bottleneck limiting PINN performance. This paper proposes a residual-adaptive resampling physics-informed neural network architecture driven by gradient harmonization optimization (GHO), termed GHO_R2-PINN, aimed at systematically mitigating the negative effects of this gradient imbalance in practical optimization. The GHO_R2-PINN architecture first adopts a convolutional neural network with shortcut connections (S-CNN) as its backbone, which effectively alleviates the vanishing-gradient problem common in deep network training and enhances the network's generalization capability. Building on this backbone, it incorporates a residual-adaptive resampling (RAR) mechanism that dynamically adjusts the distribution of sampling points according to the physical equation residuals during training, so that in the later stages of optimization the model concentrates on regions with larger errors and more complex physical constraints, thereby improving the fit to complex geometries and physical-field variations. Concurrently, the GHO strategy is introduced to dynamically generate a unique, conflict-free optimization direction by computing the pseudo-inverse of the matrix formed by all loss gradients. This guarantees that the optimization direction forms an acute angle with every independent loss gradient, so the multi-objective sub-losses descend in a coordinated, consistent manner and no single loss term dominates the training at the expense of the others' accuracy. Furthermore, exploiting the complementary characteristics of the GHO strategy and traditional optimizers, this paper further proposes a hybrid "fast start, stable end" optimization strategy: in the initial training phase, GHO is used to accelerate the network's robust convergence, strengthen its global search capability, and rapidly steer the network parameters away from undesirable local minima; after a set number of iterations, the optimization smoothly hands over to a traditional optimizer (such as L-BFGS or Adam) to complete the final fine-grained error reduction and stable convergence in the high-accuracy region.
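As an illustration of the conflict-free update described above, the following minimal PyTorch sketch is our own assumed realization, not the paper's exact implementation: it combines the per-loss gradients by solving G d = 1 with the Moore–Penrose pseudo-inverse, so that the resulting direction has a positive inner product with every independent gradient. The unit-row normalization and final rescaling are illustrative assumptions.

```python
import torch

def gho_direction(grads, eps=1e-12):
    """Hedged sketch of a GHO-style conflict-free update direction.

    `grads` is a list of flattened per-loss gradients (e.g. PDE residual,
    boundary-condition, and initial-condition terms). The normalization
    below is an assumption: rows are rescaled to unit length so that
    differences in loss scale do not dominate the combined direction.
    """
    G = torch.stack([g.flatten() for g in grads])        # (k, n) gradient matrix
    G_unit = G / (G.norm(dim=1, keepdim=True) + eps)     # unit-length rows
    target = torch.ones(G.size(0), device=G.device)      # require g_i . d > 0 for all i
    d = torch.linalg.pinv(G_unit) @ target               # minimum-norm solution of G_unit d = 1
    # Rescale to a step length comparable to the mean gradient norm
    return d / (d.norm() + eps) * G.norm(dim=1).mean()
```

Because the minimum-norm solution satisfies g_i · d = ||g_i|| > 0 for every linearly independent gradient g_i, the combined step forms an acute angle with each sub-loss gradient, which is the property stated above.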
The effectiveness and robustness of the GHO_R2-PINN architecture are validated on three benchmark problems spanning theory and engineering practice: the nonlinear Burgers equation, the one-group, one-material 1D bare reactor problem, and the complex multi-group, multi-material IAEA 2D benchmark. In the Burgers equation experiment, the GHO method reduces the maximum prediction error by a factor of approximately 15 compared with the original gradient method. Compared with the traditional PINN architecture, the GHO_R2-PINN architecture shortens the total training time by about 50% and improves the overall prediction accuracy by a factor of approximately 1.5. For the IAEA 2D benchmark in particular, the optimal hybrid strategy reduces the training time from about 4 hours with traditional methods to about 1.5 hours, demonstrating its efficiency in practical applications. Moreover, once the network is trained, a single neutron flux prediction is completed in milliseconds, providing good real-time responsiveness. In summary, the proposed GHO_R2-PINN architecture effectively mitigates the critical problem of gradient competition in PINN training, delivers advances on both theoretical benchmarks and engineering cases, and significantly improves convergence speed, prediction accuracy, and training stability. The efficient and reliable solution presented in this paper offers a promising technical pathway for future complex multi-physics coupled calculations in nuclear reactor design.
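The training-time savings reported for the IAEA 2D benchmark stem from the hybrid "fast start, stable end" schedule described earlier. The sketch below is a minimal, assumed realization of that schedule in PyTorch: the loss builder `losses_fn`, the iteration counts, and the learning rate are illustrative placeholders, and `gho_direction` refers to the sketch above. It switches from GHO-style updates to L-BFGS refinement after a fixed number of iterations.

```python
import torch

def train_hybrid(model, losses_fn, switch_iter=2000, lbfgs_iter=3000, lr=1e-3):
    """Assumed sketch of the 'fast start, stable end' schedule: GHO-style
    conflict-free updates first, then standard L-BFGS refinement."""
    params = [p for p in model.parameters() if p.requires_grad]

    # Phase 1: conflict-free GHO updates for a robust, fast start
    for _ in range(switch_iter):
        losses = losses_fn(model)                    # e.g. [pde_loss, bc_loss, ic_loss]
        flat_grads = []
        for loss in losses:
            g = torch.autograd.grad(loss, params, retain_graph=True)
            flat_grads.append(torch.cat([gi.flatten() for gi in g]))
        d = gho_direction(flat_grads)                # combined conflict-free direction
        with torch.no_grad():                        # manual descent step along d
            offset = 0
            for p in params:
                n = p.numel()
                p -= lr * d[offset:offset + n].view_as(p)
                offset += n

    # Phase 2: hand over to L-BFGS for fine error reduction and stable convergence
    optimizer = torch.optim.LBFGS(params, max_iter=lbfgs_iter)

    def closure():
        optimizer.zero_grad()
        total = sum(losses_fn(model))
        total.backward()
        return total

    optimizer.step(closure)
```

The switch point `switch_iter` is a tunable hyperparameter; the paper's experiments suggest that handing over to a traditional optimizer only after the GHO phase has steered the parameters away from poor local minima is what yields both the speed-up and the final accuracy.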