Abstract:
Deep reinforcement learning, an important branch of artificial intelligence, enables end-to-end processing that maps high-dimensional raw input data directly to output actions. Its core lies in the use of deep neural networks, which automatically extract features from complex input data and make decisions. In the field of nuclear engineering, decision-making often involves high-dimensional state parameters that have complex nonlinear relationships with the decision parameters, providing potential application scenarios for deep reinforcement learning. Depending on how the policy is optimized, deep reinforcement learning can be divided into two major categories: value-based methods and policy-gradient-based methods. The two approaches rest on different fundamentals, which leads to differences in their capabilities and in their suitability for different problems. To effectively apply deep reinforcement learning to nuclear engineering problems, this paper conducts an in-depth analysis of the suitability of these two classes of methods in the nuclear engineering field. Value-based deep reinforcement learning guides policy selection by estimating state value functions or action value functions; common methods include Deep Q-Network (DQN), Double DQN, Dueling DQN, and Prioritized Experience Replay DQN. Analysis of the fundamentals of these methods, combined with practical research cases, shows that value-based deep reinforcement learning performs better in tasks with discrete action spaces, such as compensating for sensor failures, controlling the enrichment of nuclear fuel assemblies, and fault diagnosis. Policy-gradient-based deep reinforcement learning searches for a good policy by optimizing the objective function along its policy gradient; representative methods include Deep Deterministic Policy Gradient, Trust Region Policy Optimization, and Twin Delayed Deep Deterministic Policy Gradient. A comparative analysis of the principles of these methods, together with the current state of research in nuclear engineering, shows that policy-gradient-based deep reinforcement learning is better suited to complex control and optimization tasks in the field, such as reactor start-up and shutdown control, coolant system control, and magnetic control of tokamak fusion devices. These tasks typically involve high-dimensional states and continuous action spaces, requiring agents with greater flexibility and adaptability. Deep reinforcement learning thus shows broad application prospects in nuclear engineering, but its practical application still faces many challenges: the high safety and reliability requirements of nuclear engineering systems make algorithmic robustness and reliability crucial, and data scarcity and the practical deployment of algorithms also remain to be solved. Successfully applying deep reinforcement learning in complex nuclear engineering environments still requires overcoming a range of technical barriers.
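
To make the value-based family concrete, the following is a minimal sketch (not taken from the paper) of the temporal-difference update that DQN-style methods share: an action value function Q(s, a) is estimated with a neural network, and the policy is the greedy argmax over those values. The network sizes, the state and action dimensions, and the synthetic replay-buffer minibatch are all hypothetical placeholders.

# A minimal, illustrative DQN-style update step (PyTorch). Hypothetical sketch.
import torch
import torch.nn as nn

state_dim, n_actions, gamma = 8, 4, 0.99   # hypothetical problem sizes

q_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(), nn.Linear(64, n_actions))
target_net.load_state_dict(q_net.state_dict())
optimizer = torch.optim.Adam(q_net.parameters(), lr=1e-3)

# A fake replay-buffer minibatch (random data stands in for stored transitions).
batch = 32
s    = torch.randn(batch, state_dim)
a    = torch.randint(0, n_actions, (batch, 1))
r    = torch.randn(batch, 1)
s2   = torch.randn(batch, state_dim)
done = torch.zeros(batch, 1)

# TD target: r + gamma * max_a' Q_target(s', a'). Double DQN would instead
# select a' with q_net and evaluate it with target_net to reduce overestimation.
with torch.no_grad():
    target = r + gamma * (1 - done) * target_net(s2).max(dim=1, keepdim=True).values
loss = nn.functional.mse_loss(q_net(s).gather(1, a), target)

optimizer.zero_grad()
loss.backward()
optimizer.step()

# The policy induced by the learned values: greedy argmax over Q(s, a),
# which is why these methods fit discrete action spaces best.
greedy_action = q_net(s[:1]).argmax(dim=1)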
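
By contrast, a policy-gradient method parameterizes the policy itself and follows the gradient of expected return, which is what lets it emit continuous actions directly. Below is a minimal REINFORCE-style sketch for a continuous action space; DDPG, TRPO, and TD3 refine this basic idea with critics, trust-region constraints, or twin delayed networks. The Gaussian policy, all dimensions, and the synthetic trajectory data are illustrative assumptions, not the methods discussed in the paper.

# A minimal policy-gradient (REINFORCE-style) update for continuous actions
# (PyTorch). Hypothetical sketch.
import torch
import torch.nn as nn

state_dim, action_dim = 8, 2                 # hypothetical problem sizes
mean_net = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, action_dim))
log_std = nn.Parameter(torch.zeros(action_dim))
optimizer = torch.optim.Adam(list(mean_net.parameters()) + [log_std], lr=3e-4)

# A fake trajectory (random data stands in for environment rollouts).
T = 16
states = torch.randn(T, state_dim)
returns = torch.randn(T, 1)                  # discounted returns-to-go

# Gaussian policy: continuous actions are sampled directly, no discretization.
dist = torch.distributions.Normal(mean_net(states), log_std.exp())
actions = dist.sample()
log_prob = dist.log_prob(actions).sum(dim=1, keepdim=True)

# Policy-gradient objective: maximize E[log pi(a|s) * return],
# implemented here by minimizing its negation.
loss = -(log_prob * returns).mean()
optimizer.zero_grad()
loss.backward()
optimizer.step()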