Adaptability Analysis of Value-based and Policy-gradient-based Deep Reinforcement Learning in the Field of Nuclear Engineering

  • Abstract: Deep reinforcement learning enables end-to-end processing, mapping high-dimensional raw input data directly to output actions. According to whether the policy is optimized indirectly or directly, deep reinforcement learning methods fall into two main categories: value-based methods and policy-gradient-based methods. Because their underlying principles differ, the two categories differ in problem-solving capability and in the scenarios to which they apply. Decision-making problems in the nuclear field involve high-dimensional state parameters with strongly nonlinear relationships between decision parameters and state parameters, making them potential application scenarios for deep reinforcement learning. Starting from the basic principles of reinforcement learning, this paper summarizes the mechanistic differences between value-based and policy-gradient-based methods and, in light of the current state of research, analyzes in depth the possible application scenarios of the two categories of methods in the nuclear engineering field. Finally, the challenges facing deep reinforcement learning in future applications and its application trends are summarized.

     

    Abstract: Deep reinforcement learning, an important branch of artificial intelligence, enables end-to-end processing from high-dimensional raw input data directly to output actions. Its core lies in deep neural networks, which automatically extract features from complex input data and make decisions. In nuclear engineering, decision-making often involves high-dimensional state parameters that have complex nonlinear relationships with decision parameters, providing potential application scenarios for deep reinforcement learning. Depending on how the policy is optimized, deep reinforcement learning can be divided into two major categories: value-based methods and policy-gradient-based methods. The fundamentals of the two approaches differ, leading to differences in their capabilities and in the problems for which each is suited. To use deep reinforcement learning effectively for nuclear engineering problems, an in-depth analysis of the suitability of the two types of methods in this field has been conducted. Value-based deep reinforcement learning guides policy selection by estimating state value functions or action value functions; common methods include Deep Q-Network (DQN), Double DQN, Dueling DQN, and Prioritized Experience Replay DQN. An analysis of the fundamentals of these methods, combined with practical research cases, shows that value-based deep reinforcement learning performs better in tasks with discrete action spaces, such as compensating for sensor failures, optimizing the enrichment of nuclear fuel assemblies, and fault diagnosis. Policy-gradient-based deep reinforcement learning searches for a good policy by directly optimizing the objective function with policy gradients.
Representative methods include Deep Deterministic Policy Gradient (DDPG), Trust Region Policy Optimization (TRPO), and Twin Delayed Deep Deterministic Policy Gradient (TD3). A comparative analysis of the basic principles of these methods, in light of the current state of research in nuclear engineering, shows that policy-gradient-based deep reinforcement learning is better suited to complex control and optimization tasks in this field, such as reactor start-up and shutdown control, coolant system control, and magnetic control of tokamak fusion devices. These tasks typically involve high-dimensional states and continuous action spaces, requiring agents with greater flexibility and adaptability. Deep reinforcement learning shows broad application prospects in nuclear engineering, but its practical application still faces many challenges. For instance, the stringent safety and reliability requirements of nuclear engineering systems make the robustness and reliability of the algorithms crucial issues; data scarcity and the practical deployment of algorithms also need to be addressed. Successfully applying deep reinforcement learning in complex nuclear engineering environments still requires overcoming various technical barriers.
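The value-based idea described in the abstract — estimating an action value function and selecting actions greedily over a discrete action space — can be illustrated with a minimal tabular Q-learning sketch. The toy two-state MDP below is hypothetical (real applications such as DQN replace the table with a deep network), but the temporal-difference update rule is the same.

```python
import numpy as np

# Minimal tabular Q-learning sketch (illustrative only; the hypothetical toy
# MDP has 2 states and 2 actions: action 1 in state 0 yields reward 1 and
# ends the episode, everything else yields reward 0).
def q_learning(episodes=500, alpha=0.1, gamma=0.9, eps=0.1, seed=0):
    rng = np.random.default_rng(seed)
    Q = np.zeros((2, 2))  # Q[state, action]: estimated action values
    for _ in range(episodes):
        s, done = 0, False
        while not done:
            # epsilon-greedy selection over the discrete action space
            a = int(rng.integers(2)) if rng.random() < eps else int(np.argmax(Q[s]))
            if s == 0 and a == 1:
                r, s_next, done = 1.0, 1, True
            else:
                r, s_next, done = 0.0, 0, False
            # TD target: r + gamma * max_a' Q(s', a')
            target = r + (0.0 if done else gamma * np.max(Q[s_next]))
            Q[s, a] += alpha * (target - Q[s, a])
            s = s_next
    return Q

Q = q_learning()
```

After training, the learned values make the greedy policy pick the rewarding action, which is the "guide policy selection by estimating value functions" mechanism the abstract attributes to the DQN family.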
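By contrast, the policy-gradient idea — ascending the gradient of the expected return with respect to policy parameters — can be sketched with REINFORCE on a hypothetical two-armed bandit. Methods named in the abstract such as DDPG, TRPO, and TD3 extend this gradient-ascent principle to high-dimensional, continuous control.

```python
import numpy as np

# Minimal REINFORCE (policy-gradient) sketch on a hypothetical two-armed
# bandit: arm 1 pays reward 1, arm 0 pays 0. The policy is a softmax over
# two action preferences theta.
def softmax(z):
    e = np.exp(z - np.max(z))
    return e / e.sum()

def reinforce(steps=2000, lr=0.1, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros(2)  # policy parameters (action preferences)
    for _ in range(steps):
        p = softmax(theta)
        a = int(rng.choice(2, p=p))      # sample from the stochastic policy
        r = 1.0 if a == 1 else 0.0       # bandit reward
        # grad of log pi(a | theta) for a softmax policy: one_hot(a) - p
        grad_log_pi = -p
        grad_log_pi[a] += 1.0
        theta += lr * r * grad_log_pi    # gradient ascent on expected reward
    return softmax(theta)

probs = reinforce()
```

The policy's probability mass shifts toward the rewarding action without ever estimating a value table, which is the direct policy optimization the abstract contrasts with the value-based family.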

     
