Design of a Self-Adaptive Nuclear Reactor Power Controller Based on the Deep Deterministic Policy Gradient Algorithm

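  • Summary: Nuclear power plants rely on many control systems for effective control and safe operation. The reactor core is the key component that serves as the heat source from radioactive nuclear fuel, and reactor power control bears directly on the safety and economy of plant operation. Because a traditional PID controller struggles to control power accurately when the problem is nonlinear and spans a wide power range, this study derived and built a reactor core model for a pressurized water reactor (PWR) nuclear power plant. Power control was then simulated with an adaptive controller that combines a policy-gradient deep reinforcement learning method with a PID controller. The simulation results show that, compared with a traditional PID controller, the adaptive power controller based on the deep deterministic policy gradient algorithm responds faster, achieves higher control accuracy and stability, and is highly robust. It controls core power accurately and quickly and tracks load changes.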
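A minimal sketch of the fixed-gain baseline may help make the control problem concrete. It makes several assumptions. The core is modeled by point kinetics with a single delayed-neutron group, with no heat-transfer or temperature-feedback equations. The controller sets reactivity directly. All parameter values and gains are illustrative and are not taken from the studied plant. The names `PointKinetics`, `PID` and `simulate_step` are hypothetical. The sketch simulates a step from 100%FP to 90%FP with PID gains that never change, which is the limitation the adaptive controller is meant to remove.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PointKinetics:
    """Point-kinetics core with one delayed-neutron group (assumed, illustrative values)."""
    beta: float = 0.0065     # delayed-neutron fraction (assumed)
    gen_time: float = 2e-5   # prompt-neutron generation time Lambda, s (assumed)
    decay: float = 0.08      # one-group precursor decay constant lambda, 1/s (assumed)
    n: float = 1.0           # relative power, 1.0 = 100 %FP
    c: float = 0.0           # precursor concentration, set to equilibrium in __post_init__

    def __post_init__(self) -> None:
        # Start at equilibrium: dC/dt = 0  =>  C = beta * n / (Lambda * lambda)
        self.c = self.beta * self.n / (self.gen_time * self.decay)

    def step(self, rho: float, dt: float) -> float:
        """Advance one explicit-Euler step under reactivity rho; return relative power."""
        dn = (rho - self.beta) / self.gen_time * self.n + self.decay * self.c
        dc = self.beta / self.gen_time * self.n - self.decay * self.c
        self.n += dn * dt
        self.c += dc * dt
        return self.n


@dataclass
class PID:
    """Textbook PID with gains fixed at construction (the baseline being improved on)."""
    kp: float
    ki: float
    kd: float
    integral: float = 0.0
    prev_error: Optional[float] = None

    def update(self, error: float, dt: float) -> float:
        self.integral += error * dt
        deriv = 0.0 if self.prev_error is None else (error - self.prev_error) / dt
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * deriv


def simulate_step(target: float = 0.9, t_end: float = 100.0, dt: float = 5e-4,
                  rho_max: float = 0.003) -> List[float]:
    """Step the demand from 1.0 (100 %FP) to `target`; return the power trajectory."""
    core = PointKinetics()
    pid = PID(kp=0.02, ki=0.005, kd=0.0)  # hand-picked fixed gains (assumed)
    powers: List[float] = []
    for _ in range(int(t_end / dt)):
        u = pid.update(target - core.n, dt)
        rho = max(-rho_max, min(rho_max, u))  # keep reactivity well below beta (no prompt criticality)
        powers.append(core.step(rho, dt))
    return powers
```

With these assumed values the power settles near 0.9. The small time step keeps explicit Euler stable against the fast prompt-neutron mode, whose time scale is on the order of Λ/(β + K_p·n).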

     

    Abstract: Nuclear power plants require a large number of control systems to achieve effective control and safe operation. The core is the key component of a nuclear power plant, serving as the heat source fueled by radioactive nuclear fuel, and reactor power control bears directly on the safety and economy of plant operation. Optimizing the design of the reactor power controller is therefore of great significance. Because the control parameters of a PID controller are fixed in advance at the design stage, its control performance leaves room for optimization. A traditional PID controller has difficulty accurately handling nonlinear power control over a wide power range. To address this, the study derived and established a reactor core model for a pressurized water reactor nuclear power plant, comprising heat transfer equations, neutron dynamics equations and reactivity equations. Power control was simulated with an adaptive controller that combines a policy-gradient deep reinforcement learning method, the deep deterministic policy gradient (DDPG) algorithm, with a PID (proportional-integral-derivative) controller. A reward function was constructed to represent the optimization of several control performance indexes, such as response time, settling time, control accuracy, overshoot and oscillation. By interacting with the core model in real time, the DDPG algorithm learns, in real time, a policy for optimizing the PID control parameters. The controller was then tested under several groups of operating conditions with different power levels and different power-change modes.
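The abstract does not give the reward formula or the action space, so the following is only a hypothetical sketch of how a multi-index reward and the DDPG action could be wired to a PID controller. It makes two assumptions. First, the actor ends in a tanh layer that outputs three values in [-1, 1], and these are rescaled linearly to (Kp, Ki, Kd) within made-up bounds. Second, the per-step reward penalizes several things:

* absolute error, which measures accuracy and, summed over an episode, also penalizes slow response;
* excursions past the setpoint, which measure overshoot;
* error sign changes, which measure oscillation.

It also adds a bonus inside a tolerance band, which rewards settling. All weights, bounds and function names are assumptions.

```python
from typing import Sequence, Tuple

# Hypothetical bounds for (Kp, Ki, Kd); the abstract does not report them.
GAIN_LOW = (0.0, 0.0, 0.0)
GAIN_HIGH = (0.05, 0.01, 0.005)


def action_to_gains(action: Sequence[float]) -> Tuple[float, float, float]:
    """Rescale a tanh-bounded DDPG action in [-1, 1]^3 to PID gains."""
    kp, ki, kd = (
        lo + (max(-1.0, min(1.0, a)) + 1.0) / 2.0 * (hi - lo)  # clip noisy actions, then map linearly
        for a, lo, hi in zip(action, GAIN_LOW, GAIN_HIGH)
    )
    return kp, ki, kd


def step_reward(error: float, prev_error: float, direction: float,
                w_err: float = 10.0, w_over: float = 50.0, w_osc: float = 5.0,
                band: float = 0.005, bonus: float = 1.0) -> float:
    """Hypothetical per-step reward built from several control indexes.

    error, prev_error: setpoint minus relative power at this and the previous step.
    direction: sign of the commanded power change (+1 increase, -1 decrease).
    """
    reward = -w_err * abs(error)  # accuracy; summed over an episode it also penalizes slow response
    if error * direction < 0:     # power has moved past the setpoint in the direction of the move
        reward -= w_over * abs(error)
    if error * prev_error < 0:    # error changed sign: oscillation around the setpoint
        reward -= w_osc
    if abs(error) < band:         # inside the tolerance band: reward settling
        reward += bonus
    return reward
```

In a DDPG training loop, the actor's output would pass through `action_to_gains` at each control step before the PID update. The value from `step_reward` would then be stored with the transition in the replay buffer.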
The simulation results show that, for a step power reduction from 100%FP to 90%FP (%FP: percent of full power), which was the training condition, the self-adaptive power controller based on the deep deterministic policy gradient algorithm responds faster and achieves higher control accuracy and stability than the traditional PID controller. The method was also evaluated under five test conditions:

* a step power reduction from 40%FP to 30%FP;
* a step power increase from 90%FP to 100%FP;
* a step power increase from 30%FP to 40%FP;
* a linear power reduction from 100%FP to 30%FP;
* a linear power increase from 30%FP to 100%FP.

Under all of these, the control performance of the self-adaptive controller is also significantly better than that of the traditional PID controller. This indicates that the controller designed with this method is highly robust and can accurately map reactor power-change information to the optimal PID control parameters. The proposed method controls core power accurately and quickly and tracks load changes.
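The training and test conditions can be written as power-demand profiles, and overshoot and settling time can be computed from a simulated power trace. This sketch assumes that the demand changes at t = 10 s and that the linear transients run at 5%FP/min. Neither value is stated in the abstract, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List, Optional

T_CHANGE = 10.0          # s, time at which the demand starts to change (hypothetical)
RAMP_RATE = 5.0 / 60.0   # %FP per second, i.e. 5 %FP/min (hypothetical)

Profile = Callable[[float], float]


def step_profile(p0: float, p1: float) -> Profile:
    """Demand jumps from p0 to p1 (%FP) at T_CHANGE."""
    return lambda t: p0 if t < T_CHANGE else p1


def ramp_profile(p0: float, p1: float, rate: float = RAMP_RATE) -> Profile:
    """Demand moves linearly from p0 to p1 (%FP) at `rate` %FP/s, starting at T_CHANGE."""
    duration = abs(p1 - p0) / rate

    def demand(t: float) -> float:
        if t <= T_CHANGE:
            return p0
        if t >= T_CHANGE + duration:
            return p1
        return p0 + (p1 - p0) * (t - T_CHANGE) / duration

    return demand


CONDITIONS: Dict[str, Profile] = {
    "100-90 %FP step (training)": step_profile(100.0, 90.0),
    "40-30 %FP step": step_profile(40.0, 30.0),
    "90-100 %FP step": step_profile(90.0, 100.0),
    "30-40 %FP step": step_profile(30.0, 40.0),
    "100-30 %FP linear": ramp_profile(100.0, 30.0),
    "30-100 %FP linear": ramp_profile(30.0, 100.0),
}


def overshoot_pct(power: List[float], p0: float, p1: float) -> float:
    """Largest excursion past p1, in the direction of the move, as % of |p1 - p0|."""
    step = p1 - p0
    worst = max((p - p1) / step for p in power)  # > 0 only once power has passed p1
    return max(0.0, worst) * 100.0


def settling_time(times: List[float], power: List[float], p1: float,
                  band: float) -> Optional[float]:
    """Time after which power stays within +/- band (%FP) of p1; None if it never does."""
    settled_at: Optional[float] = None
    for t, p in zip(times, power):
        if abs(p - p1) > band:
            settled_at = None      # left the band: restart the search
        elif settled_at is None:
            settled_at = t         # (re-)entered the band
    return settled_at
```

`settling_time` returns an absolute time. Subtract `T_CHANGE` to get the settling time measured from the start of the transient.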

     
