Abstract:
Nuclear power plants rely on a large number of control systems to achieve effective control and safe operation. The reactor core, which contains the radioactive nuclear fuel and serves as the plant's heat source, is the key component, and reactor power control directly affects the safety and economy of plant operation. Optimizing the design of the reactor power controller is therefore of great significance. In the conventional design stage, the control parameters of a PID (proportional-integral-derivative) controller are fixed in advance, which leaves room for improvement in control performance. To address the difficulty traditional PID controllers have in accurately handling nonlinear power control over a wide power range, this study derived and established a reactor core model for a pressurized water reactor (PWR) nuclear power plant, comprising the heat transfer equations, the neutron kinetics equations, and the reactivity equation. An adaptive power controller combining a PID controller with the deep deterministic policy gradient (DDPG) algorithm, a policy-gradient-based deep reinforcement learning method, was used to simulate power control, and a reward function was constructed. The reward function jointly represents several control evaluation indexes, including response time, settling time, control accuracy, overshoot, and oscillation. By interacting with the core model in real time, the DDPG algorithm learns a policy that optimizes the PID controller's control parameters online. Several groups of operating conditions with different power levels and different power-switching modes were then tested.
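For reference, the neutron dynamics mentioned above are commonly modelled with the point-kinetics equations; a standard six-delayed-group form (the paper's exact notation may differ) is:

```latex
\frac{dn(t)}{dt} = \frac{\rho(t) - \beta}{\Lambda}\, n(t) + \sum_{i=1}^{6} \lambda_i C_i(t)

\frac{dC_i(t)}{dt} = \frac{\beta_i}{\Lambda}\, n(t) - \lambda_i C_i(t), \qquad i = 1, \dots, 6
```

where $n$ is the neutron density, $\rho$ the reactivity, $\beta = \sum_i \beta_i$ the total delayed-neutron fraction, $\Lambda$ the prompt-neutron generation time, and $\lambda_i$, $C_i$ the decay constant and precursor concentration of delayed group $i$.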
The simulation results show that in the 100%FP-90%FP step power-reduction transient (the training condition), the adaptive power controller designed with the DDPG algorithm responds faster and achieves higher control accuracy and stability than the traditional PID controller. Under the test conditions, namely the 40%FP-30%FP step power reduction, the 90%FP-100%FP step power increase, the 30%FP-40%FP step power increase, the 100%FP-30%FP linear power reduction, and the 30%FP-100%FP linear power increase, the control performance of the DDPG-based adaptive power controller is likewise significantly better than that of the traditional PID controller. This indicates that the controller designed by this method is highly robust and can accurately map the reactor's power-variation information to optimal PID control parameters. The proposed method can control core power accurately and quickly and track load changes.
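The adaptive scheme summarized above, a DDPG actor re-tuning PID gains against a reward built from control-quality indexes, can be sketched as follows. This is a minimal illustration only: the gain values, reward weights, and the omission of the actor network are assumptions, not the paper's actual design.

```python
# Minimal sketch of the DDPG-tuned PID idea: a PID controller whose
# gains can be re-set online (the DDPG actor itself is omitted; here
# the gains would be supplied by its policy each step), plus a reward
# that aggregates control-quality indexes. All numbers are illustrative.

class PID:
    """Discrete PID controller with re-tunable gains."""

    def __init__(self, kp, ki, kd, dt):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self._integral = 0.0
        self._prev_error = 0.0

    def set_gains(self, kp, ki, kd):
        # In the adaptive scheme, the DDPG actor would call this
        # every control step based on the observed plant state.
        self.kp, self.ki, self.kd = kp, ki, kd

    def step(self, error):
        # Standard discrete PID law: P + I (rectangle rule) + D (backward difference).
        self._integral += error * self.dt
        derivative = (error - self._prev_error) / self.dt
        self._prev_error = error
        return self.kp * error + self.ki * self._integral + self.kd * derivative


def reward(errors, overshoot, settling_steps, w=(1.0, 0.5, 0.01)):
    """Negative weighted sum of tracking error (integral of squared
    error), overshoot, and settling time; the weights `w` are
    hypothetical stand-ins for the paper's reward design."""
    ise = sum(e * e for e in errors)
    return -(w[0] * ise + w[1] * overshoot + w[2] * settling_steps)


if __name__ == "__main__":
    pid = PID(kp=2.0, ki=0.5, kd=0.1, dt=0.1)
    u = pid.step(1.0)  # control action for a unit power-tracking error
    r = reward([1.0, 0.5], overshoot=0.1, settling_steps=20)
    print(u, r)
```

In the full scheme, the DDPG agent would observe the core state, output `(kp, ki, kd)` through `set_gains`, receive `reward` after each transient, and thereby learn a mapping from power-variation information to PID gains.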