Chapter . Operating policy generation using a reinforcement learning agent in a melt facility
Doug Creighton and Saeid Nahavandi
Intelligent Systems Research Lab, Deakin University, Victoria 3217, Australia
This study presents a methodology that allows a reinforcement learning (RL) agent to generate near-optimal policies for a melt facility. Applying the learning method to this industrial-scale, dynamic, stochastic problem poses a number of challenges. The process is formulated as a semi-Markov decision problem. A novel method for applying RL agents to continuous state and action spaces, based on mapping the continuous spaces to discrete ones, is developed. The agent successfully identified robust policies that improved on ...
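
The idea of mapping continuous state and action spaces to discrete ones can be illustrated with a minimal sketch. It assumes uniform-width bins, which is not necessarily the scheme used in this chapter; the function names, value ranges, and bin counts are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def make_bins(low, high, n_bins):
    """Return the n_bins - 1 interior edges that split [low, high] evenly."""
    width = (high - low) / n_bins
    return [low + i * width for i in range(1, n_bins)]


def discretize(value, edges):
    """Map a continuous value to a bin index in 0..len(edges).

    Values below the first edge fall in bin 0 and values above the
    last edge fall in the final bin, so out-of-range inputs are clamped.
    """
    return bisect_right(edges, value)


def to_continuous(index, low, high, n_bins):
    """Map a discrete action index back to the midpoint of its bin."""
    width = (high - low) / n_bins
    return low + (index + 0.5) * width


# Hypothetical state variable: a temperature reading on an assumed 600-800 range.
temp_edges = make_bins(600.0, 800.0, 4)        # interior edges: 650, 700, 750
state = discretize(712.5, temp_edges)          # bin index 2

# Hypothetical action variable: a setpoint on an assumed 0-10 range, 5 levels.
action_value = to_continuous(1, 0.0, 10.0, 5)  # midpoint of bin 1, i.e. 3.0
```

With such a mapping, a tabular RL agent can index its value estimates by `state` and choose among a finite set of action indices, each of which is converted back to a continuous setpoint before being applied. The bin count trades resolution against table size.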