The agent would like to obtain the different reward values and the final reward value set by taking different actions Thus, the agent will select the peak value in the set, defined as Based on the peak reward value, the agent will obtain the corresponding action Therefore, the current state of the agent will be converted from Overall, the formulation of the learning process of the agent can be represented as
Get Dynamic Fuzzy Machine Learning now with the O’Reilly learning platform.
O’Reilly members experience books, live events, courses curated by job role, and more from O’Reilly and nearly 200 top publishers.