By taking the different actions $a_t^i$ $(1 \le i \le m)$, the agent obtains the corresponding reward values $(\overrightarrow{r}_t^i, \overleftarrow{r}_t^i)$ and thus the final reward value set $(\overrightarrow{r}_t, \overleftarrow{r}_t) = \{(\overrightarrow{r}_t^1, \overleftarrow{r}_t^1), (\overrightarrow{r}_t^2, \overleftarrow{r}_t^2), \ldots, (\overrightarrow{r}_t^m, \overleftarrow{r}_t^m)\}$. The agent then selects the peak value in this set, denoted $(\overrightarrow{r}_t^{\max}, \overleftarrow{r}_t^{\max})$ $(1 \le \max \le m)$, and from this peak reward value it obtains the corresponding action $a_t^{\max}$. Taking this action transfers the agent's current state from $(\overrightarrow{S}_t, \overleftarrow{S}_t)$ to $(\overrightarrow{S}_{t+1}, \overleftarrow{S}_{t+1})$. Overall, the formulation of the learning process of the agent can be represented as $(R$
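To make the selection step concrete, here is a minimal Python sketch of the peak-reward action choice over a set of dynamic fuzzy reward pairs. It is an illustration under stated assumptions, not the book's implementation: the ordering of pairs (here, by the pair's mean), the toy state transition, and all function names are assumptions introduced for this example.

```python
from typing import List, Tuple

# A dynamic fuzzy reward is a pair (forward value, backward value).
# Assumption: the "peak" of the set is the pair with the largest mean;
# the book does not fix this scalarization here.
DFReward = Tuple[float, float]
DFState = Tuple[float, float]

def select_peak_action(rewards: List[DFReward]) -> int:
    """Return the index a_t^max of the peak reward pair in the set."""
    return max(range(len(rewards)), key=lambda i: sum(rewards[i]) / 2.0)

def step(state: DFState, rewards: List[DFReward]) -> Tuple[int, DFState]:
    """Pick the peak-reward action and move (S_t, S_t) to (S_{t+1}, S_{t+1}).

    The additive transition below is purely illustrative.
    """
    a_max = select_peak_action(rewards)
    next_state = (state[0] + rewards[a_max][0],
                  state[1] + rewards[a_max][1])
    return a_max, next_state

if __name__ == "__main__":
    # Reward pairs (r_t^i, r_t^i) for actions i = 1..3.
    reward_set = [(0.2, 0.1), (0.7, 0.6), (0.4, 0.5)]
    action, new_state = step((0.0, 0.0), reward_set)
    print(action, new_state)  # action index 1 holds the peak mean reward
```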
