Problem setup
The core setup of RL involves two components: (a) an agent and (b) an environment. The two interact dynamically with each other. For example, the agent takes an action that changes the current state of the environment. In response, the environment transitions to a new state and gives the agent feedback on how good or bad that action was. This feedback is the weak supervision available to the agent. It is usually called the reward, and the rule that produces it is known as the reward function. After receiving the feedback, the agent tries to learn and adjust its future actions so that it maximizes the positive feedback it receives. After a few iterations, when the agent has learned well, the ...
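The interaction loop described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the corridor environment, the reward values, and the use of tabular Q-learning as the learning rule are hypothetical choices made for the example and are not taken from the text. All class and function names (CorridorEnv, QAgent, train, greedy_policy) are invented here.

```python
import random


class CorridorEnv:
    """Hypothetical environment: a corridor of states 0..length-1, goal at the right end."""

    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # Assumed action encoding: 0 = move left, 1 = move right.
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.length - 1)
        done = self.state == self.length - 1
        # Assumed reward function: +1 at the goal, a small penalty otherwise.
        reward = 1.0 if done else -0.1
        return self.state, reward, done


class QAgent:
    """Agent that learns from reward feedback (tabular Q-learning, an assumed choice)."""

    def __init__(self, n_states, n_actions=2, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
        self.q = [[0.0] * n_actions for _ in range(n_states)]
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.rng = random.Random(seed)

    def act(self, state):
        # Explore with probability epsilon, otherwise pick a best-known action.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.q[state]))
        best = max(self.q[state])
        return self.rng.choice([a for a, v in enumerate(self.q[state]) if v == best])

    def learn(self, state, action, reward, next_state, done):
        # Move the estimate toward the reward plus the discounted future value.
        future = 0.0 if done else self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (reward + future - self.q[state][action])


def train(episodes=200, max_steps=100):
    env = CorridorEnv()
    agent = QAgent(env.length)
    for _ in range(episodes):
        state = env.reset()
        for _ in range(max_steps):
            action = agent.act(state)                      # agent acts
            next_state, reward, done = env.step(action)    # environment responds
            agent.learn(state, action, reward, next_state, done)  # agent updates
            state = next_state
            if done:
                break
    return agent


def greedy_policy(agent):
    # Best-known action for every non-terminal state.
    return [max(range(len(row)), key=lambda a: row[a]) for row in agent.q[:-1]]


if __name__ == "__main__":
    print(greedy_policy(train()))
```

Each pass through the inner loop is one round of the exchange: the agent picks an action, the environment returns a new state and a reward, and the agent uses that reward to revise its estimates. Keeping the environment and the agent in separate classes mirrors the two-component setup; any other learning rule could be swapped into `learn` without touching the environment.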