3

Payoff Learning and Dynamics

3.1 Introduction

A central learning problem in dynamic environments is balancing exploration of untested actions against exploitation of actions that are known to be good. The benefit of exploration can be estimated using the notion of value of information (VoI), i.e., the expected improvement in future decision quality that may result from the information acquired through exploration.
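To make the exploration/exploitation trade-off concrete, here is a minimal sketch using ε-greedy action selection, a common and simple heuristic that is swapped in for illustration and is not necessarily the scheme used in this chapter. It assumes a single decision maker facing actions with fixed (hypothetical) expected payoffs observed with Gaussian noise; the function name `epsilon_greedy` and its parameters are illustrative. With probability ε the learner explores a random action, otherwise it exploits the action with the best current payoff estimate.

```python
import random

def epsilon_greedy(true_means, epsilon=0.1, steps=5000, noise_std=1.0, seed=0):
    """Learn expected payoffs from noisy observations with epsilon-greedy exploration.

    Assumption (illustrative only): each action a has a fixed expected payoff
    true_means[a], and every play returns that value plus Gaussian noise.
    """
    rng = random.Random(seed)
    k = len(true_means)
    estimates = [0.0] * k  # running estimate of the expected payoff of each action
    counts = [0] * k       # number of times each action has been played
    for _ in range(steps):
        if rng.random() < epsilon:
            a = rng.randrange(k)  # explore: try a uniformly random action
        else:
            a = max(range(k), key=lambda i: estimates[i])  # exploit: best estimate so far
        payoff = true_means[a] + rng.gauss(0.0, noise_std)  # noisy numerical payoff
        counts[a] += 1
        # Incremental sample average: move the estimate toward the observed payoff.
        estimates[a] += (payoff - estimates[a]) / counts[a]
    return estimates, counts

estimates, counts = epsilon_greedy([0.0, 0.5, 1.0])
print(estimates, counts)
```

A larger ε buys more information about rarely played actions at the cost of more plays of inferior ones; ε = 0 never explores and can lock onto whichever action happened to look good first.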

In this chapter we study games with noisy numerical payoffs. Payoff learning is sometimes referred to as Q-learning [188, 189]. Here we focus on specific classes of stochastic games with incomplete information in which the state transitions are action-independent. We develop fully distributed iterative schemes to learn expected ...
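As a hedged illustration of what learning expected payoffs from noisy numerical payoffs can look like, a generic stochastic-approximation update is shown below. The notation (player $j$, action $s_j$, action actually played $a_{j,t}$, realized payoff $r_{j,t}$, step size $\lambda_t$) is assumed here and is not necessarily the scheme developed in this chapter. Each player updates only the estimate of the action it actually played, using only its own observed payoff, which is what allows such a scheme to run in a fully distributed way:

```latex
\hat{r}_{j,t+1}(s_j) = \hat{r}_{j,t}(s_j)
  + \lambda_t \, \mathbb{1}_{\{a_{j,t} = s_j\}} \big( r_{j,t} - \hat{r}_{j,t}(s_j) \big),
\qquad \sum_{t} \lambda_t = \infty, \quad \sum_{t} \lambda_t^2 < \infty .
```

The step-size conditions are the standard Robbins–Monro conditions. Choosing $\lambda_t = 1/n_{j,t}(s_j)$, the inverse of the number of times $s_j$ has been played, reduces the update to a running sample average of the observed payoffs.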
