Critics that are updated using deterministic target actions tend to overfit to narrow peaks in the value estimate, which in turn increases the variance of the target. TD3 counteracts this with a target policy smoothing regularization that adds clipped noise to the target action, so that the value is fitted over a small area around it.
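In the notation of the TD3 paper (the symbols below are an assumption taken from the paper, not from this text), the smoothed target action can be written as

$$a'(s') = \mathrm{clip}\big(\mu_{\theta'}(s') + \mathrm{clip}(\epsilon, -c, c),\; a_{low},\; a_{high}\big), \qquad \epsilon \sim \mathcal{N}(0, \sigma)$$

where $\mu_{\theta'}$ is the target policy, $c$ bounds the noise (0.5 in the implementation below), and $a_{low}$, $a_{high}$ are the action-space limits.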
The regularization can be implemented as a function that takes a vector and a noise scale as arguments:
import numpy as np

def add_normal_noise(x, noise_scale):
    # Sample zero-mean Gaussian noise with standard deviation noise_scale,
    # clip it to [-0.5, 0.5], and add it to the input vector
    noise = np.random.normal(loc=0.0, scale=noise_scale, size=x.shape)
    return x + np.clip(noise, -0.5, 0.5)
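For instance, a batch of actions can be perturbed as follows (the shapes and scale here are illustrative):

actions = np.zeros((2, 3))              # a dummy batch of two 3-dimensional actions
noisy = add_normal_noise(actions, 0.2)  # each entry is shifted by at most 0.5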
Then, add_normal_noise is called after running the target policy, as shown in the following lines of code, which contain the changes with respect to the DDPG implementation:
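Since the original listing is not reproduced here, the following is a minimal sketch of where the call fits, assuming a target_policy function, a noise scale of 0.2, and explicit action bounds (these names and values are illustrative, not the book's exact code):

def compute_target_actions(next_obs, target_policy, act_low, act_high, noise_scale=0.2):
    # Run the target policy on the next observations (hypothetical API) ...
    target_actions = target_policy(next_obs)
    # ... then smooth the deterministic output with clipped Gaussian noise
    target_actions = add_normal_noise(target_actions, noise_scale)
    # TD3 clips the smoothed action back into the valid action range
    return np.clip(target_actions, act_low, act_high)

The target critics are then evaluated on these smoothed actions rather than on the raw deterministic ones.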