Adadelta is an enhancement of the Adagrad algorithm. In Adagrad, we saw the problem of the learning rate decaying to a very small value as training progresses. Also, although Adagrad adapts the learning rate, we still need to set an initial learning rate manually. Adadelta, however, does not need a learning rate at all. So how does the Adadelta algorithm learn?
In Adadelta, instead of taking the sum of all the squared past gradients, we set a window of a fixed size and take the sum of squared past gradients only from that window. In Adagrad, we took the sum of all the squared past gradients, and that was what caused the learning rate to shrink toward zero.
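In practice, Adadelta does not store an explicit window of gradients; it approximates the windowed sum with an exponentially decaying running average of squared gradients, and it maintains a second running average of squared parameter updates so the step size needs no learning rate. The sketch below illustrates this idea; the names `adadelta`, `grad_fn`, and the hyperparameter defaults (`rho=0.95`, `eps=1e-6`) are illustrative choices, not fixed by the text above.

```python
import numpy as np

def adadelta(grad_fn, x0, rho=0.95, eps=1e-6, n_steps=100):
    """Minimal Adadelta sketch: two running averages, no learning rate."""
    x = np.asarray(x0, dtype=float)
    eg2 = np.zeros_like(x)   # running average of squared gradients
    edx2 = np.zeros_like(x)  # running average of squared updates
    for _ in range(n_steps):
        g = grad_fn(x)
        # decaying average of squared gradients (the "window")
        eg2 = rho * eg2 + (1 - rho) * g ** 2
        # update: ratio of RMS of past updates to RMS of gradients
        dx = -np.sqrt(edx2 + eps) / np.sqrt(eg2 + eps) * g
        # decaying average of squared updates
        edx2 = rho * edx2 + (1 - rho) * dx ** 2
        x = x + dx
    return x
```

For example, minimizing f(x) = x² (gradient 2x) with `adadelta(lambda x: 2 * x, np.array([1.0]))` moves the parameter from 1.0 toward 0 without ever specifying a learning rate; the initial steps are tiny because the update average starts at zero, then grow as it accumulates.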