PyTorch provides a number of optimization algorithms besides SGD. The following code shows the constructor for one of them, Adadelta:
optim.Adadelta(params, lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)
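As a minimal sketch of how this constructor might be used, the loop below passes a model's parameters as `params` and runs a few thousand optimization steps. The toy linear-regression data (y = 3x + 2), the `nn.Linear` model, and the step count are illustrative assumptions, not part of the original text; the remaining arguments are the values shown above.

```python
import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(0)

# Hypothetical toy problem for illustration: learn y = 3x + 2
x = torch.randn(64, 1)
y = 3 * x + 2

model = nn.Linear(1, 1)
loss_fn = nn.MSELoss()

# `params` is the iterable of tensors to optimize, here the model's weights and bias
optimizer = optim.Adadelta(model.parameters(), lr=1.0, rho=0.9, eps=1e-06, weight_decay=0)

initial_loss = loss_fn(model(x), y).item()
for step in range(2000):
    optimizer.zero_grad()        # clear gradients left over from the previous step
    loss = loss_fn(model(x), y)  # forward pass on the full toy batch
    loss.backward()              # compute gradients
    optimizer.step()             # apply the Adadelta update to each parameter

final_loss = loss_fn(model(x), y).item()
print(f"loss: {initial_loss:.4f} -> {final_loss:.4f}")
```

Note that no learning-rate schedule is set up by hand: the per-parameter adaptation happens inside `optimizer.step()`.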
The Adadelta algorithm is based on stochastic gradient descent; however, instead of using the same learning rate on every iteration, it adapts the learning rate over time. Adadelta maintains a separate, dynamically updated learning rate for each dimension (that is, for each parameter), derived from running averages of recent squared gradients and squared updates. This can make training quicker and more efficient, and the overhead of calculating the new learning rates on each iteration is small compared with the cost of computing the gradients themselves. The Adadelta algorithm performs well with noisy data, for a range of model architectures, with large gradients, ...
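To make the per-dimension adaptation concrete, here is a minimal pure-Python sketch of the Adadelta update, written to follow the update rule described in the PyTorch documentation for `torch.optim.Adadelta` (a running average of squared gradients, a running average of squared updates, and a final scaling by `lr`). The function name `adadelta_step`, the list-based state, and the test function f(x, y) = x**2 + 100 * y**2 are hypothetical choices for illustration, not PyTorch's implementation.

```python
import math

def adadelta_step(theta, grad, square_avg, acc_delta,
                  lr=1.0, rho=0.9, eps=1e-6, weight_decay=0.0):
    """One Adadelta update on lists of floats (hypothetical helper).

    Returns the new parameters; `square_avg` and `acc_delta` hold the
    per-dimension state and are updated in place.
    """
    new_theta = []
    for i, (p, g) in enumerate(zip(theta, grad)):
        if weight_decay != 0:
            g = g + weight_decay * p  # fold the L2 penalty into the gradient
        # Running average of squared gradients for this dimension
        square_avg[i] = rho * square_avg[i] + (1 - rho) * g * g
        # Per-dimension step: RMS of past updates divided by RMS of gradients
        delta = math.sqrt(acc_delta[i] + eps) / math.sqrt(square_avg[i] + eps) * g
        # Running average of squared updates for this dimension
        acc_delta[i] = rho * acc_delta[i] + (1 - rho) * delta * delta
        new_theta.append(p - lr * delta)
    return new_theta

# Hypothetical test function: the y-gradient is 100x steeper than the x-gradient
def grad_f(theta):
    x, y = theta
    return [2 * x, 200 * y]

theta = [1.0, 1.0]
square_avg = [0.0, 0.0]
acc_delta = [0.0, 0.0]
for _ in range(2000):
    theta = adadelta_step(theta, grad_f(theta), square_avg, acc_delta)
print(theta)
```

Because each step is normalized by that dimension's own gradient history, the first update moves x and y by almost exactly the same amount, even though their gradients differ by a factor of 100. With a single global learning rate, a value small enough to keep the steep y direction stable would make progress along x very slow.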