At the heart of any deep learning algorithm is the stochastic gradient descent (SGD) optimizer. It is what makes the model learn and, at the same time, what makes learning computationally expensive. Distributing the computation across different nodes of a cluster should reduce the training time. TensorFlow allows us to split the computational graph, which describes the model, across different nodes in the cluster and finally merge the results.
This is achieved in TensorFlow with the help of master nodes, worker nodes, and parameter nodes (parameter servers). The actual computation is done by the worker nodes; the computed parameters are stored on the parameter nodes and shared with the worker nodes. The master node is responsible for coordinating the work across the other nodes.
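To make this concrete, the following is a minimal sketch of how such a cluster can be described in TensorFlow 1.x-style code, using tf.train.ClusterSpec and tf.train.Server. The job names, localhost addresses, and tensor shapes are placeholders chosen for illustration, not part of any particular recipe in this book:

```python
import tensorflow as tf

# Describe the cluster: one parameter-server task and two worker tasks.
# The localhost addresses stand in for the real machine addresses.
cluster = tf.train.ClusterSpec({
    "ps":     ["localhost:2222"],
    "worker": ["localhost:2223", "localhost:2224"]
})

# Every process in the cluster starts a server for its own task;
# the first worker, for example, would run:
server = tf.train.Server(cluster, job_name="worker", task_index=0)

# Variables (the model parameters) are pinned to the parameter server,
# while the compute-heavy operations are placed on a worker device.
with tf.device("/job:ps/task:0"):
    weights = tf.Variable(tf.zeros([784, 10]), name="weights")

with tf.device("/job:worker/task:0"):
    inputs = tf.random_normal([32, 784])   # stand-in for a batch of data
    logits = tf.matmul(inputs, weights)    # computed on the worker
```

Each process in the cluster runs the same cluster description but a different job_name and task_index, so the runtime knows which part of the graph it is responsible for and where to fetch the shared parameters from.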