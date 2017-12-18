TensorFlow is the library that revolutionized the way we approach machine learning problems. It is designed for building deep neural networks, training them, and evaluating and serving the resulting models. Its popularity has led to a genuine democratization of AI. Like any library, it provides classes and functions designed to tackle the deep learning process. This introduces an interesting black box dilemma. On the one hand, it gives you ready-to-use solutions, very often a single line of code. On the other hand, it hides most of the implementation details from the user. Fortunately, TensorFlow offers different levels of abstraction, letting programmers decide how much control they keep in their own hands.

In this article, we’ll showcase TensorFlow's levels of abstraction by building and training a neural network for the canonical classification task of recognizing handwritten digits from the MNIST data set. This is an elementary computer vision problem. Because the digits are represented as arrays of pixels (either 2D or flattened to 1D), they can be fed directly into a neural network. Architectures for image recognition tasks usually combine fully connected, convolutional, and pooling layers. Once the architecture is defined, the model is trained, evaluated, and then used to classify new examples.

We'll move from a low-level implementation built from basic mathematical operations, to higher-level layer functions, to Estimators, and finally to Keras, the most abstract level, which gives you the least control.

We’ll start at a high level of abstraction with the first part of machine learning, the training phase. Training is generally framed as an optimization problem: the model's parameters (its weights and biases) are tuned to minimize a loss function using an optimizer such as Gradient Descent, Momentum, or Adam. In TensorFlow, once you define the loss function, you can use the library's implementations to minimize it. For instance, using the Adam optimizer requires just one line of largely self-documenting code:

```python
train_step = tf.train.AdamOptimizer(learning_rate).minimize(loss)
```

All you need to pass is the learning rate and the loss tensor to minimize. The learning rate is one of the hyperparameters that influence training. If it's too small, training can take too long to reach the minimum of the loss function. If it's too large, the steps can overshoot the minimum and training may even diverge. Defining the training step this way lets you focus on tuning the model rather than on the details of the optimization process.
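The effect of the learning rate is easy to see on a toy problem. The sketch below uses plain gradient descent (not Adam) on the one-parameter loss f(w) = w², whose minimum is at w = 0. The function name and the three learning rates are illustrative choices, not values from TensorFlow.

```python
def gradient_descent(learning_rate, steps=10, w=1.0):
    """Minimize f(w) = w**2 (minimum at w = 0) with plain gradient descent."""
    for _ in range(steps):
        grad = 2 * w                  # derivative of w**2
        w -= learning_rate * grad     # step against the gradient
    return w


# Too small: slow progress. Moderate: fast convergence. Too large: overshoots and diverges.
for lr in (0.01, 0.4, 1.1):
    print(f"learning_rate={lr}: w after 10 steps = {gradient_descent(lr):.6f}")
```

With a learning rate of 1.1, each step jumps past the minimum to the other side and lands farther away than before, which is the "overshooting" behavior described above.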

Our first attempt at a neural network for handwritten digit recognition uses only fully connected layers, a common starting point. Depending on the desired depth, we can stack layers on top of each other. The sizes of the hidden layers are network hyperparameters, and the output size is 10 (the number of digits the network is supposed to recognize).
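Stacking layers fixes the shape of every weight matrix: each layer maps the previous layer's size to its own. The following sketch lists the weight and bias shapes for a hypothetical stack with two hidden layers of 256 and 128 neurons (arbitrary example sizes) on flattened 28x28 MNIST images, and counts the trainable parameters. The helper names are illustrative.

```python
def layer_shapes(input_size, hidden_sizes, output_size=10):
    """Return (weight_shape, bias_shape) for each fully connected layer in the stack."""
    sizes = [input_size] + list(hidden_sizes) + [output_size]
    return [((prev, cur), (cur,)) for prev, cur in zip(sizes, sizes[1:])]


def count_parameters(shapes):
    """Total number of weights plus biases across all layers."""
    return sum(w[0] * w[1] + b[0] for w, b in shapes)


# A flattened 28x28 MNIST image has 784 inputs; the hidden sizes are example hyperparameters
shapes = layer_shapes(784, [256, 128])
for weight_shape, bias_shape in shapes:
    print("W:", weight_shape, "b:", bias_shape)
print("trainable parameters:", count_parameters(shapes))
```

Changing the hidden sizes changes every shape that touches them, which is why these sizes are treated as hyperparameters.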

In a classic fully connected layer, every neuron takes all the values from its input, multiplies them by weights that are updated at each training iteration, and adds a bias term. With vectorization, the forward propagation of a whole layer reduces to a single matrix operation (where X is the input matrix, W is the weight matrix, and b is the bias vector):

$$Y = XW + b$$

An activation function, such as ReLU, is then applied element-wise to Y.

In its rawest form, a TensorFlow implementation of the fully connected model requires you to define the variables holding the neuron weights and biases, multiply the appropriate matrices, and apply the activation function. Each layer would require the following code:

```python
import tensorflow as tf  # TensorFlow 1.x API

# Variables definition: weights from a truncated normal distribution, biases set to 0.1
W = tf.Variable(tf.truncated_normal([prev_size, current_size], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[current_size]))
# Matrix operations: XW + b
layer = tf.matmul(prev_layer, W) + b
# Applying ReLU as the activation function
layer = tf.nn.relu(layer)
```
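To make the matrix multiplication, bias addition, and ReLU concrete without TensorFlow, the following sketch computes relu(XW + b) for one tiny layer in pure Python. The helper names and the numbers are illustrative only; a real layer would use TensorFlow tensors.

```python
def matmul(X, W):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(x * w for x, w in zip(row, col)) for col in zip(*W)] for row in X]


def dense_forward(X, W, b):
    """relu(X @ W + b): the same computation as the tf.matmul / tf.nn.relu lines."""
    Z = matmul(X, W)
    return [[max(0.0, z + bias) for z, bias in zip(row, b)] for row in Z]


X = [[1.0, 2.0]]               # one example with 2 input features
W = [[0.5, -1.0, 0.0],         # 2 inputs -> 3 neurons
     [0.25, 0.5, -0.5]]
b = [0.1, 0.1, 0.1]
print(dense_forward(X, W, b))  # the third neuron's negative value is clipped to 0 by ReLU
```

Even in this tiny case, the shapes of X, W, and b must agree, which is exactly the bookkeeping that the higher-level functions take over.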

This looks like a lot of work. You need to know the matrix sizes and the specific operations involved, and all of it has to be repeated for every layer you add. Fortunately, there is an easier way to achieve the same goal. TensorFlow provides high-level definitions through the tf.contrib.layers.fully_connected and tf.layers.dense functions. Their implementations differ slightly, but the general principle is the same: the entire layer definition, including the activation function, becomes a single line with either function:

```python
# Fully connected
layer = tf.contrib.layers.fully_connected(prev_layer, current_size, activation_fn=tf.nn.relu)

# Dense
layer = tf.layers.dense(prev_layer, current_size, activation=tf.nn.relu)
```
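The convenience of these functions comes from bundling variable creation with the forward computation. The pure-Python sketch below illustrates that idea with a hypothetical `Dense` class (not TensorFlow's implementation) that creates and owns its weights, so that each layer in a stack becomes one line. It uses normally distributed weights rather than a truncated normal, and a constant bias of 0.1 as in the low-level code.

```python
import random


def relu(z):
    return max(0.0, z)


class Dense:
    """Hypothetical fully connected layer that creates and owns its parameters,
    loosely mirroring what tf.layers.dense hides from the caller."""

    def __init__(self, in_size, out_size, activation=None, seed=0):
        rng = random.Random(seed)
        # Normally distributed weights (stddev 0.1, not truncated) and a constant 0.1 bias
        self.W = [[rng.gauss(0.0, 0.1) for _ in range(out_size)] for _ in range(in_size)]
        self.b = [0.1] * out_size
        self.activation = activation if activation is not None else (lambda z: z)

    def __call__(self, X):
        # For each input row: activation(row . column_j + b_j) for every output neuron j
        return [[self.activation(sum(x * w for x, w in zip(row, col)) + bias)
                 for col, bias in zip(zip(*self.W), self.b)]
                for row in X]


# Each layer is now a single line, much like tf.layers.dense
X = [[0.0] * 784]                              # one flattened 28x28 image (all zeros here)
hidden = Dense(784, 128, activation=relu)(X)
output = Dense(128, 10)(hidden)
print(len(output[0]))                          # one score per digit
```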

Both the fully connected network and the training process can be built at an even higher level of abstraction using Estimators. The main idea is to abstract away the classification process itself: you specify only the input features, the model (a deep neural network in our case), and the optimizer.

```python
import tensorflow as tf  # TensorFlow 1.x API

# Build the classifier
classifier = tf.contrib.learn.DNNClassifier(
    feature_columns=feature_columns,
    hidden_units=[hidden_size1, hidden_size2, hidden_size3],
    n_classes=labels_size,
    optimizer=tf.train.AdamOptimizer())
# Train the classifier
classifier.fit(x=features, y=labels, steps=1000)
```

For computer vision, fully connected layers alone are usually not enough. The most successful solutions use several convolutional layers followed by a few fully connected ones. Convolution applies the same small set of shared weights (a kernel) to every small patch of the image, which makes use of the image's 2D structure instead of flattening it into a vector. Standard practice is also to add pooling layers, which take the average or maximum value over small regions of their input.
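Both operations can be sketched in a few lines of pure Python. The example below assumes a single-channel image, one 2x2 kernel slid with stride 1 and no padding, and non-overlapping 2x2 max-pooling; the kernel values and the tiny 5x5 "image" are illustrative. As in TensorFlow's conv2d, the kernel is not flipped (strictly speaking, this is cross-correlation).

```python
def conv2d_valid(image, kernel):
    """Slide one shared kernel over the image (stride 1, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[i + di][j + dj] * kernel[di][dj]
                 for di in range(kh) for dj in range(kw))
             for j in range(out_w)]
            for i in range(out_h)]


def max_pool_2x2(feature_map):
    """Keep the maximum of each non-overlapping 2x2 block."""
    return [[max(feature_map[i][j], feature_map[i][j + 1],
                 feature_map[i + 1][j], feature_map[i + 1][j + 1])
             for j in range(0, len(feature_map[0]) - 1, 2)]
            for i in range(0, len(feature_map) - 1, 2)]


# A 5x5 "image" containing a vertical stroke in column 2
image = [[0, 0, 1, 0, 0] for _ in range(5)]
# Responds positively where intensity increases from left to right (illustrative values)
kernel = [[-1, 1],
          [-1, 1]]
fm = conv2d_valid(image, kernel)
print(fm)
print(max_pool_2x2(fm))
```

The same four kernel weights are reused at every position, so the layer detects the stroke's left edge wherever it appears, and pooling keeps the strongest response in each region while halving the map's size.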

As with the fully connected layer, a low-level implementation of a convolutional layer requires defining variables, applying the convolution operation, adding the activation function, and applying max-pooling. For the convolution and max-pooling operators, you can use tf.nn.conv2d and tf.nn.max_pool, respectively.

```python
import tensorflow as tf  # TensorFlow 1.x API

# Variables definition: [height, width, input channels (1 for grayscale), output channels]
W = tf.Variable(tf.truncated_normal([height, width, 1, channels], stddev=0.1))
b = tf.Variable(tf.constant(0.1, shape=[channels]))
# Applying the convolution operation and adding the bias
layer = tf.nn.conv2d(prev_layer, W, strides=[1, 1, 1, 1], padding='SAME') + b
# Activation function
layer = tf.nn.relu(layer)
# Max-pooling over 2x2 windows with stride 2 (roughly halves height and width)
layer = tf.nn.max_pool(layer, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
```
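Because every layer changes the spatial size of the data, it helps to track the output shapes by hand. The sketch below applies TensorFlow's output-size rules for 'SAME' padding (ceil(input / stride)) and 'VALID' padding (ceil((input - kernel + 1) / stride)) to a hypothetical stack of two convolution and pooling pairs on 28x28 MNIST images. The 5x5 kernel is an assumption; the original code leaves height and width unspecified.

```python
import math


def output_size(input_size, kernel_size, stride, padding):
    """Spatial output size for TensorFlow-style 'SAME' or 'VALID' padding."""
    if padding == "SAME":
        return math.ceil(input_size / stride)
    if padding == "VALID":
        return math.ceil((input_size - kernel_size + 1) / stride)
    raise ValueError(f"unknown padding: {padding}")


size = 28                                  # MNIST images are 28x28
size = output_size(size, 5, 1, "SAME")     # convolution, stride 1 keeps 28
size = output_size(size, 2, 2, "SAME")     # 2x2 max-pooling, stride 2 gives 14
size = output_size(size, 5, 1, "SAME")     # second convolution keeps 14
size = output_size(size, 2, 2, "SAME")     # second pooling gives 7
print(size)
```

With 'SAME' padding only the strides shrink the feature maps, which makes the final 7x7 size easy to predict before flattening into the fully connected layers.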

To move to a higher level, we can revisit what tf.layers has to offer. For 2D convolution and pooling, you can use tf.layers.conv2d and tf.layers.max_pooling2d. Variable initialization, matrix operations, and the activation function are handled internally by these functions, letting the programmer focus on the high-level architecture.

```python
# Convolution
layer = tf.layers.conv2d(prev_layer, channels, kernel_size=[height, width], padding="same", activation=tf.nn.relu)
# Max-pooling
layer = tf.layers.max_pooling2d(layer, pool_size=[2, 2], strides=2)
```

The most abstract way of implementing deep learning architectures with TensorFlow, and the one that gives you the least control, is Keras. It also gives you a good deal of freedom in deployment and hosting: Keras can use TensorFlow as its backend engine or run on other backends, such as Theano or CNTK. The code that builds two convolution and max-pooling pairs followed by a hidden and an output fully connected layer is pretty straightforward:

```python
from keras.layers import Input, Dense, Flatten, Conv2D, MaxPooling2D
from keras.models import Model

# MNIST images: 28x28 pixels with a single grayscale channel
inputs = Input(shape=(28, 28, 1))
layer = Conv2D(filters=filter_number1, kernel_size=(height1, width1), activation='relu')(inputs)
layer = MaxPooling2D(pool_size=(2, 2))(layer)
layer = Conv2D(filters=filter_number2, kernel_size=(height2, width2), activation='relu')(layer)
layer = MaxPooling2D(pool_size=(2, 2))(layer)
layer = Flatten()(layer)
layer = Dense(hidden_size, activation='relu')(layer)
# Softmax yields a probability distribution over the 10 digits
outputs = Dense(10, activation='softmax')(layer)
model = Model(inputs=inputs, outputs=outputs)
```
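Digit recognition is a single-label problem: each image belongs to exactly one of the 10 classes. That is why the output layer of such a network is usually a softmax, which turns the 10 raw scores into a probability distribution, whereas independent sigmoids score each class separately and suit multi-label problems. The pure-Python sketch below compares the two on made-up scores; the helper names are illustrative.

```python
import math


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(scores)                          # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


scores = [2.0, 1.0, 0.1] + [0.0] * 7        # hypothetical raw outputs for digits 0-9
probs = softmax(scores)
print(round(sum(probs), 6))                  # softmax outputs form a distribution
print(sum(sigmoid(s) for s in scores))       # independent sigmoids need not sum to 1
```

Both functions rank the digits the same way, but only the softmax outputs can be read directly as "the probability that this image is digit k."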

When you decide to use a specific library, you should always consider the tradeoffs involved. It may be tempting to rely only on the high-level APIs, but without understanding how they work, you may not use them effectively. On the other hand, implementing everything yourself can mean spending a lot of time on problems that have already been solved. The art is to find the right level of abstraction, and TensorFlow is an excellent platform for making that choice.

