Chapter 4. Keras on TensorFlow 2.x

Keras is the most prominent high-level API for deep learning, and there is great news for its fan community: in TensorFlow 2.x, Keras is the official high-level API for TensorFlow. On August 22, 2019, François Chollet, the creator of Keras, announced that Keras 2.3.0 would support TensorFlow 2.x only.

Chollet recommends that TensorFlow users switch from the Keras standalone distribution to tf.keras, a package contained within the TensorFlow distribution, since it is better integrated with TensorFlow features.

The purpose of this chapter is to give you a short crash course in Keras and show you how existing Keras code can be easily migrated to tf.keras. Using Keras within the context of TensorFlow 2.x unlocks a couple of integrations with TensorFlow that are not present in standalone Keras. We’ll cover them in Chapter 5.

Keras Versus TensorFlow Linear Algebra Code

The main advantage of using Keras over the low-level, tensor-based TensorFlow API is that all the linear algebra magic is completely hidden from you.

Let’s review an example of a neural network with a single hidden layer, implemented first in TensorFlow linear algebra code and then in Keras. Here is how to define such a neural network in pure TensorFlow:

import tensorflow as tf

# Weights of the hidden layer (3x3 matrix) and the output layer (3-element vector)
w1 = tf.Variable([[1,1,1],[1,1,1],[1,1,1]],dtype=tf.float64)
w2 = tf.Variable([1,1,1],dtype=tf.float64)

def layer1(x):
    # Hidden layer: dot product of the input and w1, followed by a sigmoid activation
    return tf.sigmoid(tf.tensordot(x,w1,axes=1))

def layer2(x):
    # Output layer: dot product of the hidden-layer activations and w2, plus sigmoid
    return tf.sigmoid(tf.tensordot(layer1(x),w2,axes=1))
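
Although we haven’t trained anything yet, we can already propagate data through this network. A minimal sketch, using a made-up three-dimensional sample as input:

sample = tf.constant([[0.5, 0.2, 0.1]], dtype=tf.float64)
print(layer1(sample))  # hidden-layer activations, shape (1, 3)
print(layer2(sample))  # network output, shape (1,)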

This network definition is as small as it can get using plain TensorFlow; you won’t be able to express it with less code. If you remember from Chapter 1, we’re basically only computing a series of vector dot products between either the input data x or the hidden-layer activations (the output of the layer1 function) and the weights w1 and w2. Of course, this example doesn’t involve any neural network training using gradient descent, but let’s stop here and see how the same neural network architecture is expressed using Keras:

from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))

As you can see, we went down from four relevant lines of code to two (not counting the imports). In addition, we went from cryptic linear algebra to a clean, layer-based expression.

To make things complete, let’s prepare the code for Keras model training:

from keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.1),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x, y_target, epochs=1000,
          verbose=1)
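
Note that x and y_target are assumed to already exist. For a self-contained test, you could generate placeholder data, for example (a minimal sketch using random values):

import numpy as np

x = np.random.random((100, 3))                # 100 samples with 3 features
y_target = np.random.randint(2, size=(100,))  # binary labels (0 or 1)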

Model Compilation

Once the compile method is called on a model, the TensorFlow static execution graph is created. It expresses the computation you’ve defined using the Keras API. In older versions of Keras, which had multibackend support, different execution code was generated depending on the backend. Nowadays, only TensorFlow execution code is generated from the model specification.

Now only the fit method needs to be called on the model object, and training starts.

Keras Backward Compatibility

It’s time to migrate this code from Keras standalone to Keras within TensorFlow 2.x:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))

As you can see, only the import statements have changed (they now point to the keras package shipped inside the TensorFlow distribution); the model definition code stays the same. This is very handy for migrating and reusing existing code.
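
Alternatively, you can reference the same classes through the tf.keras namespace directly, which makes the dependency on TensorFlow explicit (an equivalent sketch):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(tf.keras.layers.Dense(1, activation='sigmoid'))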

But there is also a fundamental difference: this Keras code is now tightly coupled to TensorFlow and can make use of TensorFlow functionality without further changes. The most prominent example is running this code in parallel on a compute cluster, which will be introduced in Chapter 5.

Functional Versus Sequential API

Most users start learning Keras with the so-called Sequential API. This API is very easy to understand since it allows you to append neural network layers one after the other until the network is defined:

model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(23, ...))
model.add(tf.keras.layers.Dense(42, ...))
model.add(tf.keras.layers.Dense(5, ...))
model.add(tf.keras.layers.Dense(1, ...))

model.compile(...)

model.fit(...)

This is sufficient for the majority of use cases. But there are exceptions, for example when you want to use Keras to define one of the following:

  • Models with non-sequential architecture

  • Multiple models that share layers between them

  • Models with multiple inputs or outputs

In other words, if your model resembles a directed acyclic graph (DAG) of layers. In a DAG, a node (layer) can be connected to multiple children and multiple parents. Therefore, Keras has chosen to define layers as functions that take their parent layers as parameters. Let us illustrate this:

input = tf.keras.Input(shape=(1024, 1024, 3))

This creates a tensorflow.python.framework.ops.Tensor. Now it gets more interesting as we append a dense layer to the input layer:

hidden1 = tf.keras.layers.Dense(64, activation='relu', name='y1')
y1 = hidden1(input)

In the first line we’ve created a dense layer, which we then call in the second line with the parent, input, as a parameter. In case we want to attach a second dense layer to the input layer, we are free to do so:

hidden2 = tf.keras.layers.Dense(64, activation='relu', name='y2')
y2 = hidden2(input)

One final step creates a Keras model out of these components:

model = tf.keras.Model(inputs=input, outputs=[y1,y2])

The architecture of this model is nonsequential, as can be seen when printing model.summary():


Model: "model"
__________________________________________________________________________
Layer (type)           Output Shape               Param #   Connected to
==========================================================================
input_1 (InputLayer)   [(None, 1024, 1024, 3)]    0
__________________________________________________________________________
y1 (Dense)             (None, 1024, 1024, 64)     256       input_1[0][0]
__________________________________________________________________________
y2 (Dense)             (None, 1024, 1024, 64)     256       input_1[0][0]
==========================================================================
Total params: 512
Trainable params: 512
Non-trainable params: 0
__________________________________________________________________________

You can see that both dense layers are connected to the input layer. Figure 4-1 illustrates this.

Figure 4-1. Example of a nonsequential model

Now, the next step is usually calling model.compile(), but since we have two output layers, we also need to define two loss functions:

model.compile(optimizer='adam',
              loss={"y1": "categorical_crossentropy",
                    "y2": "categorical_crossentropy"},
              metrics=["accuracy"])

As you can see, we are passing the names of the loss functions for each output layer as a Python dictionary, referring to the output layers by their names y1 and y2 as keys.

Finally, we need to call model.fit() as well. Again, as our neural network has two output layers, two target datasets need to be specified:

model.fit(train_x, {"y1": train_y1, "y2": train_y2}, epochs=10)

As you can see, the target data is again passed to the fit method as a dictionary with the output layer names as keys.
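
Alternatively, losses and targets can also be passed as plain lists, in the same order in which the outputs were declared on the model (a minimal sketch):

model.compile(optimizer='adam',
              loss=["categorical_crossentropy", "categorical_crossentropy"],
              metrics=["accuracy"])

model.fit(train_x, [train_y1, train_y2], epochs=10)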

Custom Layers

Building new tf.keras layers in pure TensorFlow might sound as if it should be reserved for hardcore researchers, but actually it isn’t that hard. Let’s start with the simplest solution first.

Lambda Layers

A lambda function is an anonymous inline function, part of the Python language specification. Using the TensorFlow AutoGraph functionality introduced earlier, Keras allows us to inject arbitrary Python code to be executed within the context of a Keras layer:

model.add(tf.keras.layers.Lambda(lambda x: x ** 2))

This already creates a custom Keras layer. In case you are not familiar with the lambda notation, the following code is semantically equivalent:

def my_lambda(x):
    return x ** 2

model = tf.keras.Sequential()
model.add(tf.keras.layers.Lambda(my_lambda))

So this layer just takes any input data (tensor) and squares each element of it.
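
A minimal, self-contained sketch showing the squaring layer in action (the input values are made up for illustration):

import tensorflow as tf

model = tf.keras.Sequential()
model.add(tf.keras.layers.Lambda(lambda x: x ** 2))

print(model(tf.constant([[1.0, 2.0, 3.0]])))  # tensor with values [[1. 4. 9.]]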

Real Custom Layers

To implement a real custom layer, we will need to subclass tf.keras.layers.Layer. Let’s define a ScaleLayer, which scales all the input by a factor and returns the scaled input as output:

1 class ScaleLayer(tf.keras.layers.Layer):
2     def __init__(self, scale):
3         super(ScaleLayer, self).__init__()
4         self.scale = scale
5 
6     def call(self, inputs):
7         return inputs * self.scale

In line 1 we define a class called ScaleLayer, which subclasses tf.keras.layers.Layer. In line 2 we define the constructor, which takes the scale factor as an argument and, after the constructor of the parent class has been called in line 3, assigns it to a class member variable in line 4.

The call method defined in line 6 is executed whenever data propagates through the layer. The only thing done in line 7 is multiplying the input tensor by the scale factor. The shape of the inputs tensor doesn’t matter, since each of its elements is multiplied by the scale factor.
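
To see the layer in action, we can call it directly on a tensor or add it to a model (a minimal sketch with a made-up scale factor):

scale_layer = ScaleLayer(0.5)
print(scale_layer(tf.constant([2.0, 4.0, 8.0])))  # tensor with values [1. 2. 4.]

model = tf.keras.Sequential()
model.add(ScaleLayer(0.5))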

Keras Applications

Keras is all about abstraction and making life easier. The next level of abstraction is prebuilt and pretrained models, which Keras calls applications. A Keras application is implemented as a Python function returning a Keras model:

1 vgg16_model=tf.keras.applications.VGG16(
2     input_shape=(64,64,3),
3     include_top=False,
4     weights='imagenet')

In line 1, the call to the tf.keras.applications.VGG16() function returns the model, which is of type tensorflow.python.keras.engine.training.Model. If you’re used to sequential models, the line model = tf.keras.Sequential() creates a model of type tensorflow.python.keras.engine.sequential.Sequential, which has tensorflow.python.keras.engine.training.Model as its base class, so you are basically dealing with the same type of object. As the VGG16() function creates the model for us during its call, we need to provide some more information. In line 2 we specify the shape of the images the model should be created for. Note that we have to provide a tuple of three integers to also specify the number of color channels. Then, in line 3, include_top=False omits the fully connected classification layers at the top of the network. This is very handy because the model can now be included as part of other architectures, as we will see in a moment. Finally, line 4 requests the weights obtained by training the network on the ImageNet dataset.
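
For comparison, a minimal sketch of loading the full classifier instead; with include_top=True the input shape is fixed to the one VGG16 was originally trained on:

full_vgg16 = tf.keras.applications.VGG16(
    include_top=True,
    weights='imagenet')  # expects 224x224x3 inputs; outputs 1,000 ImageNet classes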

We will now have a look at the architecture we’ve obtained by printing vgg16_model.summary():

Model: "vgg16"
__________________________________________________________
Layer (type)               Output Shape          Param #
==========================================================
input_1 (InputLayer)       [(None, 64, 64, 3)]   0
__________________________________________________________
block1_conv1 (Conv2D)      (None, 64, 64, 64)    1792
__________________________________________________________
block1_conv2 (Conv2D)      (None, 64, 64, 64)    36928
__________________________________________________________
block1_pool (MaxPooling2D) (None, 32, 32, 64)    0
__________________________________________________________
block2_conv1 (Conv2D)      (None, 32, 32, 128)   73856
__________________________________________________________
block2_conv2 (Conv2D)      (None, 32, 32, 128)   147584
__________________________________________________________
block2_pool (MaxPooling2D) (None, 16, 16, 128)   0
__________________________________________________________
block3_conv1 (Conv2D)      (None, 16, 16, 256)   295168
__________________________________________________________
block3_conv2 (Conv2D)      (None, 16, 16, 256)   590080
__________________________________________________________
block3_conv3 (Conv2D)      (None, 16, 16, 256)   590080
__________________________________________________________
block3_pool (MaxPooling2D) (None, 8, 8, 256)     0
__________________________________________________________
block4_conv1 (Conv2D)      (None, 8, 8, 512)     1180160
__________________________________________________________
block4_conv2 (Conv2D)      (None, 8, 8, 512)     2359808
__________________________________________________________
block4_conv3 (Conv2D)      (None, 8, 8, 512)     2359808
__________________________________________________________
block4_pool (MaxPooling2D) (None, 4, 4, 512)     0
__________________________________________________________
block5_conv1 (Conv2D)      (None, 4, 4, 512)     2359808
__________________________________________________________
block5_conv2 (Conv2D)      (None, 4, 4, 512)     2359808
__________________________________________________________
block5_conv3 (Conv2D)      (None, 4, 4, 512)     2359808
__________________________________________________________
block5_pool (MaxPooling2D) (None, 2, 2, 512)     0
==========================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
__________________________________________________________

You’ll notice that there are 18 hidden layers (plus the input layer, which technically isn’t a layer, as it doesn’t compute anything) and 14,714,688 parameters (weights plus biases), which are all trainable. Let’s now set all the layers to non-trainable:

vgg16_model.trainable=False

If we now look at the summary we observe the following:

...
...
...
======================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
______________________________________________________

If we now train this model further, none of the weights will be adjusted. So what’s the point? As researchers have found, the first layers of a neural network generalize very well. In other words, their weights are not heavily dependent on the data; they learn generic filters and transformations that can be applied to almost any type of dataset. Therefore, we can use VGG16 as a pretrained model and just add some layers behind it, which we then train with our (limited) dataset. This process is called transfer learning:

model = tf.keras.Sequential()
model.add(vgg16_model)
model.add(tf.keras.layers.GlobalAveragePooling2D())
model.add(tf.keras.layers.Dense(23,activation='softmax'))

Here we use our existing vgg16_model and add a pooling layer and a fully connected output layer, both of which are trainable:

____________________________________________________________
Layer (type)                 Output Shape          Param #
============================================================
vgg16 (Model)                (None, 2, 2, 512)     14714688
____________________________________________________________
global_average_pooling2d_2 ( (None, 512)           0
____________________________________________________________
dense_1 (Dense)              (None, 23)            11799
============================================================
Total params: 14,726,487
Trainable params: 11,799
Non-trainable params: 14,714,688
_____________________________________________________________

As you can see, the two layers we’ve just added contribute 11,799 trainable parameters to the model: the dense layer connects the 512 pooled features to 23 outputs, which gives 512 × 23 weights plus 23 biases = 11,799 (the pooling layer itself has no parameters).

By the way, we can also enable and disable trainability on a per-layer basis:

vgg16_model.layers[4].trainable=False

From here on, we can continue as usual with model.compile() and model.fit().
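
For completeness, a minimal sketch of these final steps; the optimizer choice and the placeholder variables train_images and train_labels are assumptions made for illustration:

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# train_images: shape (n, 64, 64, 3); train_labels: one-hot encoded with 23 classes
model.fit(train_images, train_labels, epochs=5)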

This section was inspired by Brijesh’s blog.

Summary

Keras greatly facilitates neural network definition, training, and usage. A key factor is the abandonment of the low-level, linear-algebra-based mathematical expression of neural networks in favor of a more abstract, layer-based expression. Keras was already very prominent before it even supported TensorFlow as an underlying execution environment, and now it has become the official TensorFlow high-level API. Apart from the increased community support this brings, it allows Keras to be tightly integrated with the TensorFlow backend under the hood, further abstracting away the complex TensorFlow API by integrating its functionality into Keras. The next chapter covers the most important example of such an integration: parallel training.
