Chapter 4. Keras on TensorFlow 2.x
Keras is the most prominent high-level API for deep learning, and there is great news for its fan community: in TensorFlow 2.x, Keras is the official high-level API for TensorFlow. On August 22, 2019, François Chollet, the creator of Keras, announced that Keras 2.3.0 would support TensorFlow 2.x only.
Chollet recommends that TensorFlow users switch from the Keras standalone distribution to tf.keras, a package contained within the TensorFlow distribution, since it is better integrated with TensorFlow features.
The purpose of this chapter is to give you a short crash course in Keras and show you how existing Keras code can easily be migrated to tf.keras. Using Keras within the context of TensorFlow 2.x unlocks a couple of integrations with TensorFlow that are not present in standalone Keras. We'll cover them in Chapter 5.
Keras Versus TensorFlow Linear Algebra Code
The main advantage of using Keras over the low-level, tensor-based TensorFlow API is that all the linear algebra magic is completely hidden from you.
Let's review an example of a neural network with a single hidden layer, implemented first in TensorFlow linear algebra code and then in Keras. Here's how to define such a neural network in pure TensorFlow linear algebra code:
import tensorflow as tf

w1 = tf.Variable([[1, 1, 1], [1, 1, 1], [1, 1, 1]], dtype=tf.float64)
w2 = tf.Variable([1, 1, 1], dtype=tf.float64)

def layer1(x):
    return tf.sigmoid(tf.tensordot(x, w1, axes=1))

def layer2(x):
    return tf.sigmoid(tf.tensordot(layer1(x), w2, axes=1))
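To convince yourself that these functions actually produce predictions, you can push a small batch of random data through them. This is just a sketch; the random input below is my own addition and not part of the original listing:

# Five random samples with three features each (hypothetical test data)
x = tf.random.uniform((5, 3), dtype=tf.float64)
print(layer2(x))  # five output values between 0 and 1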
The preceding code is about as small as it can get using plain TensorFlow; you won't be able to define a neural network with less code. If you remember from Chapter 1, we're basically only computing a series of vector dot products between either the input data x or the hidden-layer activations (the output of the layer1 function) and the weights w1 and w2. Of course, this example doesn't involve any neural network training using gradient descent, but let's stop here and see how the same neural network architecture gets implemented (or expressed) using Keras:
from keras.models import Sequential
from keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
As you can see, we went down from four relevant lines of code to two (not counting the imports). In addition, we went from cryptic linear algebra to a clean, layer-based expression.
To make things complete, let's prepare the code for Keras model training:
from keras.optimizers import SGD

model.compile(optimizer=SGD(learning_rate=0.1),
              loss='binary_crossentropy',
              metrics=['accuracy'])

model.fit(x, y_target, epochs=1000, verbose=1)
Model Compilation
Once the compile method is called on a model, the TensorFlow static execution graph is created. It expresses the computation you've defined using the Keras API. In older versions of Keras, which had multibackend support, different execution code was generated depending on the backend. Nowadays, only TensorFlow execution code is generated from the model specification.
Now only the fit method needs to be called on the model object, and training starts.
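Note that the fit call above assumes the training data x and the labels y_target already exist. For a quick experiment you could generate random placeholder data matching the network's (3,) input shape and single sigmoid output; this is only a sketch, and the names, shapes, and values are my own assumptions:

import numpy as np

x = np.random.random((100, 3))                     # 100 samples, 3 features each
y_target = np.random.randint(0, 2, size=(100, 1))  # random binary labels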
Keras Backward Compatibility
It’s time to migrate this code from Keras standalone to Keras within TensorFlow 2.x:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

model = Sequential()
model.add(Dense(3, input_shape=(3,), activation='sigmoid'))
model.add(Dense(1, activation='sigmoid'))
As you can see, only the import statements have changed to point at the tensorflow.keras package; the model definition code stays exactly the same. This is very handy for migrating and reusing existing code.
But there is also a fundamental difference: this Keras code is now tightly coupled to TensorFlow and can make use of TensorFlow functionality without further changes. The most prominent example is running this code in parallel on a compute cluster, which will be introduced in Chapter 5.
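As a smaller illustration that doesn't require a cluster, a tf.keras model can consume a tf.data input pipeline directly and be exported in TensorFlow's SavedModel format. The following is only a sketch: the random placeholder data and the export path are my own assumptions, and the model is assumed to have been compiled as shown earlier:

import numpy as np
import tensorflow as tf

# Wrap random placeholder data in a tf.data pipeline ...
dataset = tf.data.Dataset.from_tensor_slices(
    (np.random.random((100, 3)), np.random.randint(0, 2, size=(100, 1)))
).batch(10)

model.fit(dataset, epochs=5)               # ... and feed it straight into fit

model.save('my_model', save_format='tf')   # export as a TensorFlow SavedModel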
Functional Versus Sequential API
Most users start learning Keras with the so-called Sequential API. This API is very easy to understand, since it lets you append neural network layers one after another until the network is fully defined:
model = tf.keras.Sequential()
model.add(tf.keras.layers.Dense(23, ...))
model.add(tf.keras.layers.Dense(42, ...))
model.add(tf.keras.layers.Dense(5, ...))
model.add(tf.keras.layers.Dense(1, ...))

model.compile(...)

model.fit(...)
This is sufficient for the majority of use cases. But there are exceptions, for example when you want to use Keras to define one of the following:
- Models with non-sequential architecture
- Multiple models that share layers between them
- Models with multiple inputs or outputs
In other words, you need the Functional API whenever your model resembles a directed acyclic graph (DAG) of layers rather than a simple chain. In a DAG, every node (layer) can have multiple children as well as multiple parents. Keras therefore lets you treat layers as functions that take the output of their parent layers as parameters. Let's illustrate this:
input = tf.keras.Input(shape=(1024, 1024, 3))
This creates a tensorflow.python.framework.ops.Tensor. Now it gets more interesting, as we append a Dense layer to the input layer:
hidden1 = tf.keras.layers.Dense(64, activation='relu', name='y1')
y1 = hidden1(input)
In the first line we create a Dense layer, which we then call in the second line with its parent, input, as a parameter. In case we want to attach a second Dense layer to the input layer, we are free to do so:
hidden2 = tf.keras.layers.Dense(64, activation='relu', name='y2')
y2 = hidden2(input)
One final step creates a Keras model out of these components:
model = tf.keras.Model(inputs=input, outputs=[y1, y2])
The architecture of this model is nonsequential, as can be seen by printing model.summary():
Model: "model"
_____________________________________________________________________________________
Layer (type)                 Output Shape               Param #     Connected to
=====================================================================================
input_1 (InputLayer)         [(None, 1024, 1024, 3)]    0
_____________________________________________________________________________________
y1 (Dense)                   (None, 1024, 1024, 64)     256         input_1[0][0]
_____________________________________________________________________________________
y2 (Dense)                   (None, 1024, 1024, 64)     256         input_1[0][0]
=====================================================================================
Total params: 512
Trainable params: 512
Non-trainable params: 0
_____________________________________________________________________________________
You can see that both Dense layers are connected to the input layer. Figure 4-1 illustrates this.
Now, the next step is usually calling model.compile(), but since we have two output layers, we also need to define two loss functions:
model.compile(optimizer='adam',
              loss={"y1": "categorical_crossentropy",
                    "y2": "categorical_crossentropy"},
              metrics=["accuracy"])
As you can see, we pass the names of the loss functions for each output layer as a Python dictionary, referring to the output layers by their names y1 and y2, which serve as the keys of the dictionary.
Finally, we need to call model.fit() as well. Again, as our neural network has two output layers, two target datasets need to be specified:
model.fit(train_x,
          {"y1": train_y1, "y2": train_y2},
          epochs=10)
As you can see, the target data is again passed to the fit method as a dictionary with the output layer names as keys.
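To see the two outputs in action, you can push a random image-shaped tensor through the model and inspect the shapes of the two predictions. This is just a sketch; the random input is my own addition, and the shapes follow from the (1024, 1024, 3) input defined above:

import numpy as np

sample = np.random.random((1, 1024, 1024, 3)).astype('float32')  # one random "image"
pred_y1, pred_y2 = model.predict(sample)                         # one prediction per output layer
print(pred_y1.shape, pred_y2.shape)                              # (1, 1024, 1024, 64) each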
Custom Layers
Building new tf.keras layers in pure TensorFlow might sound as if it should be reserved for hardcore researchers, but it actually isn't that hard. Let's start with the simplest solution first.
Lambda Layers
A lambda function is an anonymous, inline function and part of the Python language specification. Using the TensorFlow AutoGraph functionality introduced earlier, Keras allows us to inject arbitrary Python code to be executed within the context of a Keras layer:
model.add(tf.keras.layers.Lambda(lambda x: x ** 2))
This single line already creates a custom Keras layer. In case you are not familiar with the lambda notation, the following code is semantically equivalent:
def my_lambda(x):
    return x ** 2

model = tf.keras.Sequential()
model.add(tf.keras.layers.Lambda(my_lambda))
So this layer just takes any input data (tensor) and squares each element of it.
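A quick way to verify this behavior is to call the layer directly on a small example tensor; the input values below are made up for illustration:

import tensorflow as tf

squaring_layer = tf.keras.layers.Lambda(lambda x: x ** 2)
print(squaring_layer(tf.constant([[1.0, 2.0, 3.0]])))  # prints [[1. 4. 9.]]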
Real Custom Layers
To implement a real custom layer, we need to subclass tf.keras.layers.Layer. Let's define a ScaleLayer, which scales its input by a constant factor and returns the scaled input as output:
1 class ScaleLayer(tf.keras.layers.Layer):
2     def __init__(self, scale):
3         super(ScaleLayer, self).__init__()
4         self.scale = scale
5
6     def call(self, inputs):
7         return inputs * self.scale
In line 1 we define a class called ScaleLayer, which subclasses tf.keras.layers.Layer. In line 2 we define the constructor, which takes the scale factor as an argument and, after the constructor of the parent class has been called in line 3, assigns it to a member variable in line 4.
The call method defined in line 6 is executed whenever data propagates through the layer. The only thing done in line 7 is multiplying the input tensor by the scale factor. The shape of the inputs tensor doesn't matter, since each of its elements is multiplied by the scale factor.
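To convince yourself that the layer behaves as described, you can call it directly on a tensor or use it inside a model like any built-in layer; the scale factor and input values below are arbitrary examples:

import tensorflow as tf

scale_layer = ScaleLayer(scale=2.0)
print(scale_layer(tf.constant([1.0, 2.0, 3.0])))      # prints [2. 4. 6.]

model = tf.keras.Sequential([ScaleLayer(scale=0.5)])  # usable like any other Keras layer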
Keras Applications
Keras is all about abstraction and making life easier, so the next level of abstraction is prebuilt and pretrained models, which Keras calls applications. A Keras application is implemented as a Python function returning a Keras model:
1 vgg16_model = tf.keras.applications.VGG16(
2     input_shape=(64, 64, 3),
3     include_top=False,
4     weights='imagenet')
In line 1, the call to the tf.keras.applications.VGG16() function returns the model, which is of type tensorflow.python.keras.engine.training.Model. If you're used to sequential models, the line model = tf.keras.Sequential() creates a model of type tensorflow.python.keras.engine.sequential.Sequential, which has tensorflow.python.keras.engine.training.Model as its base class, so you are basically dealing with the same type of object. Since the VGG16() function creates the model for us during its call, we need to provide some more information. In line 2 we specify the shape of the images the model should be created for. Note that we have to provide a tuple with three integers, the third one specifying the number of color channels. Then, in line 3, include_top=False omits the output layer of the neural network. This is very handy because the model can now be included as part of other architectures, as we will see in a moment. Finally, line 4 asks for the layer weights obtained by training the network on the ImageNet dataset.
We will now have a look at the architecture of what we've obtained by printing vgg16_model.summary():
Model: "vgg16"
__________________________________________________________
Layer (type)                 Output Shape         Param #
==========================================================
input_1 (InputLayer)         [(None, 64, 64, 3)]  0
__________________________________________________________
block1_conv1 (Conv2D)        (None, 64, 64, 64)   1792
__________________________________________________________
block1_conv2 (Conv2D)        (None, 64, 64, 64)   36928
__________________________________________________________
block1_pool (MaxPooling2D)   (None, 32, 32, 64)   0
__________________________________________________________
block2_conv1 (Conv2D)        (None, 32, 32, 128)  73856
__________________________________________________________
block2_conv2 (Conv2D)        (None, 32, 32, 128)  147584
__________________________________________________________
block2_pool (MaxPooling2D)   (None, 16, 16, 128)  0
__________________________________________________________
block3_conv1 (Conv2D)        (None, 16, 16, 256)  295168
__________________________________________________________
block3_conv2 (Conv2D)        (None, 16, 16, 256)  590080
__________________________________________________________
block3_conv3 (Conv2D)        (None, 16, 16, 256)  590080
__________________________________________________________
block3_pool (MaxPooling2D)   (None, 8, 8, 256)    0
__________________________________________________________
block4_conv1 (Conv2D)        (None, 8, 8, 512)    1180160
__________________________________________________________
block4_conv2 (Conv2D)        (None, 8, 8, 512)    2359808
__________________________________________________________
block4_conv3 (Conv2D)        (None, 8, 8, 512)    2359808
__________________________________________________________
block4_pool (MaxPooling2D)   (None, 4, 4, 512)    0
__________________________________________________________
block5_conv1 (Conv2D)        (None, 4, 4, 512)    2359808
__________________________________________________________
block5_conv2 (Conv2D)        (None, 4, 4, 512)    2359808
__________________________________________________________
block5_conv3 (Conv2D)        (None, 4, 4, 512)    2359808
__________________________________________________________
block5_pool (MaxPooling2D)   (None, 2, 2, 512)    0
==========================================================
Total params: 14,714,688
Trainable params: 14,714,688
Non-trainable params: 0
__________________________________________________________
You'll notice that there are 18 hidden layers (plus the input layer, which technically isn't a layer, as it doesn't compute anything) and 14,714,688 parameters (weights plus biases), all of which are trainable. Now let's set all the layers to non-trainable:
vgg16_model.trainable = False
If we now look at the summary, we observe the following:
...
======================================================
Total params: 14,714,688
Trainable params: 0
Non-trainable params: 14,714,688
______________________________________________________
If we further train this model, none of the weights will be adjusted. So what's the point then? As researchers have found out, the early layers in a neural network tend to generalize very well. In other words, their weights don't depend heavily on the particular dataset; they learn generic filters and transformations that can be applied to many types of data. Therefore, we can use VGG16 as a pretrained model and just add some layers behind it, which we then train with our (limited) dataset. This process is called transfer learning:
model = tf.keras.Sequential()
model.add(vgg16_model)
model.add(tf.keras.layers.GlobalAveragePooling2D())
model.add(tf.keras.layers.Dense(23, activation='softmax'))
Here we use our existing vgg16_model and add a global average pooling layer and a trainable, fully connected layer as the output layer:
________________________________________________________________________
Layer (type)                                Output Shape        Param #
========================================================================
vgg16 (Model)                               (None, 2, 2, 512)   14714688
________________________________________________________________________
global_average_pooling2d_2 (GlobalAverage   (None, 512)         0
________________________________________________________________________
dense_1 (Dense)                             (None, 23)          11799
========================================================================
Total params: 14,726,487
Trainable params: 11,799
Non-trainable params: 14,714,688
________________________________________________________________________
As you can see, the last two layers we've just added contribute 11,799 trainable parameters to the model (512 × 23 weights + 23 biases = 11,799).
By the way, we can also enable and disable trainability on a per-layer basis:
vgg16_model.layers[4].trainable = False
From here on, we can continue as usual with model.compile() and model.fit().
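For completeness, that final step could look like the following sketch, where train_x and train_y are hypothetical placeholders for your own (limited) dataset, here assumed to be 64×64 RGB images with 23 one-hot-encoded classes:

import numpy as np

model.compile(optimizer='adam',
              loss='categorical_crossentropy',
              metrics=['accuracy'])

# Hypothetical placeholder data: 100 images of 64x64x3 pixels and 23 one-hot classes
train_x = np.random.random((100, 64, 64, 3)).astype('float32')
train_y = np.eye(23)[np.random.randint(0, 23, size=100)]

model.fit(train_x, train_y, epochs=5, batch_size=10)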
This section was inspired by Brijesh’s blog.
Summary
Keras greatly facilitates neural network definition, training, and usage. A key factor in this is abandoning the low-level, linear-algebra-based mathematical expression of neural networks in favor of a more abstract, layer-based expression. Keras was already very prominent before it even supported TensorFlow as an underlying execution environment, and it has now become the official TensorFlow high-level API. Apart from the increased community support this brings, it allows Keras to be tightly integrated with the TensorFlow backend under the hood, further abstracting away the complex TensorFlow API by integrating its functionality into Keras. The next chapter covers the most important example of such an integration, namely parallel training.