Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow, 2nd Edition (Chinese edition: 机器学习实战：基于Scikit-Learn、Keras 和TensorFlow（原书第2 版）)

by Aurélien Géron

October 2020

Intermediate to advanced

693 pages

16h 26m

Chinese

China Machine Press


Training Deep Neural Networks

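Equation 11-8: Adam algorithm

$$
\begin{aligned}
1.\quad & \mathbf{m} \leftarrow \beta_1 \mathbf{m} - (1 - \beta_1)\, \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) \\
2.\quad & \mathbf{s} \leftarrow \beta_2 \mathbf{s} + (1 - \beta_2)\, \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) \otimes \nabla_{\boldsymbol{\theta}} J(\boldsymbol{\theta}) \\
3.\quad & \hat{\mathbf{m}} \leftarrow \frac{\mathbf{m}}{1 - \beta_1^{\,t}} \\
4.\quad & \hat{\mathbf{s}} \leftarrow \frac{\mathbf{s}}{1 - \beta_2^{\,t}} \\
5.\quad & \boldsymbol{\theta} \leftarrow \boldsymbol{\theta} + \eta\, \hat{\mathbf{m}} \oslash \sqrt{\hat{\mathbf{s}} + \varepsilon}
\end{aligned}
$$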

In this equation, $t$ represents the iteration number (starting at 1).
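The five steps of Equation 11-8 can be sketched in plain Python as follows. This is a minimal illustration, not the Keras implementation: the function name `adam_step`, the list-based parameter representation, and the toy cost function $J(\boldsymbol{\theta}) = \theta_0^2 + \theta_1^2$ are all assumptions made for the example.

```python
import math

def adam_step(theta, grad, m, s, t, eta=0.001, beta1=0.9, beta2=0.999, eps=1e-7):
    """Apply one Adam update (steps 1 to 5) to a list of float parameters.

    theta, grad, m and s are lists of equal length; t is the iteration
    number, starting at 1. Returns the updated (theta, m, s).
    """
    new_theta, new_m, new_s = [], [], []
    for th, g, m_i, s_i in zip(theta, grad, m, s):
        m_i = beta1 * m_i - (1 - beta1) * g              # step 1: decaying average of -gradient
        s_i = beta2 * s_i + (1 - beta2) * g * g          # step 2: decaying average of gradient^2
        m_hat = m_i / (1 - beta1 ** t)                   # step 3: bias correction for m
        s_hat = s_i / (1 - beta2 ** t)                   # step 4: bias correction for s
        th = th + eta * m_hat / math.sqrt(s_hat + eps)   # step 5: parameter update
        new_theta.append(th)
        new_m.append(m_i)
        new_s.append(s_i)
    return new_theta, new_m, new_s

def grad_J(theta):
    """Gradient of the toy cost J(theta) = theta_0^2 + theta_1^2 (hypothetical)."""
    return [2.0 * th for th in theta]

theta = [1.0, -2.0]
m = [0.0, 0.0]   # m and s start at 0, as in the text
s = [0.0, 0.0]
first_step = None
for t in range(1, 2001):
    theta, m, s = adam_step(theta, grad_J(theta), m, s, t, eta=0.01)
    if t == 1:
        first_step = list(theta)  # parameters after the very first update

print("after 1 step:", first_step)
print("after 2000 steps:", theta)
```

On the first iteration each parameter moves by almost exactly $\eta$, whatever the size of its gradient, because the bias-corrected $\hat{\mathbf{m}}$ and $\sqrt{\hat{\mathbf{s}}}$ have the same magnitude at $t = 1$.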

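If you look only at steps 1, 2, and 5, you will notice that Adam closely resembles both momentum optimization and RMSProp. The only difference is that step 1 computes an exponentially decaying average rather than an exponentially decaying sum, but the two are equivalent up to a constant factor (the decaying average is $1 - \beta_1$ times the decaying sum). Steps 3 and 4 are something of a technical detail: since $\mathbf{m}$ and $\mathbf{s}$ are initialized at 0, they are biased toward 0 at the beginning of training, so these two steps help boost $\mathbf{m}$ and $\mathbf{s}$ at the beginning of training.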
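A small numeric check can make both points concrete. The sketch below assumes a constant scalar gradient `g = 1.0` (a hypothetical value chosen for illustration) and leaves the learning rate out of the decaying sum so that the two quantities can be compared directly.

```python
beta1 = 0.9
g = 1.0          # constant gradient, a hypothetical value for illustration

m_avg = 0.0      # decaying average, as in Adam step 1
m_sum = 0.0      # decaying sum, as in momentum optimization (learning rate left out)
history = []     # one (t, m_avg, m_hat, m_sum) tuple per iteration

for t in range(1, 6):
    m_avg = beta1 * m_avg - (1 - beta1) * g   # Adam step 1
    m_sum = beta1 * m_sum - g                 # decaying sum of negative gradients
    m_hat = m_avg / (1 - beta1 ** t)          # Adam step 3: bias correction
    history.append((t, m_avg, m_hat, m_sum))

for t, m_avg_t, m_hat_t, m_sum_t in history:
    print(f"t={t}  m={m_avg_t:+.4f}  m_hat={m_hat_t:+.4f}  "
          f"(1-beta1)*sum={(1 - beta1) * m_sum_t:+.4f}")
```

With a constant gradient, the uncorrected `m_avg` starts near $-0.1\,g$ and only slowly approaches $-g$, while the corrected `m_hat` equals $-g$ (up to rounding) at every iteration. The last column matches `m_avg`, which is the constant-factor relationship between the decaying average and the decaying sum.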

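The momentum decay hyperparameter $\beta_1$ is typically initialized to 0.9, while the scaling decay hyperparameter $\beta_2$ is often initialized to 0.999. As earlier, the smoothing term $\varepsilon$ is usually initialized to a tiny number such as $10^{-7}$. These are the default values for the Adam class (to be precise, epsilon defaults to None, which tells Keras to use keras.backend.epsilon(), which defaults to $10^{-7}$; you can change it using keras.backend.set_epsilon()). Here is how to create an Adam optimizer using Keras: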

from tensorflow import keras
optimizer = keras.optimizers.Adam(learning_rate=0.001, beta_1=0.9, beta_2=0.999)

Since Adam is an adaptive learning rate algorithm (like AdaGrad ...


ISBN: 9787111665977