精通機器學習

by Aurélien Géron
April 2020
Intermediate to advanced
816 pages
18h 32m
Chinese
GoTop Information, Inc.

Chapter 11: Training Deep Neural Networks

The Vanishing/Exploding Gradients Problems
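
As discussed in Chapter 10, the backpropagation algorithm works by moving from the output layer toward the input layer, propagating the error gradient along the way. Once the algorithm has computed the gradient of the cost function with regard to each parameter in the network, it uses these gradients to update each parameter with a Gradient Descent step.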
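
Unfortunately, gradients often get smaller and smaller as the algorithm progresses down to the lower layers. As a result, the Gradient Descent update leaves the lower layers' connection weights virtually unchanged, and training never converges to a good solution. We call this the vanishing gradients problem. Sometimes the opposite happens: the gradients can grow larger and larger until the layers end up with abnormally large weights and the algorithm diverges. This is the exploding gradients problem, which can occur in recurrent neural networks (see Chapter 15). More generally, deep neural networks suffer from unstable gradients: different layers may learn at widely different speeds.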
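
To make the shrinking gradients concrete, here is a minimal NumPy sketch that is not taken from the book. It assumes a hypothetical fully connected network with 10 layers of 50 units each, logistic activations, no bias terms, weights drawn from a normal distribution with standard deviation 1/sqrt(width), and a squared-error loss against a random target. The function name `layer_gradient_norms` and all of these settings are illustrative assumptions. The sketch runs one forward and one backward pass and reports the norm of the weight gradient at every layer.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))


def layer_gradient_norms(n_layers=10, width=50, seed=0):
    """Return the norm of dLoss/dW for each layer, lowest layer first.

    Hypothetical setup (an assumption, not from the text): square layers,
    no biases, weights ~ N(0, 1/width), loss = 0.5 * ||output - target||^2.
    """
    rng = np.random.default_rng(seed)
    weights = [
        rng.normal(0.0, 1.0 / np.sqrt(width), size=(width, width))
        for _ in range(n_layers)
    ]
    x = rng.normal(size=width)
    target = rng.uniform(size=width)

    # Forward pass: keep every layer's activations for backpropagation.
    activations = [x]
    for W in weights:
        activations.append(sigmoid(W @ activations[-1]))

    # Backward pass: delta holds dLoss/dz for the layer being processed.
    out = activations[-1]
    delta = (out - target) * out * (1.0 - out)
    norms = [0.0] * n_layers
    for l in reversed(range(n_layers)):
        grad_W = np.outer(delta, activations[l])  # dLoss/dW for layer l
        norms[l] = np.linalg.norm(grad_W)
        if l > 0:
            a = activations[l]
            # Propagate the error one layer down; a * (1 - a) is the
            # derivative of the logistic function, which is at most 0.25.
            delta = (weights[l].T @ delta) * a * (1.0 - a)
    return norms


if __name__ == "__main__":
    for i, g in enumerate(layer_gradient_norms(), start=1):
        print(f"layer {i:2d}: gradient norm = {g:.3e}")
```

Under these assumptions the gradient norm reported for layer 1 should come out far smaller than the one for layer 10, because every logistic layer multiplies the backpropagated error by a derivative that never exceeds 0.25.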
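
This unfortunate behavior was observed long ago, and it was one of the reasons deep neural networks were mostly abandoned in the early 2000s. It was not clear why gradients were so unstable when training DNNs, but a 2010 paper by Xavier Glorot and Yoshua Bengio (https://homl.info/47) shed some light on the problem. The authors found a few suspects, including the combination of the logistic sigmoid activation function and the weight initialization technique that were most popular at the time (i.e., a normal distribution with a mean of 0 and a standard deviation of 1). In short, they showed that with this activation function and this initialization scheme, the variance of each layer's outputs is much greater than the variance of its inputs. Going forward in the network, the variance keeps increasing after each layer until the activation function saturates in the top layers. Because the logistic function has a mean of 0.5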
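
The effect of pairing the logistic function with weights drawn from a normal distribution with mean 0 and standard deviation 1 can be explored with a small NumPy sketch. It is an illustration under stated assumptions, not a reproduction of the paper's experiments: a hypothetical stack of 5 layers with 100 units each, no biases, and standard normal inputs. The function name `forward_statistics` and the 0.05/0.95 saturation thresholds are made up for this example. For each layer it records the variance of the layer's inputs, the variance of its weighted sums before the activation, and the fraction of units whose output is close to 0 or 1.

```python
import numpy as np


def sigmoid(z):
    """Logistic activation."""
    return 1.0 / (1.0 + np.exp(-z))


def forward_statistics(n_layers=5, width=100, n_samples=1000, seed=0):
    """Return (input variance, weighted-sum variance, saturated fraction)
    for each layer, lowest layer first.

    Hypothetical setup (an assumption, not from the text): square layers,
    no biases, standard normal inputs.
    """
    rng = np.random.default_rng(seed)
    a = rng.normal(size=(n_samples, width))  # inputs with variance close to 1
    stats = []
    for _ in range(n_layers):
        # Initialization described in the text: mean 0, standard deviation 1.
        W = rng.normal(0.0, 1.0, size=(width, width))
        z = a @ W            # weighted sums, before the activation
        out = sigmoid(z)
        # A unit counts as saturated when its output is near 0 or near 1
        # (illustrative thresholds).
        saturated = float(np.mean((out < 0.05) | (out > 0.95)))
        stats.append((float(a.var()), float(z.var()), saturated))
        a = out
    return stats


if __name__ == "__main__":
    for i, (in_var, z_var, sat) in enumerate(forward_statistics(), start=1):
        print(f"layer {i}: input var = {in_var:.2f}, "
              f"weighted-sum var = {z_var:.1f}, saturated = {sat:.0%}")
```

With 100 inputs per unit, each weighted sum adds up 100 terms with unit-variance weights, so its variance should be much larger than the variance of the inputs, and most logistic units should land in the flat parts of the curve. Shrinking the standard deviation of `W` is one way to see how sensitive this is to the initialization.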
Publisher Resources

ISBN: 9789865024345