精通機器學習

by Aurélien Géron
April 2020
Intermediate to advanced
816 pages
18h 32m
Chinese
GoTop Information, Inc.
Chapter 18: Reinforcement Learning
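
Many researchers try to find agents that end up performing well even though they initially know nothing about the environment. However, unless you are writing a paper, you should not hesitate to inject prior knowledge into the agent, because doing so can speed up training dramatically. For example, since you know that the pole should stay as upright as possible, you could add negative rewards proportional to the pole's angle. This makes the rewards much denser and speeds up training.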
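
A minimal sketch of that reward-shaping idea follows. It assumes the Gym-style CartPole observation layout `[cart_position, cart_velocity, pole_angle, pole_angular_velocity]`, with the angle in radians. The function name `shaped_reward` and the penalty weight `angle_weight` are hypothetical choices made for illustration; they are not values from the book.

```python
def shaped_reward(reward: float, obs, angle_weight: float = 1.0) -> float:
    """Subtract a penalty proportional to the pole's tilt from the env reward.

    Assumes obs = [cart_position, cart_velocity, pole_angle, pole_angular_velocity]
    with the angle in radians; angle_weight is a hypothetical tuning knob.
    """
    pole_angle = obs[2]
    # abs() penalizes leaning left or right equally; an upright pole (0 rad) costs nothing.
    return reward - angle_weight * abs(pole_angle)


# The environment gives +1 per step; a tilted pole now earns less than an upright one.
print(shaped_reward(1.0, [0.0, 0.0, 0.0, 0.0]))     # 1.0
print(shaped_reward(1.0, [0.0, 0.0, 0.125, 0.0]))   # 0.875
print(shaped_reward(1.0, [0.0, 0.0, -0.125, 0.0]))  # 0.875
```

The weight has to be tuned. If it is too large, the penalty can dominate the survival reward and change what the agent effectively optimizes.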
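
Moreover, if you already have a good policy (for example, a hardcoded one), you may want to first train the neural network to imitate it, and then use policy gradients to improve it.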
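
One way to do the imitation step is behavior cloning. You label observations with the hardcoded policy's actions, then fit the policy model to those labels with a cross-entropy loss. The sketch below makes several substitutions, so treat it as an illustration rather than the book's method:

- A single logistic unit stands in for the neural network.
- Synthetic, uniformly sampled observations replace real CartPole rollouts.
- The rule in `hardcoded_policy`, the `SCALES` values, and the training hyperparameters are all hypothetical.

It covers only the imitation step, not the policy-gradient fine-tuning that would follow.

```python
import math
import random

# Hypothetical scales that map CartPole-like observations
# [cart_position, cart_velocity, pole_angle, pole_angular_velocity] into [-1, 1].
SCALES = [2.4, 2.0, 0.2, 2.0]


def hardcoded_policy(obs):
    """Hypothetical rule: push the cart toward the side the pole leans."""
    return 0 if obs[2] < 0 else 1  # 0 = push left, 1 = push right


def prob_right(w, b, obs):
    """Logistic stand-in for the policy network: P(action 1 | obs)."""
    z = b + sum(wi * xi / s for wi, xi, s in zip(w, obs, SCALES))
    return 1.0 / (1.0 + math.exp(-z))


def sample_obs(rng):
    """Synthetic observation drawn uniformly (an assumption, not real rollouts)."""
    return [rng.uniform(-s, s) for s in SCALES]


def imitate(policy, n_samples=1000, epochs=200, lr=1.0, seed=42):
    """Behavior cloning: fit prob_right to the rule's actions via cross-entropy."""
    rng = random.Random(seed)
    X = [sample_obs(rng) for _ in range(n_samples)]
    y = [policy(x) for x in X]
    w, b = [0.0] * 4, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * 4, 0.0
        for x, target in zip(X, y):
            err = prob_right(w, b, x) - target  # gradient of the loss w.r.t. the logit
            for i in range(4):
                grad_w[i] += err * x[i] / SCALES[i]
            grad_b += err
        # Full-batch gradient descent step on the mean loss.
        w = [wi - lr * g / n_samples for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n_samples
    return w, b


w, b = imitate(hardcoded_policy)
eval_rng = random.Random(0)
test_obs = [sample_obs(eval_rng) for _ in range(1000)]
agreement = sum((prob_right(w, b, x) >= 0.5) == (hardcoded_policy(x) == 1)
                for x in test_obs) / len(test_obs)
print(f"agreement with the hardcoded policy: {agreement:.1%}")
```

The cloned model only approximates the rule. Its value is that it starts policy-gradient training from sensible behavior instead of from random actions.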
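
The policy gradient algorithm we just trained solved the CartPole task, but it does not scale well to larger and more complex tasks. In fact, it is highly sample inefficient: it has to explore the game for a very long time before it can make significant progress. This is because it must run many episodes to estimate the advantage of each action, a point we will explore further. However, it serves as the foundation for more powerful algorithms, such as Actor-Critic algorithms, which we will discuss briefly at the end of this chapter.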
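
The sample-efficiency cost becomes concrete when you look at how advantages are estimated. A common approach in REINFORCE-style policy gradients has two steps:

1. Compute each step's discounted return.
2. Normalize those returns across all the episodes in a batch.

As a result, the estimate for any single action only becomes reliable once many episodes have been played. The sketch below assumes that scheme. The function names and the discount rate are illustrative choices.

```python
from statistics import mean, pstdev


def discount_rewards(rewards, discount_rate):
    """Discounted return at each step: r_t + gamma*r_{t+1} + gamma^2*r_{t+2} + ..."""
    discounted = [float(r) for r in rewards]
    # Walk backward so each step adds the already-discounted return of the next step.
    for step in range(len(rewards) - 2, -1, -1):
        discounted[step] += discounted[step + 1] * discount_rate
    return discounted


def discount_and_normalize(all_rewards, discount_rate):
    """Normalize returns over *all* episodes so each becomes an advantage estimate.

    Assumes the pooled returns are not all identical (pstdev would be 0).
    """
    all_discounted = [discount_rewards(r, discount_rate) for r in all_rewards]
    flat = [g for episode in all_discounted for g in episode]
    mu, sigma = mean(flat), pstdev(flat)
    return [[(g - mu) / sigma for g in episode] for episode in all_discounted]


print(discount_rewards([10, 0, -50], discount_rate=0.8))  # [-22.0, -40.0, -50.0]
```

Because the normalization pools every episode in the batch, actions followed by above-average returns get positive advantages and the rest get negative ones. With only a few episodes these estimates are noisy, which is one way to see why the method needs so many samples.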
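
Next, we will look at another popular family of algorithms. PG algorithms directly try to optimize the policy to increase rewards. The algorithms we are about to explore are less direct: the agent learns to estimate the expected return of each state, or of each action in each state, and then uses this knowledge to decide how to act. To understand these algorithms, we must first introduce Markov decision processes.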
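
Here is a rough illustration of that indirect approach. Suppose the agent already has estimates of the expected return for each action in each state (often called Q-values). Acting greedily then just means picking the action with the highest estimate. The state names, actions, and numbers below are hypothetical placeholders. How such estimates are learned is what the following sections build toward.

```python
# Hypothetical learned estimates: expected return for each (state, action) pair.
q_values = {
    "s0": {"left": 1.5, "right": 0.2},
    "s1": {"left": -0.3, "right": 2.1},
}


def state_value(state):
    """Expected return of a state if the agent acts greedily on these estimates."""
    return max(q_values[state].values())


def greedy_action(state):
    """Choose the action whose estimated expected return is highest."""
    actions = q_values[state]
    return max(actions, key=actions.get)


print(greedy_action("s0"), greedy_action("s1"))  # left right
print(state_value("s1"))  # 2.1
```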

Markov Decision Processes
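
In the early 20th century, the mathematician Andrey Markov studied stochastic processes with no memory, called Markov chains. Such a process has a fixed number of states, and at each step it randomly evolves from one state to another. From state ...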
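
Below is a minimal simulation of such a chain. It assumes a hypothetical three-state transition matrix; the probabilities are made up for illustration and are not taken from the book. Each row gives the probabilities of moving from one state to every state. The next state depends only on the current one, so the simulator never needs to look at earlier history.

```python
import random

# Hypothetical transition matrix: P[s][s2] = probability of moving from s to s2.
# Each row sums to 1; state 2 is absorbing (it always transitions to itself).
P = [
    [0.6, 0.3, 0.1],
    [0.0, 0.5, 0.5],
    [0.0, 0.0, 1.0],
]


def run_chain(start_state, n_steps, seed=None):
    """Return the sequence of visited states, starting from start_state."""
    rng = random.Random(seed)
    state, path = start_state, [start_state]
    for _ in range(n_steps):
        # Memoryless: the next state is drawn using only the current state's row.
        state = rng.choices(range(len(P)), weights=P[state])[0]
        path.append(state)
    return path


print(run_chain(0, 10, seed=42))
```

Once the chain reaches state 2 it never leaves. Transitions with probability 0, such as 1 to 0, never occur.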
ISBN: 9789865024345