Deep Learning: A Practitioner's Approach (深度學習|內行人的做法)

by Josh Patterson, Adam Gibson
January 2019
Beginner to intermediate
576 pages
14h 31m
Chinese
GoTop Information, Inc.
Appendix B
RL4J and Reinforcement Learning
Ruben Fiszel
http://rubenfiszel.github.io/
Preface
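This appendix begins with an introduction to reinforcement learning, then describes in detail deep Q-networks (DQN) that take pixels as input, and closes with an RL4J example. We start with the core concepts of reinforcement learning.
Reinforcement learning is an exciting subfield of machine learning. In essence, it is about learning an efficient policy in a given environment. Informally, it closely resembles Pavlovian conditioning (also called classical conditioning): if you reward a particular behavior, then over time the agent learns to repeat that behavior in order to collect more reward.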
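The reward-driven loop described above can be sketched in a few lines of Python. This is only an illustrative toy, not code from RL4J or from the book: the function name `train_agent`, the two behaviors `"sit"` and `"bark"`, the fixed reward values, and the epsilon-greedy exploration rule are all assumptions chosen for the example.

```python
import random


def train_agent(rewards, episodes=500, epsilon=0.1, seed=0):
    """Estimate the value of each action purely from the rewards received.

    `rewards` maps a (hypothetical) action name to the fixed reward it yields.
    """
    rng = random.Random(seed)
    actions = list(rewards)
    values = {a: 0.0 for a in actions}  # running average reward per action
    counts = {a: 0 for a in actions}    # how often each action was chosen

    for _ in range(episodes):
        if rng.random() < epsilon:
            action = rng.choice(actions)           # explore: try a random action
        else:
            action = max(actions, key=values.get)  # exploit: best action so far
        reward = rewards[action]
        counts[action] += 1
        # Incremental update of the running average.
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


# The rewarded behavior ("sit") ends up being repeated far more often.
values, counts = train_agent({"sit": 1.0, "bark": 0.0})
```

After training, `counts` shows that the agent chose the rewarded behavior much more often than the unrewarded one, which is the conditioning effect the paragraph describes.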
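Markov Decision Process
Formally, the environment mentioned above can be defined as a Markov decision process (MDP). Behind the intimidating name is simply a 5-tuple made up of the following:
A set of states S: in chess, for example, a state is the position of every piece on the board in a given game situation.
A set of possible actions A: in chess, these are all the moves available in each position (for example, moving from e4 to e5).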
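To make the first two components concrete, here is a minimal sketch of a state set and an action set. Full chess is far too large to enumerate, so the sketch assumes a hypothetical toy game instead: a single king on a 3x3 board. The names `SIZE`, `STATES`, and `actions` are illustrative only, and the remaining components of the tuple are deliberately left out.

```python
from itertools import product

SIZE = 3  # hypothetical 3x3 board

# S: every square the lone king can occupy, as (row, col) pairs.
STATES = set(product(range(SIZE), range(SIZE)))


def actions(state):
    """A(s): the (from_square, to_square) moves available in `state`."""
    row, col = state
    moves = set()
    for d_row, d_col in product((-1, 0, 1), repeat=2):
        target = (row + d_row, col + d_col)
        # A king moves one square in any direction and must stay on the board.
        if target != state and target in STATES:
            moves.add((state, target))
    return moves
```

As in the chess example, the set of available actions depends on the state: the king has eight moves from the center square but only three from a corner.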

Publisher Resources

ISBN: 9789865020262