Reinforcement learning: Building recommender systems
Have you ever made a decision that seemed like a good idea at the time but then years later ended up being a complete mistake?
Reinforcement learning (RL) is all about making decisions that set you up for success now and then dynamically reacting to change as the decisions play out. If you work in an industry like finance, aerospace, automotive, advertising, media, or social media, RL offers massive value, helping you make better portfolio decisions and spend advertising dollars more effectively. RL can also benefit autonomous vehicles, process automation, and much more.
Expert Matt Kirk digs into what you need to know to get started with RL. You’ll work your way through everything from Bellman equations and value iteration all the way up to deep Q networks, and you’ll leave with resources to continue learning on your own.
What you'll learn and how you can apply it
By the end of this live, hands-on, online course, you’ll understand:
 How to balance exploitation and exploration within a dynamic environment
 The Gittins index and other practical approaches to this tradeoff
 Tradeoffs between model-free and model-based RL algorithms
 How reinforcement learning relates to supervised learning
 Model-free RL with value iteration (Bellman equations), Q learning, and deep Q networks (DQNs)
And you’ll be able to:
 Build a simple model using value iteration to traverse a maze
 Build a simplistic stock trader using Q learning
 Play Breakout using a DQN
 Apply value iteration, Q learning, and DQNs to dynamic updating problems
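As a taste of the first exercise, value iteration repeatedly applies the Bellman optimality update until the state values stop changing. Here's a minimal sketch in plain Python; the 4×4 maze layout, step cost, and goal reward are illustrative assumptions, not course material:

```python
def value_iteration(rewards, walls, gamma=0.9, tol=1e-6):
    """Bellman-update sweeps over a small deterministic grid maze.

    rewards: dict mapping (row, col) -> terminal reward
    walls:   set of blocked cells
    """
    rows, cols = 4, 4  # illustrative maze size
    states = [(r, c) for r in range(rows) for c in range(cols) if (r, c) not in walls]
    moves = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in rewards:                 # terminal state: value is its reward
                new_v = rewards[s]
            else:
                vals = []
                for dr, dc in moves:
                    nxt = (s[0] + dr, s[1] + dc)
                    if nxt not in V:         # off-grid or wall: stay put
                        nxt = s
                    vals.append(-0.04 + gamma * V[nxt])  # small step cost
                new_v = max(vals)            # Bellman optimality update
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            return V

V = value_iteration(rewards={(0, 3): 1.0}, walls={(1, 1)})
```

Once the values converge, reading off the greedy policy (always move to the neighbor with the highest value) traverses the maze to the goal.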
This training course is for you because...
 You’re a data scientist with a background in supervised and unsupervised learning and want to learn reinforcement learning.
 You’re a software engineer who wants to optimize an automated system over time using machine learning.
Prerequisites
 A basic understanding of supervised learning, classification, and regression
 General knowledge of optimization theory, information theory, and algebra (useful but not required)
 Experience with deep learning techniques applied to images (useful but not required)
Recommended preparation:
 Watch Supervised Learning (video, 4m 34s)
 Take A Practical Introduction to Machine Learning (live online training course with Matt Kirk)
About your instructor

Matt Kirk is a data architect, software engineer, and entrepreneur based out of Seattle, WA.
For years, he struggled to piece together his quantitative finance background with his passion for building software.
Then he discovered his affinity for solving problems with data.
Now, he helps multimillion-dollar companies with their data projects. From diamond recommendation engines to marketing automation tools, he loves educating engineering teams about methods to start their big data projects.
To learn more about how you can get started with your big data project (beyond taking this class), check out matthewkirk.com for tips.
Schedule
The timeframes are only estimates and may vary according to how the class is progressing.
Reinforcement learning (60 minutes)
 Lecture: The reasons for RL (balancing exploration and exploitation, learning over time instead of all at once, learning a policy to utilize rather than just a value, and learning AI heuristics and plans); why now (Dota 2, AlphaGo, and other advancements); what RL is (autoregressive supervised learning, Bellman equations, Markov decision processes, state–action–reward–state–action (SARSA), and model-free versus model-based); current effective RL use (hedge funds, self-driving cars, games, and OpenAI)
 Group discussion: What is RL suited for in your organization? When would you want to use a model versus be model free? When should you optimize the policy versus the end reward?
 Q&A
 Quiz
 Break (5 minutes)
Q learning (40 minutes)
 Lecture: Value iteration with Bellman equations; rearranging value iteration to implement Q learning (learning the optimal action based on an expected Q value or terminal state value); Q learning scenarios (What to pick as a reward? What is a state? What is an action? Are the actions stochastic or deterministic?)
 Q&A
 Quiz
QTrader using straight Q learning (25 minutes)
 Lecture: Hand-coded states and actions; reward as Sharpe ratio; results
 Hands-on exercise: Explore QTrader: try out different learning rates, different ways of increasing episode viewing, etc.
 Break (5 minutes)
DQN (35 minutes)
 Lecture: Other ways of learning Q; Q calculation using a neural net (variations of DQNs, including Double DQN); what neural nets are good at (convolutions, max pooling, dropout, recurrent layers) and how to roll that into DQNs
 Q&A
 Quiz
Does a DQN work better than Q learning? (25 minutes)
 Lecture: Demonstrating that the state is now amorphous, the action is still hand-coded, and the reward is still the same; show results
 Hands-on exercises: Explore dropout, recurrent layers, max pooling, and convolutions
Wrap-up and Q&A (10 minutes)
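The Q learning segment above rearranges the Bellman equation into an incremental update that learns from experience. As a preview, here is a minimal tabular sketch on a toy corridor environment; the chain world, rewards, and hyperparameters are all illustrative assumptions, not course code:

```python
import random

def q_learning_chain(n_states=5, episodes=300, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q learning on a toy chain: move left (0) or right (1); reward 1 at the right end."""
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(n_states) for a in (0, 1)}

    def greedy(s):
        if Q[(s, 0)] == Q[(s, 1)]:
            return rng.randrange(2)              # break ties randomly
        return 0 if Q[(s, 0)] > Q[(s, 1)] else 1

    for _ in range(episodes):
        s = 0
        while s != n_states - 1:                 # right end is terminal
            # epsilon-greedy: explore occasionally, otherwise exploit current estimates
            a = rng.randrange(2) if rng.random() < epsilon else greedy(s)
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == n_states - 1 else 0.0
            best_next = max(Q[(s2, 0)], Q[(s2, 1)])
            # Q learning update: move the estimate toward reward + discounted best next value
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s2
    return Q

Q = q_learning_chain()
```

After training, the greedy policy at every non-terminal state is "move right," which is the optimal behavior in this toy environment; the same update rule, with richer states and rewards, underlies the QTrader exercise.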