This course is an introduction to sequential decision making and reinforcement learning. We start with a discussion of utility theory to learn how preferences can be represented and modeled for decision making. We first model simple decision problems as multi-armed bandit problems and discuss several approaches to evaluate feedback. We then model decision problems as finite Markov decision processes (MDPs) and discuss their solutions via dynamic programming algorithms. We touch on the notion of partial observability in real problems, modeled by POMDPs and then solved by online planning methods. Finally, we introduce the reinforcement learning problem and discuss two paradigms: Monte Carlo methods and temporal difference learning. We conclude the course by noting how the two paradigms lie on a spectrum of n-step temporal difference methods. An emphasis on algorithms and examples will be a key part of this course.


Decision Making and Reinforcement Learning
Columbia University
About this course
Introductory computer science and data structures. Familiarity with Python. Familiarity with basic probability and optimization.
What you will learn
Map between qualitative preferences and appropriate quantitative utilities.
Model non-associative and associative sequential decision problems with multi-armed bandit problems and Markov decision processes, respectively.
Implement dynamic programming algorithms to find optimal policies.
Implement basic reinforcement learning algorithms using Monte Carlo and temporal difference methods.
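As an illustration of the dynamic programming objective above, here is a minimal sketch of value iteration on a finite MDP. It is not taken from the course materials: the function name `value_iteration`, the `transitions` dictionary layout, and the three-state `toy_mdp` are all assumptions made up for this example.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Sketch of value iteration for a small finite MDP.

    Assumed (hypothetical) layout: transitions[state][action] is a list of
    (probability, next_state, reward) triples. A state with no actions is
    treated as terminal and keeps value 0.
    """
    values = {s: 0.0 for s in transitions}

    def action_value(outcomes):
        # Expected one-step return of an action under the current values.
        return sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)

    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:  # terminal state
                continue
            best = max(action_value(outcomes) for outcomes in actions.values())
            delta = max(delta, abs(best - values[s]))
            values[s] = best  # in-place Bellman optimality backup
        if delta < tol:
            break

    # Greedy policy with respect to the converged values.
    policy = {
        s: max(actions, key=lambda a: action_value(actions[a]))
        for s, actions in transitions.items()
        if actions
    }
    return values, policy


# Hypothetical toy problem: from A, either stay (reward 0) or go to B
# (reward 1); from B, finishing yields reward 10 and ends the episode.
toy_mdp = {
    "A": {"stay": [(1.0, "A", 0.0)], "go": [(1.0, "B", 1.0)]},
    "B": {"finish": [(1.0, "end", 10.0)]},
    "end": {},
}

values, policy = value_iteration(toy_mdp, gamma=0.9)
print(policy)
```

In this toy problem the greedy policy moves from A to B, because the discounted reward for finishing from B outweighs staying in A indefinitely.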
Skills you will gain
- Deep Learning
- Markov Decision Process
- Machine Learning
- Reinforcement Learning
- Monte Carlo Method
Syllabus - What you will learn in this course
Decision Making and Utility Theory
Bandit Problems
Markov Decision Processes
Dynamic Programming
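To give a flavor of the Bandit Problems module listed above, the following is a minimal sketch of an epsilon-greedy agent on a multi-armed bandit with Gaussian rewards. It is not the course's own code: the function name `epsilon_greedy_bandit`, the unit-variance reward model, and the arm means are assumptions chosen for illustration.

```python
import random


def epsilon_greedy_bandit(arm_means, steps=5000, epsilon=0.1, seed=0):
    """Sketch of epsilon-greedy action selection with sample-average estimates.

    Assumption: each arm pays a Gaussian reward with the given mean and
    unit standard deviation (a hypothetical reward model).
    """
    rng = random.Random(seed)
    n_arms = len(arm_means)
    counts = [0] * n_arms        # number of pulls per arm
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: uniformly random arm
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])  # exploit
        reward = rng.gauss(arm_means[arm], 1.0)
        counts[arm] += 1
        # Incremental update of the sample average for the pulled arm.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]

    return estimates, counts


estimates, counts = epsilon_greedy_bandit([0.1, 0.5, 0.9])
print(counts)
```

The incremental update avoids storing past rewards: each estimate is the running mean of the rewards observed for that arm, and `epsilon` controls how often the agent explores instead of pulling the arm it currently believes is best.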