Syllabus: Reinforcement Learning and Learning Control

Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!

Date | Topic | Handouts
-----|-------|---------
Sept. 3 | Introduction to Reinforcement Learning | Slides, Sutton Book Chapters 1-5
Sept. 10 | Function Approximation in Reinforcement Learning; Optimal control along trajectories: LQR, LQG and DDP | Sutton Book Chapter 8, Todorov2005
Sept. 17 | Research on DDP and Function Approximation for RL | Tassa2007, Slides
Sept. 24 | Research on DDP and Function Approximation in RL | Doya2000, Morimoto2003
Oct. 1 | Gaussian Processes for Reinforcement Learning; Value function learning along trajectories (fitted Q iteration); Least Squares Temporal Difference Methods | Deisenroth2009, Lagoudakis2002, Ernst2005
Oct. 8 | Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients | Williams1992, Sutton2000, Peters2008, Slides
Oct. 15 | Research on Policy Gradient Methods; Introduction to Path Integral Methods | Tedrake2005, Bagnell2003
Oct. 22 | Path Integral Methods for Reinforcement Learning | Theodorou2010, Todorov2009, Kober2009
Oct. 29 | Path Integral Methods for Reinforcement Learning (continued) | Slides
Nov. 5 | Sketch of Planned Projects; Modular Learning Control | Tedrake2009, Todorov2009
Nov. 12 | Inverse Reinforcement Learning | Dvijotham2009, Abbeel2009, Ratliff2009
Nov. 19 | Dynamic Bayesian Networks for Reinforcement Learning | Toussaint2006, Vlassis2009
Dec. 3 | Project presentations |

Tentative Syllabus:

  • Introduction to reinforcement learning [1]
  • Dynamic programming methods [1, 2]
  • Optimal control methods [2, 3]
  • Temporal difference methods [1]
  • Q-Learning [1]
  • Problems of value-function-based RL methods
  • Function Approximation for RL [1]
  • Incremental Function Approximation Methods for RL [4, 5]
  • Least Squares Methods [6]
  • Direct Policy Learning: REINFORCE [7]
  • Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
  • Natural Policy Gradient Methods [9]
  • Probabilistic Reinforcement Learning with Reward Weighted Averaging [10, 11]
  • Q-Learning on Trajectories [12]
  • Path Integral Approaches to Reinforcement Learning I [13]
  • Path Integral Approaches to Reinforcement Learning II
  • Dynamic Bayesian Networks for RL [14]
  • Gaussian Processes in Reinforcement Learning [5]

Readings:

  1. J. Boyan, "Least-squares temporal difference learning," in Proceedings of the Sixteenth International Conference on Machine Learning. Morgan Kaufmann, 1999, pp. 49-56.

Designed by: Nerses Ohanyan & Jan Peters
Page last modified on January 12, 2012, at 05:38 PM