

 All downloadable documents are Adobe Acrobat PDF documents. You can obtain Acrobat for free by following the link from the Adobe Icon. 
Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!
Date  Topic  Handouts 
Sept. 3  Introduction to Reinforcement Learning  Slides, Sutton Book Chapters 1-5 
Sept. 10  Function Approximation in Reinforcement Learning, Optimal control along trajectories: LQR, LQG and DDP  Sutton Book Chapter 8, Todorov2005 
Sept. 17  Research on DDP and Function Approximation for RL  Tassa2007, Slides 
Sept. 24  Research on DDP and Function Approximation in RL  Doya2000, Morimoto2003 
Oct. 1  Gaussian Processes for Reinforcement Learning, Value function learning along trajectories (fitted Q iteration), Least Squares Temporal Difference Methods  Deisenroth2009, Lagoudakis2002, Ernst2005 
Oct. 8  Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients  Williams1992, Sutton2000, Peters2008, Slides 
Oct. 15  Research on Policy Gradient Methods, Introduction to Path Integral Methods  Tedrake2005, Bagnell2003 
Oct. 22  Path Integral Methods for Reinforcement Learning  Theodorou2010, Todorov2009, Kober2009 
Oct. 29  Path Integral Methods for Reinforcement Learning (continued)  Slides 
Nov. 5  Sketch of Planned Projects, Modular Learning Control  Tedrake2009, Todorov2009 
Nov. 12  Inverse reinforcement learning  Dvijotham2009, Abbeel2009, Ratliff2009 
Nov. 19  Dynamic Bayesian networks for reinforcement learning  Toussaint2006, Vlassis2009 
Dec. 3  Project presentations.  
Tentative Syllabus:
 Introduction to reinforcement learning [1]
 Dynamic programming methods [1, 2]
 Optimal control methods [2, 3]
 Temporal difference methods [1]
 Q-Learning [1]
 Problems of value-function-based RL methods
 Function Approximation for RL [1]
 Incremental Function Approximation Methods for RL [4, 5]
 Least Squares Methods [6]
 Direct Policy Learning: REINFORCE [7]
 Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
 Natural Policy Gradient Methods [9]
 Probabilistic Reinforcement Learning with Reward Weighted Averaging [10, 11]
 Q-Learning on Trajectories [12]
 Path Integral Approaches to Reinforcement Learning I [13]
 Path Integral Approaches to Reinforcement Learning II
 Dynamic Bayesian Networks for RL [14]
 Gaussian Processes in Reinforcement Learning [5]
Readings:





 J. Boyan, "Least-squares temporal difference learning," in Proceedings of the Sixteenth International Conference on Machine Learning, Morgan Kaufmann, 1999, pp. 49-56.









