All downloadable documents are Adobe Acrobat PDF documents. You can obtain Acrobat for free by following the link from the Adobe Icon.
Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!
| Date | Topic | Handouts |
| Sept. 3 | Introduction to Reinforcement Learning | Slides, Sutton Book Chapters 1-5 |
| Sept. 10 | Function Approximation in Reinforcement Learning, Optimal control along trajectories: LQR, LQG and DDP | Sutton Book Chapter 8, Todorov2005 |
| Sept. 17 | Research on DDP and Function Approximation for RL | Tassa2007, Slides |
| Sept. 24 | Research on DDP and Function Approximation in RL | Doya2000, Morimoto2003 |
| Oct. 1 | Gaussian Processes for Reinforcement Learning, Value function learning along trajectories (fitted Q iteration), Least Squares Temporal Difference Methods | Deisenroth2009, Lagoudakis2002, Ernst2005 |
| Oct. 8 | Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients | Williams1992, Sutton2000, Peters2008, Slides |
| Oct. 15 | Research on Policy Gradient Methods, Introduction to Path Integral Methods | Tedrake2005, Bagnell2003 |
| Oct. 22 | Path Integral Methods for Reinforcement Learning | Theodorou2010, Todorov2009, Kober2009 |
| Oct. 29 | Path Integral Methods for Reinforcement Learning (continued) | Slides |
| Nov. 5 | Sketch of Planned Projects, Modular Learning Control | Tedrake2009, Todorov2009 |
| Nov. 12 | Inverse reinforcement learning | Dvijotham2009, Abbeel2009, Ratliff2009 |
| Nov. 19 | Dynamic Bayesian networks for reinforcement learning | Toussaint2006, Vlassis2009 |
| Dec. 3 | Project presentations. | |
Tentative Syllabus:
- Introduction to reinforcement learning [1]
- Dynamic programming methods [1, 2]
- Optimal control methods [2, 3]
- Temporal difference methods [1]
- Q-Learning [1]
- Problems of value-function-based RL methods
- Function Approximation for RL [1]
- Incremental Function Approximation Methods for RL [4, 5]
- Least Squares Methods [6]
- Direct Policy Learning: REINFORCE [7]
- Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
- Natural Policy Gradient Methods [9]
- Probabilistic Reinforcement Learning with Reward Weighted Averaging [10, 11]
- Q-Learning on Trajectories [12]
- Path Integral Approaches to Reinforcement Learning I [13]
- Path Integral Approaches to Reinforcement Learning II
- Dynamic Bayesian Networks for RL [14]
- Gaussian Processes in Reinforcement Learning [5]
Readings:
Sutton, R. S.;Barto, A. G. (1998). Reinforcement learning: An introduction, Adaptive computation and machine learning, pp.xviii, 322, MIT Press.
[Keywords: reinforcement learning (machine learning)]
Dyer, P.;McReynolds, S. R. (1970). The computation and theory of optimal control, Academic Press.
[Keywords: dynamic programming,optimal control]
Theodorou, E. A.;Tassa, Y.;Todorov, E. (submitted). Stochastic Differential Dynamic Programming. (MANUSCRIPT UNDER REVIEW. SUGGESTIONS WELCOME).
[Keywords: stochastic differential dynamic programming, second order optimal control]
Morimoto, J.;Atkeson, C. G. (2003). Minimax differential dynamic programming: an application to robust biped walking, in: Becker, S.;Thrun, S.;Obermayer, K. (eds.), Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press.
[Keywords: reinforcement learning, trajectory optimization, differential dynamic programming]
Schaal, S.;Atkeson, C. G. (1998). Constructive incremental learning from only local information, Neural Computation, 10, 8, pp.2047-2084.
[Keywords: statistical learning, nonparametric regression, distance metric, incremental learning, on-line learning, supersmoothing]
Rasmussen, C. E.;Williams, C. K. I. (2006). Gaussian processes for machine learning, Adaptive computation and machine learning, pp.xviii, 248 p., MIT Press.
[Keywords: gaussian processes data processing, machine learning mathematical models]
Boyan, J. (1999). Least-squares temporal difference learning, Proceedings of the Sixteenth International Conference on Machine Learning, pp.49-56, Morgan Kaufmann.
Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 8, pp.229-256.
[Keywords: stochastic reinforcement learning, non delayed]
Peters, J.;Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, pp.682-97.
[Keywords: reinforcement learning, policy gradient methods, natural gradients, natural actor-critic, motor skills, motor primitives]
Peters, J.;Schaal, S. (2008). Natural actor critic, Neurocomputing, 71, 7-9, pp.1180-1190.
[Keywords: reinforcement learning, policy gradient, natural actor-critic, natural gradients]
Peters, J.;Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control, Proceedings of the International Conference on Machine Learning (ICML2007).
[Keywords: reinforcement learning, operational space control, weighted regression]
Kober, J.;Peters, J. (2009). Policy Search for Motor Primitives in Robotics, Advances in Neural Information Processing Systems 21 (NIPS 2008), Cambridge, MA: MIT Press.
Neumann, G.;Peters, J. (2009). Fitted Q-iteration by Advantage Weighted Regression, Advances in Neural Information Processing Systems 21 (NIPS 2008), Cambridge, MA: MIT Press.
Theodorou, E. A.;Buchli, J.;Schaal, S. (2009). Path integral stochastic optimal control for rigid body dynamics, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL2009).
[Keywords: reinforcement learning, optimal control, path integrals, stochastic systems]
Toussaint, M.;Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 23rd International Conference on Machine Learning (ICML 2006).
Page last modified on November 19, 2009, at 05:02 PM