
Syllabus: Reinforcement Learning and Learning Control

All downloadable documents are Adobe Acrobat PDF files. Acrobat can be obtained free of charge from Adobe.

Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!

Date | Topic | Handouts
Sept. 3 | Introduction to Reinforcement Learning | Slides, Sutton Book Chapters 1-5
Sept. 10 | Function Approximation in Reinforcement Learning; Optimal control along trajectories: LQR, LQG and DDP | Sutton Book Chapter 8, Todorov2005
Sept. 17 | Research on DDP and Function Approximation for RL | Tassa2007, Slides
Sept. 24 | Research on DDP and Function Approximation in RL | Doya2000, Morimoto2003
Oct. 1 | Gaussian Processes for Reinforcement Learning; Value function learning along trajectories (fitted Q-iteration); Least Squares Temporal Difference Methods | Deisenroth2009, Lagoudakis2002, Ernst2005
Oct. 8 | Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients | Williams1992, Sutton2000, Peters2008, Slides
Oct. 15 | Research on Policy Gradient Methods; Introduction to Path Integral Methods | Tedrake2005, Bagnell2003
Oct. 22 | Path Integral Methods for Reinforcement Learning | Theodorou2010, Todorov2009, Kober2009
Oct. 29 | Path Integral Methods for Reinforcement Learning (continued) | Slides
Nov. 5 | Sketch of Planned Projects; Modular Learning Control | Tedrake2009, Todorov2009
Nov. 12 | Inverse Reinforcement Learning | Dvijotham2009, Abbeel2009, Ratliff2009
Nov. 19 | Dynamic Bayesian Networks for Reinforcement Learning | Toussaint2006, Vlassis2009
Dec. 3 | Project Presentations |

Tentative Syllabus:

  • Introduction to reinforcement learning [1]
  • Dynamic programming methods [1, 2]
  • Optimal control methods [2, 3]
  • Temporal difference methods [1]
  • Q-Learning [1]
  • Problems with value-function-based RL methods
  • Function Approximation for RL [1]
  • Incremental Function Approximation Methods for RL [4, 5]
  • Least Squares Methods [6]
  • Direct Policy Learning: REINFORCE [7]
  • Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
  • Natural Policy Gradient Methods [9]
  • Probabilistic Reinforcement Learning with Reward-Weighted Averaging [10, 11]
  • Q-Learning on Trajectories [12]
  • Path Integral Approaches to Reinforcement Learning I [13]
  • Path Integral Approaches to Reinforcement Learning II
  • Dynamic Bayesian Networks for RL [14]
  • Gaussian Processes in Reinforcement Learning [5]

Readings:

  1. Sutton, R. S.; Barto, A. G. (1998). Reinforcement learning: An introduction, Adaptive computation and machine learning, pp. xviii, 322, MIT Press.
    [Keywords: reinforcement learning (machine learning)]

  2. Dyer, P.; McReynolds, S. R. (1970). The computation and theory of optimal control, Academic Press.
    [Keywords: dynamic programming, optimal control]

  3. Theodorou, E. A.; Tassa, Y.; Todorov, E. (submitted). Stochastic differential dynamic programming (manuscript under review; suggestions welcome).
    [Keywords: stochastic differential dynamic programming, second order optimal control]

    Morimoto, J.; Atkeson, C. G. (2003). Minimax differential dynamic programming: an application to robust biped walking, in: Becker, S.; Thrun, S.; Obermayer, K. (eds.), Advances in Neural Information Processing Systems 15, Cambridge, MA: MIT Press.
    [Keywords: reinforcement learning, trajectory optimization, differential dynamic programming]

  4. Schaal, S.; Atkeson, C. G. (1998). Constructive incremental learning from only local information, Neural Computation, 10, 8, pp. 2047-2084.
    [Keywords: statistical learning, nonparametric regression, distance metric, incremental learning, on-line learning, supersmoothing]

  5. Rasmussen, C. E.; Williams, C. K. I. (2006). Gaussian processes for machine learning, Adaptive computation and machine learning, pp. xviii, 248, MIT Press.
    [Keywords: Gaussian processes, data processing, machine learning, mathematical models]

  6. Boyan, J. (1999). Least-squares temporal difference learning, Proceedings of the Sixteenth International Conference on Machine Learning, pp. 49-56, Morgan Kaufmann.

  7. Williams, R. J. (1992). Simple statistical gradient-following algorithms for connectionist reinforcement learning, Machine Learning, 8, pp. 229-256.
    [Keywords: stochastic reinforcement learning, non-delayed]

  8. Peters, J.; Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, pp. 682-697.
    [Keywords: reinforcement learning, policy gradient methods, natural gradients, natural actor-critic, motor skills, motor primitives]

  9. Peters, J.; Schaal, S. (2008). Natural actor-critic, Neurocomputing, 71, 7-9, pp. 1180-1190.
    [Keywords: reinforcement learning, policy gradient, natural actor-critic, natural gradients]

  10. Peters, J.; Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control, Proceedings of the International Conference on Machine Learning (ICML 2007).
    [Keywords: reinforcement learning, operational space control, weighted regression]

  11. Kober, J.; Peters, J. (2009). Policy search for motor primitives in robotics, Advances in Neural Information Processing Systems 21 (NIPS 2008), Cambridge, MA: MIT Press.

  12. Neumann, G.; Peters, J. (2009). Fitted Q-iteration by advantage weighted regression, Advances in Neural Information Processing Systems 21 (NIPS 2008), Cambridge, MA: MIT Press.

  13. Theodorou, E. A.; Buchli, J.; Schaal, S. (2009). Path integral stochastic optimal control for rigid body dynamics, IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning (ADPRL 2009).
    [Keywords: reinforcement learning, optimal control, path integrals, stochastic systems]

  14. Toussaint, M.; Storkey, A. (2006). Probabilistic inference for solving discrete and continuous state Markov Decision Processes, 23rd International Conference on Machine Learning (ICML 2006).

Page last modified on November 19, 2009, at 05:02 PM