All downloadable documents are Adobe Acrobat PDF documents. You can obtain Acrobat for free by following the link from the Adobe Icon.

**Note: This syllabus will be modified continuously to accommodate the progress and interests of the course participants!**

| Date | Topic | Handouts |
| --- | --- | --- |
| Sept. 3 | Introduction to Reinforcement Learning | Slides, Sutton Book Chapters 1-5 |
| Sept. 10 | Function Approximation in Reinforcement Learning, Optimal control along trajectories: LQR, LQG and DDP | Sutton Book Chapter 8, Todorov2005 |
| Sept. 17 | Research on DDP and Function Approximation for RL | Tassa2007, Slides |
| Sept. 24 | Research on DDP and Function Approximation in RL | Doya2000, Morimoto2003 |
| Oct. 1 | Gaussian Processes for Reinforcement Learning, Value function learning along trajectories (fitted Q iteration), Least Squares Temporal Difference Methods | Deisenroth2009, Lagoudakis2002, Ernst2005 |
| Oct. 8 | Policy Gradient Methods: REINFORCE, GPOMDP, Natural Gradients | Williams1992, Sutton2000, Peters2008, Slides |
| Oct. 15 | Research on Policy Gradient Methods, Introduction to Path Integral Methods | Tedrake2005, Bagnell2003 |
| Oct. 22 | Path Integral Methods for Reinforcement Learning | Theodorou2010, Todorov2009, Kober2009 |
| Oct. 29 | Path Integral Methods for Reinforcement Learning (continued) | Slides |
| Nov. 5 | Sketch of Planned Projects, Modular Learning Control | Tedrake2009, Todorov2009 |
| Nov. 12 | Inverse reinforcement learning | Dvijotham2009, Abbeel2009, Ratliff2009 |
| Nov. 19 | Dynamic Bayesian networks for reinforcement learning | Toussaint2006, Vlassis2009 |
| Dec. 3 | Project presentations | |

- Introduction to reinforcement learning [1]
- Dynamic programming methods [1, 2]
- Optimal control methods [2, 3]
- Temporal difference methods [1]
- Q-Learning [1]
- Problems of value-function-based RL methods
- Function Approximation for RL [1]
- Incremental Function Approximation Methods for RL [4, 5]
- Least Squares Methods [6]
- Direct Policy Learning: REINFORCE [7]
- Modern policy gradient methods: GPOMDP and the Policy Gradient Theorem [8, 9]
- Natural Policy Gradient Methods [9]
- Probabilistic Reinforcement Learning with Reward Weighted Averaging [10, 11]
- Q-Learning on Trajectories [12]
- Path Integral Approaches to Reinforcement Learning I [13]
- Path Integral Approaches to Reinforcement Learning II
- Dynamic Bayesian Networks for RL [14]
- Gaussian Processes in Reinforcement Learning [5]

- J. Boyan, "Least-squares temporal difference learning," in Proceedings of the Sixteenth International Conference on Machine Learning, Morgan Kaufmann, 1999, pp. 49-56.

Retrieved from http://www-clmc.usc.edu/Teaching/CS599RLSyllabus

Page last modified on January 12, 2012, at 05:38 PM