Reinforcement Learning for Robotics and Computational Motor Control
While supervised statistical learning techniques have significant applications in model learning and imitation learning, they do not suffice for all motor learning problems, particularly when no expert teacher or idealized desired behavior is available. Thus, both robotics and the understanding of human motor control require self-improvement driven by reward (or cost).
The development of efficient reinforcement learning methods is therefore essential for the success of learning in motor control.
However, reinforcement learning in high-dimensional spaces such as manipulator and humanoid robotics is extremely difficult, as a complete exploration of the underlying state-action spaces is impossible and few existing techniques scale to this domain.
Nevertheless, humans clearly do not need such extensive exploration to learn new motor skills; instead, they rely on a combination of watching a teacher and subsequent self-improvement. In more technical terms: first, a control policy is obtained by imitation, and then it is improved using reinforcement learning. It is essential that only local policy search techniques, e.g., policy gradient methods, are applied, as a rapid change to the policy would result in a complete unlearning of the policy and might also result in unstable control policies that can damage the robot.
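The two-stage scheme described above (imitation first, then local improvement) can be illustrated with a minimal sketch. This is a toy example under stated assumptions, not the implementation used on any robot: the quadratic `rollout_return`, the averaging in `imitate`, the parameter-space Gaussian exploration, and all names and constants are hypothetical choices made for illustration. The update uses a likelihood-ratio (REINFORCE-style) gradient estimate and caps the size of each parameter change so that the policy only moves locally.

```python
import math
import random


def rollout_return(theta, target):
    """Hypothetical episodic return: negative squared distance to an unknown optimum."""
    return -sum((t - g) ** 2 for t, g in zip(theta, target))


def imitate(demonstrations):
    """Imitation step (assumed here): average the teacher's demonstrated parameters."""
    n = len(demonstrations)
    return [sum(column) / n for column in zip(*demonstrations)]


def local_policy_gradient_step(theta, target, rng, sigma=0.1, n_rollouts=20,
                               step=0.05, max_change=0.05):
    """One local update from a likelihood-ratio gradient estimate."""
    samples = []
    for _ in range(n_rollouts):
        # Explore with small Gaussian perturbations of the policy parameters.
        eps = [rng.gauss(0.0, sigma) for _ in theta]
        perturbed = [t + e for t, e in zip(theta, eps)]
        samples.append((eps, rollout_return(perturbed, target)))

    # Subtract a baseline (the mean return) to reduce the estimator's variance.
    baseline = sum(r for _, r in samples) / n_rollouts
    grad = [0.0] * len(theta)
    for eps, r in samples:
        for i, e in enumerate(eps):
            grad[i] += (r - baseline) * e / (sigma ** 2 * n_rollouts)

    # Keep the update local: cap the norm of the parameter change.
    update = [step * g for g in grad]
    norm = math.sqrt(sum(u * u for u in update))
    if norm > max_change:
        update = [u * max_change / norm for u in update]
    return [t + u for t, u in zip(theta, update)]


def learn(demonstrations, target, n_updates=200, seed=0):
    """Initialize by imitation, then improve with local policy gradient steps."""
    rng = random.Random(seed)
    theta = imitate(demonstrations)
    initial_return = rollout_return(theta, target)
    for _ in range(n_updates):
        theta = local_policy_gradient_step(theta, target, rng)
    return initial_return, rollout_return(theta, target)


if __name__ == "__main__":
    demos = [[0.7, -0.2], [0.75, -0.15], [0.65, -0.25]]  # imperfect teacher
    before, after = learn(demos, target=[1.0, -0.5])
    print("return after imitation:", before)
    print("return after self-improvement:", after)
```

The cap on the update norm is one simple way to express the "local search" requirement; on a real system the step size and exploration magnitude would be chosen with the safety of the hardware in mind.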
In order to bring reinforcement learning to robotics and computational motor control, we have developed a variety of novel reinforcement learning algorithms, such as the Natural Actor-Critic and the Episodic Natural Actor-Critic. These methods are particularly well-suited for policies based upon motor primitives and are being applied to motor skill learning in humanoid robotics and legged locomotion.
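For orientation, the natural policy gradient update that underlies natural actor-critic methods can be written as follows. This is a sketch in standard notation; the symbols (policy parameters \theta, expected return J, step size \alpha, Fisher information matrix F, policy \pi_\theta, compatible function approximation weights w) are assumptions introduced here and do not appear in the text above.

```latex
\theta_{k+1} = \theta_k + \alpha \, F(\theta_k)^{-1} \nabla_\theta J(\theta_k),
\qquad
F(\theta) = \mathbb{E}\!\left[ \nabla_\theta \log \pi_\theta(a \mid s) \,
            \nabla_\theta \log \pi_\theta(a \mid s)^{\top} \right]

% With compatible function approximation
%   f_w(s, a) = \nabla_\theta \log \pi_\theta(a \mid s)^{\top} w ,
% the natural gradient reduces to the critic's weight vector:
F(\theta)^{-1} \nabla_\theta J(\theta) = w
```

Under this assumption the actor update needs no explicit inversion of F: the critic estimates w by regression, and the actor moves the parameters along w.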
Contact persons: Jan Peters, Stefan Schaal
Related Publications
Kalakrishnan, M.; Pastor, P.; Righetti, L.; Schaal, S. (2013). Learning Objective Functions for Manipulation, IEEE International Conference on Robotics and Automation.
[Keywords: learning, inverse reinforcement learning, manipulation, grasping, inverse kinematics, motion planning, trajectory optimization]
Kalakrishnan, M.; Righetti, L.; Pastor, P.; Schaal, S. (2012). Learning Force Control Policies for Compliant Robotic Manipulation, International Conference on Machine Learning (ICML).
[Keywords: movement primitives, reinforcement learning, pi2, skill learning, force control]
Stulp, F.;Theodorou, E.;Buchli, J.;Schaal, S. (2011). Learning to grasp under uncertainty, Robotics and Automation (ICRA), 2011 IEEE International Conference on.
[Keywords: reinforcement learning, stochasticity, motor primitives, skills]
Pastor, P.;Kalakrishnan, M.;Chitta, S.;Theodorou, E.;Schaal, S. (2011). Skill learning and task outcome prediction for manipulation, Robotics and Automation (ICRA), 2011 IEEE International Conference on.
[Keywords: movement primitives, reinforcement learning, sensory data mining, motor skills]
Kalakrishnan, M.;Chitta, S.;Theodorou, E.;Pastor, P.;Schaal, S. (2011). STOMP: Stochastic trajectory optimization for motion planning, Robotics and Automation (ICRA), 2011 IEEE International Conference on.
[Keywords: reinforcement learning, optimization, optimal motion planning]
Kalakrishnan, M.;Righetti, L.;Pastor, P.;Schaal, S. (2011). Learning force control policies for compliant manipulation, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011).
[Keywords: movement primitives, reinforcement learning, pi2, skill learning, force control]
Stulp, F.;Theodorou, E.;Kalakrishnan, M.;Pastor, P.;Righetti, L.;Schaal, S. (2011). Learning motion primitive goals for robust manipulation, IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2011).
[Keywords: movement primitives, reinforcement learning, pi2, skill learning]
Theodorou, E. A., Buchli, J., Schaal, S. (2010). Learning Policy Improvements with Path Integrals, International Conference on Artificial Intelligence and Statistics (AISTATS 2010).
[Keywords: reinforcement learning, optimal control, pi2]
Theodorou, E.;Buchli, J.;Stulp, F.;Schaal, S. (2010). An iterative path integral reinforcement learning approach, Snowbird Learning Workshop.
[Keywords: reinforcement learning, optimal control, path integral, motor control]
Schaal, S.;Atkeson, C. G. (2010). Learning control in robotics -- trajectory-based optimal control techniques, Robotics and Automation Magazine, 17, 2, pp.20-29.
[Keywords: reinforcement learning, optimal control, imitation learning, review]
Buchli, J.;Theodorou, E.;Stulp, F.;Schaal, S. (2010). Variable impedance control - a reinforcement learning approach, Robotics Science and Systems (2010).
[Keywords: reinforcement learning, optimal control, pi2]
Theodorou, E.;Hoffmann, H.;Mistry, M.;Schaal, S. (2008). Computational model for movement learning under uncertain cost, Abstracts of the Society of Neuroscience Meeting (SFN 2008).
[Keywords: computational motor control, motor planning, optimization, reinforcement learning]
Hoffmann, H.;Theodorou, E.;Schaal, S. (2008). Behavioral experiments on reinforcement learning in human motor control, Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM).
[Keywords: computational motor control, optimal control, reinforcement learning]
Hoffmann, H.;Theodorou, E.;Schaal, S. (2008). Optimization strategies in human reinforcement learning, Advances in Computational Motor Control VII, Symposium at the Society for Neuroscience Meeting, Washington DC, 2008.
[Keywords: reinforcement learning, motor control, psychophysics]
Peters, J.;Schaal, S. (2008). Natural actor critic, Neurocomputing, 71, 7-9, pp.1180-1190.
[Keywords: reinforcement learning, policy gradient, natural actor-critic, natural gradients]
Peters, J.;Schaal, S. (2008). Learning to control in operational space, International Journal of Robotics Research, 27, pp.197-212.
[Keywords: operational space control, learning, em algorithm, redundancy resolution, reinforcement learning]
Peters, J.;Schaal, S. (2008). Reinforcement learning of motor skills with policy gradients, Neural Networks, 21, 4, pp.682-97.
[Keywords: reinforcement learning, policy gradient methods, natural gradients, natural actor-critic, motor skills, motor primitives]
Peters, J., Schaal, S. (2007). Policy Learning for Motor Skills, Proceedings of 14th International Conference on Neural Information Processing (ICONIP).
[Keywords: machine learning, reinforcement learning, robotics, motor primitives, policy gradients, natural actor-critic, reward-weighted regression]
Theodorou, E.; Peters, J.; Schaal, S. (2007). Reinforcement Learning for Optimal Control of Arm Movements, Abstracts of the 37th Meeting of the Society of Neuroscience.
[Keywords: optimal control, reinforcement learning, arm movements]
Peters, J. (2007). Machine Learning of Motor Skills for Robotics, Ph.D. Thesis, Department of Computer Science, University of Southern California.
[Keywords: machine learning, reinforcement learning, robotics, motor primitives, policy gradients, natural actor-critic, reward-weighted regression]
Peters, J.;Schaal, S. (2007). Reinforcement learning for operational space control, International Conference on Robotics and Automation (ICRA2007), pp.2111-2116.
[Keywords: operational space control, reinforcement learning, weighted regression, em-algorithm]
Peters, J.;Schaal, S. (2007). Using reward-weighted regression for reinforcement learning of task space control, Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[Keywords: reinforcement learning, cart-pole, policy gradient methods]
Peters, J.;Schaal, S. (2007). Applying the episodic natural actor-critic architecture to motor primitive learning, Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN).
[Keywords: reinforcement learning, policy gradient methods, motor primitives, natural actor-critic]
Peters, J.;Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control, Proceedings of the International Conference on Machine Learning (ICML2007).
[Keywords: reinforcement learning, operational space control, weighted regression]
Peters, J.;Theodorou, E.;Schaal, S. (2007). Policy gradient methods for machine learning, INFORMS Conference of the Applied Probability Society.
[Keywords: policy gradient methods, reinforcement learning, simulation-optimization]
Riedmiller, M.;Peters, J.;Schaal, S. (2007). Evaluation of policy gradient methods and variants on the cart-pole benchmark, Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[Keywords: reinforcement learning, cart-pole, policy gradient methods]
Peters, J.;Schaal, S. (2006). Reinforcement Learning for Parameterized Motor Primitives, Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN 2006).
[Keywords: motor primitives, reinforcement learning]
Peters, J.;Schaal, S. (2006). Policy gradient methods for robotics, Proceedings of the IEEE International Conference on Intelligent Robotics Systems (IROS 2006).
[Keywords: policy gradient methods, reinforcement learning, robotics]
Peters, J.;Vijayakumar, S.;Schaal, S. (2005). Natural Actor-Critic, in: Gama, J.;Camacho, R.;Brazdil, P.;Jorge, A.;Torgo, L. (eds.), Proceedings of the 16th European Conference on Machine Learning (ECML 2005), 3720, pp.280-291, Springer.
[Keywords: reinforcement learning, policy gradients, natural gradients]
Peters, J.;Vijayakumar, S.;Schaal, S. (2003). Reinforcement learning for humanoid robotics, IEEE-RAS International Conference on Humanoid Robots (Humanoids2003).
[Keywords: reinforcement learning, policy gradients, movement primitives, behaviors, dynamic systems, humanoid robotics]
Peters, J.;Vijayakumar, S.;Schaal, S. (2003). Scaling reinforcement learning paradigms for motor learning, Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003).
[Keywords: reinforcement learning, neurodynamic programming, actor-critic methods, policy gradient methods, natural policy gradient]
Schaal, S. (2002). Learning robot control, in: Arbib, M. A. (eds.), The handbook of brain theory and neural networks, 2nd Edition, pp.983-987, MIT Press.
[Keywords: robot learning, review, reinforcement learning, supervised learning, real-time learning]
Atkeson, C. G.;Moore, A. W.;Schaal, S. (1997). Locally weighted learning for control, Artificial Intelligence Review, 11, 1-5, pp.75-113.
[Keywords: statistical learning, nonparametric regression, distance metric, lazy learning, learning control, reinforcement learning]
Atkeson, C. G.;Schaal, S. (1997). Robot learning from demonstration, in: Fisher Jr., D. H. (eds.), Machine Learning: Proceedings of the Fourteenth International Conference (ICML '97), pp.12-20, Morgan Kaufmann.
[Keywords: imitation learning, reinforcement learning, dynamic programming, motor skills]
Atkeson, C. G.;Schaal, S. (1997). Learning tasks from a single demonstration, IEEE International Conference on Robotics and Automation (ICRA97), 2, pp.1706-1712, Piscataway, NJ: IEEE.
[Keywords: learning from demonstration, imitation, reinforcement learning]
Schaal, S. (1997). Learning from demonstration, in: Mozer, M. C.;Jordan, M.;Petsche, T. (eds.), Advances in Neural Information Processing Systems 9, pp.1040-1046, MIT Press.
[Keywords: imitation learning, movement primitives, reinforcement learning, shaping, priming]
Miyamoto, H.;Gandolfo, F.;Gomi, H.;Schaal, S.;Koike, Y.;Osu, R.;Nakano, E.;Kawato, M. (1995). A kendama learning robot based on a dynamic optimization theory, Proceedings of the 4th IEEE International Workshop on Robot and Human Communication (RO-MAN'95), pp.327-332.
[Keywords: learning from demonstration, internal models, via points, robot learning, imitation, biomimetic robotics, reinforcement learning]