Reinforcement Learning for Robotics and Computational Motor Control

While supervised statistical learning techniques have significant applications in model and imitation learning, they do not suffice for all motor learning problems, particularly when no expert teacher or idealized desired behavior is available. Thus, both robotics and the understanding of human motor control require self-improvement driven by reward (or cost). The development of efficient reinforcement learning methods is therefore essential for the success of learning in motor control.

However, reinforcement learning in high-dimensional spaces such as manipulator and humanoid robotics is extremely difficult: a complete exploration of the underlying state-action spaces is impossible, and few existing techniques scale to this domain.

Nevertheless, humans never need such extensive exploration in order to learn new motor skills; instead, they rely upon a combination of watching a teacher and subsequent self-improvement. In more technical terms: first, a control policy is obtained by imitation, and then it is improved using reinforcement learning. It is essential that only local policy search techniques, e.g., policy gradient methods, are applied, as a rapid change to the policy could result in a complete unlearning of the policy and might also result in unstable control policies that can damage the robot.
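
The two-stage scheme above can be illustrated with a minimal sketch, assuming a hypothetical one-parameter Gaussian policy and a made-up quadratic reward (neither comes from the work described here). The policy mean is first fitted to imperfect teacher demonstrations and then refined with small likelihood-ratio policy gradient steps. The names imitate, reward, and policy_gradient_step are illustrative only.

```python
import random


def imitate(demonstrations):
    """Fit the policy mean to teacher actions (least squares = sample mean)."""
    return sum(demonstrations) / len(demonstrations)


def reward(action, target=1.0):
    """Hypothetical task reward: negative squared distance to an optimum
    that the learner does not know in advance."""
    return -(action - target) ** 2


def policy_gradient_step(theta, sigma, step_size, n_rollouts, rng):
    """One likelihood-ratio policy gradient update with a mean-reward baseline."""
    actions = [rng.gauss(theta, sigma) for _ in range(n_rollouts)]
    rewards = [reward(a) for a in actions]
    baseline = sum(rewards) / n_rollouts
    # For a Gaussian policy: d/dtheta log N(a | theta, sigma^2) = (a - theta) / sigma^2
    grad = sum((r - baseline) * (a - theta) / sigma ** 2
               for a, r in zip(actions, rewards)) / n_rollouts
    return theta + step_size * grad


rng = random.Random(0)

# Stage 1: imitation of an imperfect teacher (assumed demonstration data).
teacher_demos = [0.4, 0.5, 0.6]
theta_imitation = imitate(teacher_demos)

# Stage 2: local self-improvement with small gradient steps.
theta = theta_imitation
for _ in range(200):
    theta = policy_gradient_step(theta, sigma=0.2, step_size=0.05,
                                 n_rollouts=20, rng=rng)

print("after imitation:", theta_imitation)
print("after policy gradient improvement:", theta)
```

The small step size keeps each update local, so the policy drifts gradually away from the imitation result instead of being overwritten in a single step.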

In order to bring reinforcement learning to robotics and computational motor control, we have developed a variety of novel reinforcement learning algorithms, such as the Natural Actor-Critic and the Episodic Natural Actor-Critic. These methods are particularly well-suited for policies based upon motor primitives and are being applied to motor skill learning in humanoid robotics and legged locomotion.
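
As a rough illustration of the natural gradient idea behind these algorithms, the sketch below estimates a natural policy gradient for a hypothetical one-parameter Gaussian policy by regressing sampled returns on the derivative of the log-policy, with a constant offset acting as a baseline. It is a simplified toy in the spirit of the episodic variant, not the authors' implementation. The return function and all names are assumptions made for the example.

```python
import random


def toy_return(action, target=1.0):
    """Hypothetical one-step episode return (assumption for this example)."""
    return -(action - target) ** 2


def natural_gradient_estimate(theta, sigma, n_episodes, return_fn, rng):
    """Least-squares fit of R ~ w * psi + J, where psi = d/dtheta log pi(a).

    The slope w estimates the natural gradient; the intercept J estimates
    the expected return and plays the role of a baseline.
    """
    actions = [rng.gauss(theta, sigma) for _ in range(n_episodes)]
    psi = [(a - theta) / sigma ** 2 for a in actions]
    returns = [return_fn(a) for a in actions]

    psi_mean = sum(psi) / n_episodes
    ret_mean = sum(returns) / n_episodes
    cov = sum((p - psi_mean) * (r - ret_mean) for p, r in zip(psi, returns))
    var = sum((p - psi_mean) ** 2 for p in psi)

    w = cov / var                  # slope of the regression
    J = ret_mean - w * psi_mean    # intercept of the regression
    return w, J


rng = random.Random(0)
theta, sigma = 0.5, 0.2
w, J = natural_gradient_estimate(theta, sigma, n_episodes=2000,
                                 return_fn=toy_return, rng=rng)

# Natural gradient step; the step size of 1.0 is an arbitrary choice.
theta_new = theta + 1.0 * w
print("natural gradient estimate:", w)
print("estimated expected return:", J)
```

For this toy problem the exact natural gradient at theta = 0.5 is sigma^2 * 2 * (1 - theta) = 0.04, so w should come out close to that value. The ordinary gradient here is 1.0, and the regression rescales it by the inverse Fisher information sigma^2.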

Contact persons: Mrinal Kalakrishnan, Jan Peters, Stefan Schaal

Related Publications

Peters, J.; Schaal, S. (in press). Reinforcement learning of motor skills with policy gradients, Neural Networks.
[Keywords: reinforcement learning, policy gradient methods, natural gradients, natural actor-critic, motor skills, motor primitives]

Peters, J.; Schaal, S. (2008). Learning to Control in Operational Space, The International Journal of Robotics Research, 27, 2, pp.197-212.
[Keywords: operational space control, robot learning, reinforcement learning, reward-weighted regression]

Peters, J.; Schaal, S. (2008). Natural Actor-Critic, Neurocomputing, 71, 7-9, pp.1180-1190.
[Keywords: reinforcement learning, policy gradient, natural actor-critic, natural gradients]

Hoffmann, H.; Theodorou, E.; Schaal, S. (2008). Behavioral experiments on reinforcement learning in human motor control, Abstracts of the Eighteenth Annual Meeting of Neural Control of Movement (NCM).
[Keywords: computational motor control, optimal control, reinforcement learning]

Peters, J.; Schaal, S. (2007). Policy Learning for Motor Skills, Proceedings of 14th International Conference on Neural Information Processing (ICONIP).
[Keywords: machine learning, reinforcement learning, robotics, motor primitives, policy gradients, natural actor-critic, reward-weighted regression]

Theodorou, E.; Peters, J.; Schaal, S. (2007). Reinforcement Learning for Optimal Control of Arm Movements, Abstracts of the 37th Meeting of the Society for Neuroscience.
[Keywords: optimal control, reinforcement learning, arm movements]

Peters, J. (2007). Machine Learning of Motor Skills for Robotics, Ph.D. Thesis, Department of Computer Science, University of Southern California.
[Keywords: machine learning, reinforcement learning, robotics, motor primitives, policy gradients, natural actor-critic, reward-weighted regression]

Peters, J.; Schaal, S. (2007). Reinforcement learning for operational space control, International Conference on Robotics and Automation (ICRA2007), pp.2111-2116.
[Keywords: operational space control, reinforcement learning, weighted regression, em-algorithm]

Peters, J.; Schaal, S. (2007). Using reward-weighted regression for reinforcement learning of task space control, Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[Keywords: reinforcement learning, cart-pole, policy gradient methods]

Peters, J.; Schaal, S. (2007). Applying the episodic natural actor-critic architecture to motor primitive learning, Proceedings of the 2007 European Symposium on Artificial Neural Networks (ESANN).
[Keywords: reinforcement learning, policy gradient methods, motor primitives, natural actor-critic]

Peters, J.; Schaal, S. (2007). Reinforcement learning by reward-weighted regression for operational space control, Proceedings of the International Conference on Machine Learning (ICML2007).
[Keywords: reinforcement learning, operational space control, weighted regression]

Peters, J.; Theodorou, E.; Schaal, S. (2007). Policy gradient methods for machine learning, INFORMS Conference of the Applied Probability Society.
[Keywords: policy gradient methods, reinforcement learning, simulation-optimization]

Riedmiller, M.; Peters, J.; Schaal, S. (2007). Evaluation of policy gradient methods and variants on the cart-pole benchmark, Proceedings of the 2007 IEEE International Symposium on Approximate Dynamic Programming and Reinforcement Learning.
[Keywords: reinforcement learning, cart-pole, policy gradient methods]

Peters, J.; Schaal, S.; Schölkopf, B. (2007). Towards machine learning for motor skills, Fachgespräche Autonome Mobile Systeme (AMS 2007), pp.138-144, Springer.
[Keywords: reinforcement learning, autonomous robotics]

Peters, J.; Schaal, S. (2006). Policy Gradient Methods for Robotics, Proceedings of the IEEE International Conference on Intelligent Robotics Systems (IROS).
[Keywords: policy gradient methods, reinforcement learning, robotics]

Peters, J.; Schaal, S. (2006). Reinforcement Learning for Parameterized Motor Primitives, Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN 2006).
[Keywords: motor primitives, reinforcement learning]

Peters, J.; Vijayakumar, S.; Schaal, S. (2005). Natural Actor-Critic, in: Gama, J.; Camacho, R.; Brazdil, P.; Jorge, A.; Torgo, L. (eds.), Proceedings of the 16th European Conference on Machine Learning (ECML 2005), 3720, pp.280-291, Springer.
[Keywords: reinforcement learning, policy gradients, natural gradients]

Peters, J.; Vijayakumar, S.; Schaal, S. (2003). Reinforcement learning for humanoid robotics, Humanoids2003, Third IEEE-RAS International Conference on Humanoid Robots.
[Keywords: reinforcement learning, policy gradients, movement primitives, behaviors, dynamic systems, humanoid robotics]

Peters, J.; Vijayakumar, S.; Schaal, S. (2003). Scaling reinforcement learning paradigms for motor learning, Proceedings of the 10th Joint Symposium on Neural Computation (JSNC 2003).
[Keywords: reinforcement learning, neurodynamic programming, actor-critic methods, policy gradient methods, natural policy gradient]

Schaal, S. (2002). Learning robot control, in: Arbib, M. A. (ed.), The handbook of brain theory and neural networks, 2nd Edition, pp.983-987, MIT Press.
[Keywords: robot learning, review, reinforcement learning, supervised learning, real-time learning]

Atkeson, C. G.; Moore, A. W.; Schaal, S. (1997). Locally weighted learning for control, Artificial Intelligence Review, 11, 1-5, pp.75-113.
[Keywords: statistical learning, nonparametric regression, distance metric, lazy learning, learning control, reinforcement learning]

Atkeson, C. G.; Schaal, S. (1997). Robot learning from demonstration, in: Fisher Jr., D. H. (ed.), Machine Learning: Proceedings of the Fourteenth International Conference (ICML '97), pp.12-20, Morgan Kaufmann.
[Keywords: imitation learning, reinforcement learning, dynamic programming, motor skills]

Atkeson, C. G.; Schaal, S. (1997). Learning tasks from a single demonstration, IEEE International Conference on Robotics and Automation (ICRA97), 2, pp.1706-1712, Piscataway, NJ: IEEE.
[Keywords: learning from demonstration, imitation, reinforcement learning]

Schaal, S. (1997). Learning from demonstration, in: Mozer, M. C.; Jordan, M.; Petsche, T. (eds.), Advances in Neural Information Processing Systems 9, pp.1040-1046, MIT Press.
[Keywords: imitation learning, movement primitives, reinforcement learning, shaping, priming]

Miyamoto, H.; Gandolfo, F.; Gomi, H.; Schaal, S.; Koike, Y.; Osu, R.; Nakano, E.; Kawato, M. (1995). A kendama learning robot based on a dynamic optimization theory, Proceedings of the 4th IEEE International Workshop on Robot and Human Communication (RO-MAN'95), pp.327-332.
[Keywords: learning from demonstration, internal models, via points, robot learning, imitation, biomimetic robotics, reinforcement learning]

Page last modified on January 24, 2008, at 08:42 AM