Reinforcement Learning

From my perspective, reinforcement learning is the automatic design of approximately optimal controllers from measurements. In traditional (optimal) control, a skilled human in the loop decides how to measure and model the system. In RL, by contrast, the RL system constructs the controller directly from measurements; however, the path to an optimal controller can require extensive prestructuring through structured policies, value functions or models. On this page, I list some of the projects I am working on or have worked on, but this list will always be fairly incomplete.

Reinforcement Learning for Computational Motor Control and Robotics

My general goal in reinforcement learning is the development of methods that scale to the dimensionality of humanoid robots and can generate actions for seven or more degrees of freedom, e.g., for a human arm. Such problems are a tremendous challenge for reinforcement learning, as they require a state space of 21 or more dimensions (one dimension for each joint position, velocity and acceleration) and an action space of seven dimensions.

While supervised statistical learning techniques have significant applications for model and imitation learning, they do not suffice for all motor learning problems, particularly when no expert teacher or idealized desired behavior is available. Thus, both robotics and the understanding of human motor control require reward (or cost) related self-improvement. The development of efficient reinforcement learning methods is therefore essential for the success of learning in motor control.

However, reinforcement learning in high-dimensional spaces such as those of manipulator and humanoid robotics is extremely difficult, as a complete exploration of the underlying state-action spaces is impossible and few existing techniques scale to this domain.
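To give a feeling for why complete exploration is out of reach, the following minimal sketch counts the cells of a uniform grid over the state-action space of an arm with seven degrees of freedom. The function name `grid_cells` and the resolution of 10 bins per dimension are assumptions chosen purely for illustration; they do not come from any particular method.

```python
def grid_cells(n_joints, bins_per_dim=10):
    """Number of cells in a uniform grid over the joint state-action space.

    Assumption for illustration: every dimension is split into the same
    number of bins (bins_per_dim), which is a hypothetical resolution.
    """
    state_dims = 3 * n_joints   # position, velocity and acceleration per joint
    action_dims = n_joints      # one action dimension per joint
    return bins_per_dim ** (state_dims + action_dims)


cells = grid_cells(7)           # 21 state dimensions + 7 action dimensions
print(f"{cells:.1e} grid cells")
```

Even at this coarse resolution, the grid has 10^28 cells, so visiting each cell even once is not a realistic option.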

Nevertheless, humans evidently do not need such extensive exploration in order to learn new motor skills; instead, they rely on a combination of watching a teacher and subsequent self-improvement. In more technical terms: first, a control policy is obtained by imitation, and then it is improved using reinforcement learning. It is essential that only local policy search techniques, e.g., policy gradient methods, are applied, as a rapid change to the policy could result in a complete unlearning of the policy and might also result in unstable control policies that can damage the robot.
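The two-stage idea can be sketched on a toy problem. The example below is only an illustration under stated assumptions, not one of the algorithms from the publications listed on this page: the policy is a one-dimensional Gaussian with mean `theta`, the reward function, the teacher demonstrations, the learning rate and the step limit `max_step` are all hypothetical, and the update is a plain likelihood-ratio ("vanilla") policy gradient with a mean-reward baseline.

```python
import random


def reward(action, target=1.0):
    # Hypothetical task: the best action is `target`; reward decays quadratically.
    return -(action - target) ** 2


def imitate(demonstrations):
    # Imitation step: fit the policy mean to the teacher's (imperfect) actions.
    return sum(demonstrations) / len(demonstrations)


def policy_gradient_step(theta, sigma, rng, n_rollouts=20, lr=0.05, max_step=0.05):
    # Sample actions from the current Gaussian policy N(theta, sigma^2).
    actions = [rng.gauss(theta, sigma) for _ in range(n_rollouts)]
    rewards = [reward(a) for a in actions]
    baseline = sum(rewards) / n_rollouts  # mean reward as a variance-reducing baseline
    # Likelihood-ratio gradient estimate: d/dtheta log N(a | theta, sigma^2) = (a - theta) / sigma^2.
    grad = sum((r - baseline) * (a - theta) / sigma ** 2
               for a, r in zip(actions, rewards)) / n_rollouts
    # Keep the update local: clip the step so the policy never changes abruptly.
    step = max(-max_step, min(max_step, lr * grad))
    return theta + step


rng = random.Random(0)
demos = [rng.gauss(0.5, 0.05) for _ in range(10)]  # teacher is good but not optimal
theta_imitation = imitate(demos)

theta = theta_imitation
for _ in range(200):
    theta = policy_gradient_step(theta, sigma=0.2, rng=rng)

print("after imitation:", theta_imitation)
print("after self-improvement:", theta)
```

The clipping of each step is one simple way to keep the search local; it stands in for the more principled step-size control that practical methods use.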

New Policy Learning Methods

In order to bring reinforcement learning to robotics and computational motor control, we have both improved existing reinforcement learning methods and developed a variety of novel algorithms. At this point, we can only give a short overview of these methods:

  • Policy Gradient Methods: Policy gradient methods are a particularly interesting class of methods due to their stronger guarantees. A nice tutorial to get started can be found in the Policy Gradient Toolbox, which I created for an upcoming survey.
  • Natural Actor-Critic: The natural actor-critic makes use of the fact that a natural gradient usually beats a vanilla gradient. We have developed several versions and have realized that algorithms such as Sutton's Actor-Critic and Bradtke & Barto's Q-Learning for the traditional problem of Linear Quadratic Regulation can be derived from this setting.
  • EM-like Reinforcement Learning: If we had a teacher labeling all actions as good or bad in a binary fashion, we would have an imitation learning problem. However, if we consider these labels as hidden variables and use the returns/action values as improper distributions over the labels, we obtain an inference problem. This problem has led to the reward-weighted regression and the PoWER algorithm.
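The EM-like idea from the last item can be illustrated with a small sketch of a reward-weighted regression update. This is a simplified toy version under stated assumptions, not the published algorithm: the policy is linear in a scalar state, the true gain `k_true`, the exploration noise `sigma`, and the exponential transformation of the reward (with the hypothetical sharpness parameter `beta`) are all chosen for illustration. The positive returns act as weights in an otherwise ordinary least-squares fit of actions to states.

```python
import math
import random


def rwr_update(states, actions, returns):
    # Weighted least squares for a = theta * s, with the returns as weights.
    num = sum(r * s * a for s, a, r in zip(states, actions, returns))
    den = sum(r * s * s for s, r in zip(states, returns))
    return num / den


def run_rwr(k_true=2.0, sigma=0.3, beta=5.0, n_samples=200, n_iters=30, seed=0):
    rng = random.Random(seed)
    theta = 0.0  # initial policy gain
    for _ in range(n_iters):
        # Collect data with the current stochastic policy a = theta * s + noise.
        states = [rng.uniform(-1.0, 1.0) for _ in range(n_samples)]
        actions = [theta * s + rng.gauss(0.0, sigma) for s in states]
        # Assumed reward: positive, largest when the action equals k_true * s.
        returns = [math.exp(-beta * (a - k_true * s) ** 2)
                   for s, a in zip(states, actions)]
        # Refit the policy to its own actions, weighted by the returns.
        theta = rwr_update(states, actions, returns)
    return theta


theta_final = run_rwr()
print("learned gain:", theta_final)
```

Each iteration only needs a weighted regression, with no gradient step size to tune, which is what makes this family of updates attractive.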

Related Publications

Record Number: 10271
Reference Type: Journal Article
Author(s): Peters, J.; Schaal, S.
Year: 2008
Title: Reinforcement learning of motor skills with policy gradients
Journal/Conference/Book Title: Neural Networks
Keywords: Reinforcement learning, Policy gradient methods, Natural gradients, Natural Actor-Critic, Motor skills, Motor primitives
Abstract: Autonomous learning is one of the hallmarks of human and animal behavior, and understanding the principles of learning will be crucial in order to achieve true autonomy in advanced machines like humanoid robots. In this paper, we examine learning of complex motor skills with human-like limbs. While supervised learning can offer useful tools for bootstrapping behavior, e.g., by learning from demonstration, it is only reinforcement learning that offers a general approach to the final trial-and-error improvement that is needed by each individual acquiring a skill. Neither neurobiological nor machine learning studies have, so far, offered compelling results on how reinforcement learning can be scaled to the high-dimensional continuous state and action spaces of humans or humanoids. Here, we combine two recent research developments on learning motor control in order to achieve this scaling. First, we interpret the idea of modular motor control by means of motor primitives as a suitable way to generate parameterized control policies for reinforcement learning. Second, we combine motor primitives with the theory of stochastic policy gradient learning, which currently seems to be the only feasible framework for reinforcement learning for humanoids. We evaluate different policy gradient methods with a focus on their applicability to parameterized motor primitives. We compare these algorithms in the context of motor primitive learning, and show that our most modern algorithm, the Episodic Natural Actor-Critic outperforms previous algorithms by at least an order of magnitude. We demonstrate the efficiency of this reinforcement learning method in the application of learning to hit a baseball with an anthropomorphic robot arm.
Notes: clmc; Journal Article; United States; the official journal of the International Neural Network Society
Volume: 21
Number: 4
Pages: 682-97
Date: May
Short Title: Reinforcement learning of motor skills with policy gradients
ISBN/ISSN: 0893-6080 (Print)
Accession Number: 18482830
URL(s): http://www-clmc.usc.edu/publications/P/peters-NN2008.pdf
Address: Max Planck Institute for Biological Cybernetics, Spemannstr. 38, 72076 Tubingen, Germany; University of Southern California, 3710 S. McClintoch Ave-RTH401, Los Angeles, CA 90089-2905, USA.
Language: eng


Page last modified on July 17, 2008, at 03:31 AM