| Record Number | 873 |
| Reference Type | Conference Proceedings |
| Author(s) | Schaal, S. |
| Year | 1997 |
| Title | Learning from demonstration |
| Journal/Conference/Book Title | Advances in Neural Information Processing Systems 9 |
| Label | Scha97a |
| Keywords | imitation learning, movement primitives, reinforcement learning, shaping, priming |
| Abstract | By now it is widely accepted that learning a task from scratch, i.e., without any prior knowledge, is a daunting undertaking. Humans, however, rarely attempt to learn from scratch. They extract initial biases as well as strategies how to approach a learning problem from instructions and/or demonstrations of other humans. For learning control, this paper investigates how learning from demonstration can be applied in the context of reinforcement learning. We consider priming the Q-function, the value function, the policy, and the model of the task dynamics as possible areas where demonstrations can speed up learning. In general nonlinear learning problems, only model-based reinforcement learning shows significant speed-up after a demonstration, while in the special case of linear quadratic regulator (LQR) problems, all methods profit from the demonstration. In an implementation of pole balancing on a complex anthropomorphic robot arm, we demonstrate that, when facing the complexities of real signal processing, model-based reinforcement learning offers the most robustness for LQR problems. Using the suggested methods, the robot learns pole balancing in just a single trial after a 30 second long demonstration of the human instructor. |
| Notes | clmc |
| URL(s) | http://www-clmc.usc.edu/publications/S/schaal-NIPS1997.pdf |
| Editor(s) | Mozer, M. C.;Jordan, M.;Petsche, T. |
| Place Published | Cambridge, MA |
| Publisher | MIT Press |
| Pages | 1040-1046 |
| Short Title | Learning from demonstration |
Page last modified on August 10, 2006, at 06:47 PM