Workshop for Women in Machine Learning, October 4, 2006
Towards Bayesian Black Box Learning Systems
Abstract:
A long-standing dream of machine learning is to create black box learning systems that can operate autonomously in home, research and industrial applications. While it is well understood that a universal black box may not be possible, significant progress can be made in specific domains. In particular, we address learning problems in sensor-rich and data-rich environments, as provided by autonomous vehicles, surveillance systems, biological or robotic systems. In these scenarios, the input data has hundreds or thousands of dimensions and is used to make predictions (often in real-time), resulting in a learning system that learns to "understand" the environment.
The goal of machine learning in this domain is to devise algorithms that can efficiently deal with very high dimensional data, usually contaminated by noise, redundancy and irrelevant dimensions. These algorithms must learn nonlinear functions, potentially in an incremental and real-time fashion, for robust classification and regression. In order to achieve black box quality, manual tuning parameters (e.g. as in gradient descent or structure selection) need to be minimized or, ideally, avoided.
Bayesian inference, when combined with approximation methods to reduce computational complexity, suggests a promising route to achieve our goals, since it offers a principled way to eliminate open parameters. In past work, we have started to create a toolbox of methods to achieve our goal of black box learning. In (Ting et al., NIPS 2005), we introduced a Bayesian approach to linear regression. The novelty of this algorithm comes from a Bayesian and EM-like formulation of linear regression that robustly performs automatic feature detection in the inputs in a computationally efficient way. We applied this algorithm to the analysis of neuroscientific data (i.e. the problem of prediction of electromyographic (EMG) activity in the arm muscles of a monkey from spiking activity of neurons in the primary motor and premotor cortex). The algorithm achieves results that are faster by orders of magnitude and higher quality than previously applied methods.
More recently, we introduced a variational Bayesian regression algorithm that is able to perform optimal prediction, given noise-contaminated input and output data (Ting, D’Souza & Schaal, ICML 2006). Traditional linear regression algorithms produce biased estimates when input noise is present and suffer numerically when the data contains irrelevant and/or redundant inputs. Our algorithm is able to effectively handle datasets with both characteristics. On a system identification task for a robot dynamics model, we achieved from 10 to 70% better results than traditional approaches.
Current work focuses on developing a Bayesian version of nonlinear function approximation with locally weighted regression. The challenge is to determine the size of the neighborhood of data that should contribute to the local regression model – a typical bias-variance trade-off problem. Preliminary results indicate that a full Bayesian treatment of this problem can achieve impressive robust function approximation performance without the need for tuning meta parameters. We are also interested in extending this locally linear Bayesian model to an online setting, in the spirit of dynamic Bayesian networks, to offer a parameter-free alternative to incremental learning.