James Jensen
2014-01-30 19:23:28 UTC
I usually hesitate to suggest a new feature in a library like this
unless I am in a position to work on it myself. However, given the
number of people who seem eager to find something to contribute, and
given the recent discussion about improving the Gaussian process module,
I thought I'd venture an idea.
Bayesian optimization is a sample-efficient method for optimizing
functions that are expensive to evaluate. The basic idea is to fit a
Gaussian process to the function as a surrogate model, and then use an
acquisition function to decide where to evaluate next in each
iteration. The acquisition function strikes a balance between
exploration (sampling regions you haven't tried before) and
exploitation (sampling near previous points that scored well, where a
high score is more likely).
Some of the math behind it is beyond me, but the general idea is very
intuitive. Brochu, Cora, and de Freitas (2010), "A Tutorial on Bayesian
Optimization of Expensive Cost Functions," is a good introduction.
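To make the loop concrete, here is a minimal sketch of a single
iteration (maximizing the objective). It assumes only a GP regressor
that exposes a predictive mean and standard deviation; I write it
against a GaussianProcessRegressor/Matern-style interface and an
expected-improvement acquisition as stand-ins of my own choosing, not
something scikit-learn provides for this today:

    # Sketch of one Bayesian-optimization step: a GP surrogate models the
    # expensive objective, and an acquisition function (here expected
    # improvement, EI) picks the next point to evaluate.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def expected_improvement(X_candidates, gp, y_best, xi=0.01):
        """EI balances exploitation (high predicted mean) against
        exploration (high predictive uncertainty)."""
        mu, sigma = gp.predict(X_candidates, return_std=True)
        sigma = np.maximum(sigma, 1e-9)      # guard against zero variance
        improvement = mu - y_best - xi
        z = improvement / sigma
        return improvement * norm.cdf(z) + sigma * norm.pdf(z)

    def propose_next(X_observed, y_observed, X_candidates):
        """Fit the surrogate to the evaluations so far and return the
        candidate point with the highest expected improvement."""
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X_observed, y_observed)
        ei = expected_improvement(X_candidates, gp, y_best=np.max(y_observed))
        return X_candidates[np.argmax(ei)]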
One useful application of Bayesian optimization is hyperparameter
tuning. It can be used to optimize the cross-validation score, as an
alternative to, for example, grid search. Grid search is simple and
parallelizable; there is no overhead in choosing which settings to try,
and the structure of some estimators lets them be used with it very
efficiently. Bayesian optimization is inherently serial and carries a
small per-iteration overhead for fitting the surrogate and optimizing
the acquisition function. But it is generally much more efficient at
finding good solutions, and it particularly shines when the scoring
function is costly or when there are more than one or two
hyperparameters to tune, cases where grid search is less attractive and
sometimes completely impractical.
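For hyperparameter tuning, the expensive objective is just the
cross-validation score as a function of the hyperparameters. A rough
sketch of how the pieces could fit together, reusing propose_next from
the sketch above; the dataset, estimator, search ranges, and iteration
counts are all purely illustrative:

    # Hyperparameter tuning with the step sketched above: the objective is
    # the mean cross-validation score as a function of the (log-scaled)
    # hyperparameters. Everything below is illustrative.
    import numpy as np
    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC

    digits = load_digits()
    X, y = digits.data, digits.target

    def cv_score(params):
        """Expensive objective: mean CV accuracy for log10(C), log10(gamma)."""
        log_C, log_gamma = params
        clf = SVC(C=10.0 ** log_C, gamma=10.0 ** log_gamma)
        return cross_val_score(clf, X, y, cv=3).mean()

    rng = np.random.RandomState(0)
    bounds = np.array([[-3.0, 3.0],      # search range for log10(C)
                       [-5.0, 0.0]])     # search range for log10(gamma)

    # A few random points to seed the surrogate, then sequential BO steps.
    X_obs = rng.uniform(bounds[:, 0], bounds[:, 1], size=(5, 2))
    y_obs = np.array([cv_score(p) for p in X_obs])

    for _ in range(20):
        candidates = rng.uniform(bounds[:, 0], bounds[:, 1], size=(500, 2))
        x_next = propose_next(X_obs, y_obs, candidates)   # from the sketch above
        X_obs = np.vstack([X_obs, x_next])
        y_obs = np.append(y_obs, cv_score(x_next))

    best = X_obs[np.argmax(y_obs)]
    print("best log10(C), log10(gamma):", best, "CV score:", y_obs.max())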
In one of my own applications, involving four regularization
parameters, I've been using the BayesOpt library
(http://rmcantin.bitbucket.org/html/index.html), which provides
Bayesian optimization as a general-purpose optimization routine that
one integrates manually with one's own cross-validation code. In
general it works quite well, but some aspects of its design can make
that integration inconvenient. Having this functionality directly
integrated into
scikit-learn and specifically tailored to hyperparameter tuning would be
useful. I have been impressed with the ease of use of such convenience
classes as GridSearchCV, and dream of having a corresponding BayesOptCV,
etc.
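To be clear, BayesOptCV is purely hypothetical; no such class exists.
Here is a sketch of how it might mirror GridSearchCV's interface, with
bounds per hyperparameter instead of an explicit grid and a budget of
evaluations (every name and argument below is invented for
illustration):

    # Purely hypothetical interface sketch: BayesOptCV does not exist.
    from sklearn.svm import SVC

    search = BayesOptCV(                       # hypothetical class
        estimator=SVC(),
        param_bounds={"C": (1e-3, 1e3),        # hypothetical argument: bounds
                      "gamma": (1e-5, 1.0)},   # per parameter, not a grid
        n_iter=30,                             # budget of CV evaluations
        cv=3,
    )
    search.fit(X, y)                           # same fit idiom as GridSearchCV
    print(search.best_params_, search.best_score_)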
As a general-purpose optimization method, Bayesian optimization would
belong outside scikit-learn, e.g. in scipy.optimize. But specifically
as a method for hyperparameter tuning, it seems it would fit well
within the
scope of scikit-learn, especially since I expect it would not be much
more than a layer or two of functionality on top of what scikit-learn's
GP module offers (or will offer once revised). And it would be of more
general utility than an additional estimator here or there.
I'm curious to hear what others think about the idea. Would this be a
good fit for scikit-learn? Do we have people with the interest,
expertise, and time to take this on at some point?