James Jensen

2014-01-30 19:23:28 UTC

I usually hesitate to suggest a new feature in a library like this

unless I am in a position to work on it myself. However, given the

number of people who seem eager to find something to contribute, and

given the recent discussion about improving the Gaussian process module,

I thought I'd venture an idea.

Bayesian optimization is an efficient method used especially for

functions that are expensive to evaluate. The basic idea is to fit a

surrogate model of the function using Gaussian processes, and to use an

acquisition function computed from that model to determine where to evaluate

next in each iteration. The acquisition function strikes a balance between

exploration (sampling regions you haven't tried before) and exploitation (if

previous samples in a vicinity scored well, then the likelihood of getting a

high score in that area is high).

Some of the math behind it is beyond me, but the general idea is very

intuitive. Brochu, Cora, and de Freitas (2010), "A Tutorial on Bayesian

Optimization of Expensive Cost Functions," is a good introduction.
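
To make the loop above concrete, here is a minimal one-dimensional sketch in plain Python. It is an illustration under stated assumptions, not how any particular library does it: the squared-exponential kernel, its fixed length scale of 0.2, the small noise term, expected improvement as the rule for picking the next point, and the candidate grid are all choices made for this example, and the function names (`bayes_opt`, `gp_fit`, `gp_predict`) are hypothetical.

```python
import math

def rbf(a, b, length_scale=0.2):
    """Squared-exponential kernel; the length scale is an assumed default."""
    return math.exp(-0.5 * ((a - b) / length_scale) ** 2)

def cholesky(A):
    """Lower-triangular L with L * L^T = A, for a small positive-definite A."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = A[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(max(s, 1e-12)) if i == j else s / L[j][j]
    return L

def solve_lower(L, b):
    """Forward substitution: solve L y = b."""
    y = []
    for i in range(len(b)):
        y.append((b[i] - sum(L[i][k] * y[k] for k in range(i))) / L[i][i])
    return y

def solve_upper(L, y):
    """Back substitution: solve L^T x = y."""
    n = len(y)
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(L[k][i] * x[k] for k in range(i + 1, n))) / L[i][i]
    return x

def gp_fit(xs, zs, noise=1e-6):
    """Factor the kernel matrix once and precompute alpha = K^-1 z."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    L = cholesky(K)
    return L, solve_upper(L, solve_lower(L, zs))

def gp_predict(xs, L, alpha, x_new):
    """Posterior mean and standard deviation of the zero-mean GP at x_new."""
    k_star = [rbf(a, x_new) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(1.0 - sum(t * t for t in v), 1e-12)
    return mean, math.sqrt(var)

def expected_improvement(mean, std, best):
    """Expected improvement over `best` when maximizing."""
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf

def bayes_opt(f, lo, hi, n_init=3, n_iter=10, n_candidates=200):
    """Maximize f on [lo, hi] with n_init + n_iter evaluations."""
    xs = [lo + (hi - lo) * (i + 0.5) / n_init for i in range(n_init)]
    ys = [f(x) for x in xs]
    grid = [lo + (hi - lo) * i / (n_candidates - 1) for i in range(n_candidates)]
    for _ in range(n_iter):
        # Standardize the scores so the unit-variance kernel is a fair prior.
        mu = sum(ys) / len(ys)
        sd = math.sqrt(sum((y - mu) ** 2 for y in ys) / len(ys)) or 1.0
        zs = [(y - mu) / sd for y in ys]
        L, alpha = gp_fit(xs, zs)
        best = max(zs)

        def score(x):
            mean, std = gp_predict(xs, L, alpha, x)
            return expected_improvement(mean, std, best)

        # Evaluate f where the acquisition value is highest.
        x_next = max(grid, key=score)
        xs.append(x_next)
        ys.append(f(x_next))
    i_best = max(range(len(ys)), key=ys.__getitem__)
    return xs[i_best], ys[i_best]

# Toy objective with its maximum at x = 0.3.
x_best, y_best = bayes_opt(lambda x: -(x - 0.3) ** 2, 0.0, 1.0)
print(x_best, y_best)
```

Maximizing the acquisition over a fixed grid keeps the sketch short; with more than one or two dimensions one would need a proper inner optimizer instead, and the kernel parameters would normally be fitted rather than fixed.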

One useful application of Bayesian optimization is hyperparameter

tuning. It can be used to optimize the cross-validation score, as an

alternative to, for example, grid search. Grid search is simple and

parallelizable; there is no overhead in choosing the hyperparameters to

try, and the nature of some estimators allows them to be used with it

very efficiently. Bayesian optimization is inherently sequential and adds a

small overhead for fitting and evaluating the surrogate at each step. But it

generally needs far fewer evaluations to find good solutions, and particularly

shines when the scoring function is costly or when there are more than one or

two hyperparameters to tune; there, grid search is less attractive and

sometimes completely impractical.
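
The reason grid search becomes impractical is simple arithmetic: the number of model fits grows exponentially with the number of hyperparameters. A small sketch, where the 10 values per hyperparameter and 5 cross-validation folds are purely illustrative assumptions and `grid_search_fits` is a hypothetical helper:

```python
def grid_search_fits(values_per_param, n_params, n_folds):
    """Number of model fits for an exhaustive grid with k-fold cross-validation."""
    return values_per_param ** n_params * n_folds

# Illustrative setting: 10 candidate values per hyperparameter, 5 folds.
for n_params in (1, 2, 4):
    print(n_params, grid_search_fits(10, n_params, 5))
```

A sequential method with a fixed budget of, say, a few dozen evaluations costs the same number of fits regardless of how many hyperparameters there are, which is where the trade-off tips.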

In one of my own applications, involving 4 regularization parameters,

I've been using the BayesOpt library

(http://rmcantin.bitbucket.org/html/index.html), which offers it as a

general-purpose optimization technique that one can manually integrate

with one's cross-validation code. In general, it works quite well, but

there are some limitations to its design that can make its integration

inconvenient. Having this functionality directly integrated into

scikit-learn and specifically tailored to hyperparameter tuning would be

useful. I have been impressed with the ease of use of such convenience

classes as GridSearchCV, and dream of having a corresponding BayesOptCV,

etc.

As a general-purpose optimization method, Bayesian optimization would belong

somewhere other than scikit-learn, e.g. in scipy.optimize. But specifically

as a method for hyperparameter tuning, it seems it would fit well in the

scope of scikit-learn, especially since I expect it would not be much

more than a layer or two of functionality on top of what scikit-learn's

GP module offers (or will offer once revised). And it would be of more

general utility than an additional estimator here or there.

I'm curious to hear what others think about the idea. Would this be a

good fit for scikit-learn? Do we have people with the interest,

expertise, and time to take this on at some point?
