[Scikit-learn-general] Bayesian optimization for hyperparameter tuning

Discussion:

James Jensen

2014-01-30 19:23:28 UTC

I usually hesitate to suggest a new feature in a library like this
unless I am in a position to work on it myself. However, given the
number of people who seem eager to find something to contribute, and
given the recent discussion about improving the Gaussian process module,
I thought I'd venture an idea.

Bayesian optimization is an efficient method used especially for
functions that are expensive to evaluate. The basic idea is to fit the
function using Gaussian processes, using a surrogate function that
determines where to evaluate next in each iteration. The surrogate
strikes a balance between exploration (sampling intervals you haven't
tried before) and exploitation (if previous samples in a vicinity scored
well, then the likelihood of getting a high score in that area is high).
Some of the math behind it is beyond me, but the general idea is very
intuitive. Brochu, Cora, and de Freitas (2010) "A Tutorial on Bayesian
Optimization of Expensive Cost Functions," is a good introduction.

One useful application of Bayesian optimization is hyperparameter
tuning. It can be used to optimize the cross-validation score, as an
alternative to, for example, grid search. Grid search is simple and
parallelizable, there is no overhead in choosing the hyperparameters to
try, and the nature of some estimators allows them to be used with it
very efficiently. Bayesian optimization is serial and has a small amount
of overhead in evaluating the surrogate. But it is generally much more
efficient in finding good solutions, and particularly shines when the
scoring function is costly or when there are more than 1 or 2
hyperparameters to tune; here grid search is less attractive and
sometimes completely impractical.

In one of my own applications, involving 4 regularization parameters,
I've been using the BayesOpt library
(http://rmcantin.bitbucket.org/html/index.html), which offers it as a
general-purpose optimization technique that one can manually integrate
with one's cross-validation code. In general, it works quite well, but
there are some limitations to its design that can make its integration
inconvenient. Having this functionality directly integrated into
scikit-learn and specifically tailored to hyperparameter tuning would be
useful. I have been impressed with the ease of use of such convenience
classes as GridSearchCV, and dream of having a corresponding BayesOptCV,
etc.

As a general-use optimization method, Bayesian optimization would belong
elsewhere than in scikit-learn, e.g. in scipy.optimize. But specifically
as a method for hyperparameter tuning, it seems it would fit well in the
scope of scikit-learn, especially since I expect it would not be much
more than a layer or two of functionality on top of what scikit-learn's
GP module offers (or will offer once revised). And it would be of more
general utility than an additional estimator here or there.

I'm curious to hear what others think about the idea. Would this be a
good fit for scikit-learn? Do we have people with the interest,
expertise, and time to take this on at some point?

Dan Haiduc

2014-01-30 20:03:19 UTC

Permalink

Actually, I wanted to create exactly this myself.
I was then discouraged by the fact that Scikit-learn did not pull from a
guy who implemented Multi-Armed
Bandit<https://github.com/scikit-learn/scikit-learn/pull/906>on the
reason that Scikit-learn doesn't do reinforcement learning.
I'm new here (everywhere, not just scikit), and I'm not sure how closely
related MAB is with Bayesian optimization, but I think something along
those lines should definitely be implemented for hyperparameters, since
they're expensive functions almost by definition.

Great idea! I certainly wish it gets implemented as well.

Post by James Jensen
I usually hesitate to suggest a new feature in a library like this
unless I am in a position to work on it myself. However, given the
number of people who seem eager to find something to contribute, and
given the recent discussion about improving the Gaussian process module,
I thought I'd venture an idea.
Bayesian optimization is an efficient method used especially for
functions that are expensive to evaluate. The basic idea is to fit the
function using Gaussian processes, using a surrogate function that
determines where to evaluate next in each iteration. The surrogate
strikes a balance between exploration (sampling intervals you haven't
tried before) and exploitation (if previous samples in a vicinity scored
well, then the likelihood of getting a high score in that area is high).
Some of the math behind it is beyond me, but the general idea is very
intuitive. Brochu, Cora, and de Freitas (2010) "A Tutorial on Bayesian
Optimization of Expensive Cost Functions," is a good introduction.
One useful application of Bayesian optimization is hyperparameter
tuning. It can be used to optimize the cross-validation score, as an
alternative to, for example, grid search. Grid search is simple and
parallelizable, there is no overhead in choosing the hyperparameters to
try, and the nature of some estimators allows them to be used with it
very efficiently. Bayesian optimization is serial and has a small amount
of overhead in evaluating the surrogate. But it is generally much more
efficient in finding good solutions, and particularly shines when the
scoring function is costly or when there are more than 1 or 2
hyperparameters to tune; here grid search is less attractive and
sometimes completely impractical.
In one of my own applications, involving 4 regularization parameters,
I've been using the BayesOpt library
(http://rmcantin.bitbucket.org/html/index.html), which offers it as a
general-purpose optimization technique that one can manually integrate
with one's cross-validation code. In general, it works quite well, but
there are some limitations to its design that can make its integration
inconvenient. Having this functionality directly integrated into
scikit-learn and specifically tailored to hyperparameter tuning would be
useful. I have been impressed with the ease of use of such convenience
classes as GridSearchCV, and dream of having a corresponding BayesOptCV,
etc.
As a general-use optimization method, Bayesian optimization would belong
elsewhere than in scikit-learn, e.g. in scipy.optimize. But specifically
as a method for hyperparameter tuning, it seems it would fit well in the
scope of scikit-learn, especially since I expect it would not be much
more than a layer or two of functionality on top of what scikit-learn's
GP module offers (or will offer once revised). And it would be of more
general utility than an additional estimator here or there.
I'm curious to hear what others think about the idea. Would this be a
good fit for scikit-learn? Do we have people with the interest,
expertise, and time to take this on at some point?
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Hadayat Seddiqi

2014-01-30 20:11:00 UTC

Permalink

Hi,

So I was the one who volunteered to do contribute my GP code for a revamp
of scikits module. I'm far from an expert, and I can't say I understand how
this would fit off the top of my head, but if someone is knowledgeable and
willing to work on this then I'd be more than happy to lend a hand as well.
I've been kind of quiet on my own GP code so far.. just trying to get
everything as ready and nice as I can before bugging people again.

James you mentioned that you might be hesitant to suggest things if you
don't have time to implement. If I read that correctly, you're saying you
might not have the time, but in case you do, feel free to contact (this
goes for anyone, of course).

-Had

Post by Dan Haiduc
Actually, I wanted to create exactly this myself.
I was then discouraged by the fact that Scikit-learn did not pull from a
guy who implemented Multi-Armed Bandit<https://github.com/scikit-learn/scikit-learn/pull/906>on the reason that Scikit-learn doesn't do reinforcement learning.
I'm new here (everywhere, not just scikit), and I'm not sure how closely
related MAB is with Bayesian optimization, but I think something along
those lines should definitely be implemented for hyperparameters, since
they're expensive functions almost by definition.
Great idea! I certainly wish it gets implemented as well.

------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Zach Dwiel

2014-01-30 20:15:56 UTC

Permalink

It seems that with GridSearchCV and RandomizedSearchCV both already
included in scikit-learn, it would make sense to also include other common,
more efficient hyperparameter searchers as well.

zach

Post by Hadayat Seddiqi
Hi,
So I was the one who volunteered to do contribute my GP code for a revamp
of scikits module. I'm far from an expert, and I can't say I understand how
this would fit off the top of my head, but if someone is knowledgeable and
willing to work on this then I'd be more than happy to lend a hand as well.
I've been kind of quiet on my own GP code so far.. just trying to get
everything as ready and nice as I can before bugging people again.
James you mentioned that you might be hesitant to suggest things if you
don't have time to implement. If I read that correctly, you're saying you
might not have the time, but in case you do, feel free to contact (this
goes for anyone, of course).
-Had

Sturla Molden

2014-01-30 22:21:32 UTC

Permalink

As I understand it fro reading about this a LONG time ago (apologies if my
memory is rusty), "Bayesian optimization" means maximizing the
log-likelihood using the Newton-Raphson method. The word "Bayesian" comes
from an obfuscated explanation of what really happens: If we assume a flat
or Gaussian prior and approximate the log-likelihood with a second order
Taylor series expansion, the posterior is approximated with a Gaussian
dustribution. We can then improve this iteratively by refitting the
polynomial around the mode. But only statisticans like to explain
optimization with Newton-Raphson so difficulty. There is no need to involve
Gaussian approximations to the Bayesian posterior here. "Bayesian
optimization" is merely a buzzword. This is no more "Bayesian" than ML
using Fisher's scoring method, in fact it is identical. Any by the way,
Newton-Raphson is not about striking balances between exploitation and
exploration. That is also bullshitting. It is about quadratic convergence,
and if anything, it is famous for finding local optima and sometimes just
failing to converge by overshooting the target (which is why quasi-Newton
is often preferred).

:)

Sturla

Post by Zach Dwiel
It seems that with GridSearchCV and RandomizedSearchCV both already
included in scikit-learn, it would make sense to also include other
common, more efficient hyperparameter searchers as well.
zach
On Thu, Jan 30, 2014 at 3:11 PM, Hadayat Seddiqi
Hi,
So I was the one who volunteered to do contribute my GP code for a revamp
of scikits module. I'm far from an expert, and I can't say I understand
how this would fit off the top of my head, but if someone is
knowledgeable and willing to work on this then I'd be more than happy to
lend a hand as well. I've been kind of quiet on my own GP code so far..
just trying to get everything as ready and nice as I can before bugging people again.
James you mentioned that you might be hesitant to suggest things if you
don't have time to implement. If I read that correctly, you're saying you
might not have the time, but in case you do, feel free to contact (this
goes for anyone, of course).
-Had
On Thu, Jan 30, 2014 at 3:03 PM, Dan Haiduc
Actually, I wanted to create exactly this myself. I was then discouraged
by the fact that Scikit-learn did not pull from a guy who implemented
Multi-Armed Bandit
<a href="https://github.com/scikit-learn/scikit-learn/pull/906">https://github.com/scikit-learn/scikit-learn/pull/906</a>>on
the reason that Scikit-learn doesn't do reinforcement learning. I'm new
here (everywhere, not just scikit), and I'm not sure how closely related
MAB is with Bayesian optimization, but I think something along those
lines should definitely be implemented for hyperparameters, since they're
expensive functions almost by definition.
Great idea! I certainly wish it gets implemented as well.
On Thu, Jan 30, 2014 at 9:23 PM, James Jensen
I usually hesitate to suggest a new feature in a library like this unless
I am in a position to work on it myself. However, given the number of
people who seem eager to find something to contribute, and given the
recent discussion about improving the Gaussian process module, I thought
I'd venture an idea.
Bayesian optimization is an efficient method used especially for
functions that are expensive to evaluate. The basic idea is to fit the
function using Gaussian processes, using a surrogate function that
determines where to evaluate next in each iteration. The surrogate
strikes a balance between exploration (sampling intervals you haven't
tried before) and exploitation (if previous samples in a vicinity scored
well, then the likelihood of getting a high score in that area is high).
Some of the math behind it is beyond me, but the general idea is very
intuitive. Brochu, Cora, and de Freitas (2010) "A Tutorial on Bayesian
Optimization of Expensive Cost Functions," is a good introduction.
One useful application of Bayesian optimization is hyperparameter tuning.
It can be used to optimize the cross-validation score, as an alternative
to, for example, grid search. Grid search is simple and parallelizable,
there is no overhead in choosing the hyperparameters to try, and the
nature of some estimators allows them to be used with it very
efficiently. Bayesian optimization is serial and has a small amount of
overhead in evaluating the surrogate. But it is generally much more
efficient in finding good solutions, and particularly shines when the
scoring function is costly or when there are more than 1 or 2
hyperparameters to tune; here grid search is less attractive and
sometimes completely impractical.
In one of my own applications, involving 4 regularization parameters,
I've been using the BayesOpt library
(http://rmcantin.bitbucket.org/html/index.html), which offers it as a
general-purpose optimization technique that one can manually integrate
with one's cross-validation code. In general, it works quite well, but
there are some limitations to its design that can make its integration
inconvenient. Having this functionality directly integrated into
scikit-learn and specifically tailored to hyperparameter tuning would be
useful. I have been impressed with the ease of use of such convenience
classes as GridSearchCV, and dream of having a corresponding BayesOptCV, etc.
As a general-use optimization method, Bayesian optimization would belong
elsewhere than in scikit-learn, e.g. in scipy.optimize. But specifically
as a method for hyperparameter tuning, it seems it would fit well in the
scope of scikit-learn, especially since I expect it would not be much
more than a layer or two of functionality on top of what scikit-learn's
GP module offers (or will offer once revised). And it would be of more
general utility than an additional estimator here or there.
I'm curious to hear what others think about the idea. Would this be a
good fit for scikit-learn? Do we have people with the interest,
expertise, and time to take this on at some point?
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import a
virtual appliance and go from zero to informed in seconds.
<a
href="http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk">http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk</a>
_______________________________________________ Scikit-learn-general
href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general">https://lists.sourceforge.net/lists/listinfo/scikit-learn-general</a>
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import a
virtual appliance and go from zero to informed in seconds.
<a
href="http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk">http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk</a>
_______________________________________________ Scikit-learn-general
href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general">https://lists.sourceforge.net/lists/listinfo/scikit-learn-general</a>
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import a
virtual appliance and go from zero to informed in seconds.
<a
href="http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk">http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk</a>
_______________________________________________ Scikit-learn-general
href="https://lists.sourceforge.net/lists/listinfo/scikit-learn-general">https://lists.sourceforge.net/lists/listinfo/scikit-learn-general</a>
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import a
virtual appliance and go from zero to informed in seconds. <a
href="http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk">http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk</a>

Ken Arnold

2014-01-31 03:22:21 UTC

Permalink

Post by Sturla Molden
As I understand it fro reading about this a LONG time ago (apologies if my
memory is rusty), "Bayesian optimization" means maximizing the
log-likelihood using the Newton-Raphson method.

Probably that was how the term was typically used at one time, but recently
"Bayesian optimization" has come to mean something different. In a setting
where the function to be optimized is expensive to evaluate (e.g., the
error of an estimator as a function of its hyperparameters), and especially
if samples of that function's value are noisy, it can be helpful to
estimate values of that function as the posterior of a Gaussian Process
prior and a Gaussian observation likelihood. Given that function estimate
(as predicted mean and variance), you can globally optimize an "expected
improvement" heuristic to find the best point(s) to request function
evaluations next.

For details, see:

Snoek, J., Larochelle, H., & Adams, R. P. (2012). Practical Bayesian
Optimization of Machine Learning Algorithms. *arXiv preprint
arXiv:1206.2944*, 1â9. Machine Learning; Learning. Retrieved from
http://arxiv.org/abs/1206.2944
or, as Patrick linked to, http://www.cs.toronto.edu/~jasper/bayesopt.pdf

-Ken

James Jensen

2014-01-30 21:25:18 UTC

Permalink

This post might be inappropriate. Click to display it.

Gael Varoquaux

2014-01-30 22:28:17 UTC

Permalink

Post by James Jensen
Bayesian optimization is an efficient method used especially for
functions that are expensive to evaluate. The basic idea is to fit the
function using Gaussian processes, using a surrogate function that
determines where to evaluate next in each iteration. The surrogate
strikes a balance between exploration (sampling intervals you haven't
tried before) and exploitation (if previous samples in a vicinity scored
well, then the likelihood of getting a high score in that area is high).
Some of the math behind it is beyond me, but the general idea is very
intuitive. Brochu, Cora, and de Freitas (2010) "A Tutorial on Bayesian
Optimization of Expensive Cost Functions," is a good introduction.
One useful application of Bayesian optimization is hyperparameter
tuning.

Thanks a lot for your enthousiasme and suggestion.

Indeed, many of the core developpers would love to see simple Bayesian
optimization used for hyperparameter optimization, for instance taking
the gist of hyperopt https://github.com/hyperopt/hyperopt and making an
extended version of the RandomSearchCV.

However there are a number of technical roadblocks to get there. In
particular the Gaussian process could be improved (to implement
partial_fit for online learning), and the parallel computing engine
(joblib) does not support well as producer/consumer pattern. None of
these problems are showstoppers, but they reduce the usefulness of a
hyper-parameter selection object using Bayesian optimization.

I would hope that we find time to implement these difficult core aspects
and eventually get to implementing a more advanced hyper-parameter
optimizer. But all the core developers are very busy and spending a lot
of time simply maintaining the library (have a look at the number of
issues open or pull requests that are waiting to be reviewed to have an
idea).

If you want to help -beyond helping with reviewing/finishing pull
requests and closing issues, I suggest that first, to prototype code, you
could first submit an example using the Gaussian processes to do
optimization of a noisy function. In a second step, after having that
example merged, we could think about how to build a BayesianSearchCV
object.

Cheers,

Gaël

Frédéric Bastien

2014-01-31 00:53:16 UTC

Permalink

I have a question on those type of algo for hyper parameter
optimization. With a grid search, we can run all jobs in parallel. But
I have the impression that those algo remove that possibility. Is
there there way to sample many starting configuration with those algo?
But the most interresting question, if we start many jobs in parallel,
if the jobs don't finish at the same time as this happen frequently,
can we sample new test point while maximizing the "coverage" with the
currently running jobs that don't have results?

Fred

On Thu, Jan 30, 2014 at 5:28 PM, Gael Varoquaux

Post by Gael Varoquaux

Thanks a lot for your enthousiasme and suggestion.
Indeed, many of the core developpers would love to see simple Bayesian
optimization used for hyperparameter optimization, for instance taking
the gist of hyperopt https://github.com/hyperopt/hyperopt and making an
extended version of the RandomSearchCV.
However there are a number of technical roadblocks to get there. In
particular the Gaussian process could be improved (to implement
partial_fit for online learning), and the parallel computing engine
(joblib) does not support well as producer/consumer pattern. None of
these problems are showstoppers, but they reduce the usefulness of a
hyper-parameter selection object using Bayesian optimization.
I would hope that we find time to implement these difficult core aspects
and eventually get to implementing a more advanced hyper-parameter
optimizer. But all the core developers are very busy and spending a lot
of time simply maintaining the library (have a look at the number of
issues open or pull requests that are waiting to be reviewed to have an
idea).
If you want to help -beyond helping with reviewing/finishing pull
requests and closing issues, I suggest that first, to prototype code, you
could first submit an example using the Gaussian processes to do
optimization of a noisy function. In a second step, after having that
example merged, we could think about how to build a BayesianSearchCV
object.
Cheers,
Gaël
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Patrick Mineault

2014-01-31 01:28:38 UTC

Permalink

Sure you can:

http://www.cs.toronto.edu/~jasper/*bayes*opt.pdf

And some python code:

https://github.com/JasperSnoek/spearmint

Post by FrÃ©dÃ©ric Bastien
I have a question on those type of algo for hyper parameter
optimization. With a grid search, we can run all jobs in parallel. But
I have the impression that those algo remove that possibility. Is
there there way to sample many starting configuration with those algo?
But the most interresting question, if we start many jobs in parallel,
if the jobs don't finish at the same time as this happen frequently,
can we sample new test point while maximizing the "coverage" with the
currently running jobs that don't have results?
Fred
On Thu, Jan 30, 2014 at 5:28 PM, Gael Varoquaux

Post by Gael Varoquaux

Thanks a lot for your enthousiasme and suggestion.
Indeed, many of the core developpers would love to see simple Bayesian
optimization used for hyperparameter optimization, for instance taking
the gist of hyperopt https://github.com/hyperopt/hyperopt and making an
extended version of the RandomSearchCV.
However there are a number of technical roadblocks to get there. In
particular the Gaussian process could be improved (to implement
partial_fit for online learning), and the parallel computing engine
(joblib) does not support well as producer/consumer pattern. None of
these problems are showstoppers, but they reduce the usefulness of a
hyper-parameter selection object using Bayesian optimization.
I would hope that we find time to implement these difficult core aspects
and eventually get to implementing a more advanced hyper-parameter
optimizer. But all the core developers are very busy and spending a lot
of time simply maintaining the library (have a look at the number of
issues open or pull requests that are waiting to be reviewed to have an
idea).
If you want to help -beyond helping with reviewing/finishing pull
requests and closing issues, I suggest that first, to prototype code, you
could first submit an example using the Gaussian processes to do
optimization of a noisy function. In a second step, after having that
example merged, we could think about how to build a BayesianSearchCV
object.
Cheers,
Gaël

------------------------------------------------------------------------------

Post by Gael Varoquaux
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.

http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk

Post by Gael Varoquaux
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Frédéric Bastien

2014-01-31 19:22:03 UTC

Permalink

thanks.

Fred

On Thu, Jan 30, 2014 at 8:28 PM, Patrick Mineault

Post by Ken Arnold
http://www.cs.toronto.edu/~jasper/bayesopt.pdf
https://github.com/JasperSnoek/spearmint

Post by Gael Varoquaux

Thanks a lot for your enthousiasme and suggestion.
Indeed, many of the core developpers would love to see simple Bayesian
optimization used for hyperparameter optimization, for instance taking
the gist of hyperopt https://github.com/hyperopt/hyperopt and making an
extended version of the RandomSearchCV.
However there are a number of technical roadblocks to get there. In
particular the Gaussian process could be improved (to implement
partial_fit for online learning), and the parallel computing engine
(joblib) does not support well as producer/consumer pattern. None of
these problems are showstoppers, but they reduce the usefulness of a
hyper-parameter selection object using Bayesian optimization.
I would hope that we find time to implement these difficult core aspects
and eventually get to implementing a more advanced hyper-parameter
optimizer. But all the core developers are very busy and spending a lot
of time simply maintaining the library (have a look at the number of
issues open or pull requests that are waiting to be reviewed to have an
idea).
If you want to help -beyond helping with reviewing/finishing pull
requests and closing issues, I suggest that first, to prototype code, you
could first submit an example using the Gaussian processes to do
optimization of a noisy function. In a second step, after having that
example merged, we could think about how to build a BayesianSearchCV
object.
Cheers,
Gaël
------------------------------------------------------------------------------
WatchGuard Dimension instantly turns raw network data into actionable
security intelligence. It gives you real-time visual feedback on key
security issues and trends. Skip the complicated setup - simply import
a virtual appliance and go from zero to informed in seconds.
http://pubads.g.doubleclick.net/gampad/clk?id=123612991&iu=/4140/ostg.clktrk
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

James Bergstra

2014-02-02 15:43:51 UTC

Permalink

Glad to see this thread revived!

Sklearn-users who are interested in this stuff should check out Hyperopt's
sklearn interface:

https://github.com/hyperopt/hyperopt-sklearn

It's very much a work-in-progress. We're in the process of putting together
some examples / tutorial, and a tech report that describes how well it
works, how long it takes, etc. The results we have so far are encouraging...

And speaking of results: we want to make the case that hyperopt-on-sklearn
is awesome, which requires showing that it works for lots of data sets. We
can only do so much on our own. Real use cases are a lot more interesting
than old standard benchmarks. If someone has a dataset and they'd like to
try hyper-optimizing their sklearn estimators & pre-processing stages, get
in touch! Send me a private message and we can work together to make sure
hyperopt-sklearn has what it takes for your application.

Also, hyperopt's got some new algorithms on the way too... but that'll be
the subject for another writeup.

- James

Post by FrÃ©dÃ©ric Bastien
thanks.
Fred
On Thu, Jan 30, 2014 at 8:28 PM, Patrick Mineault

Post by Ken Arnold
http://www.cs.toronto.edu/~jasper/bayesopt.pdf
https://github.com/JasperSnoek/spearmint

Post by Gael Varoquaux

Post by James Jensen
Bayesian optimization is an efficient method used especially for
functions that are expensive to evaluate. The basic idea is to fit

the