Discussion: GSoC 2015 Hyperparameter Optimization topic
Christof Angermueller
2015-03-07 12:39:42 UTC
Hi Andreas (and others),

I am a PhD student in Bioinformatics at the University of Cambridge
(EBI/EMBL), supervised by Oliver Stegle and Zoubin Ghahramani. In my
PhD, I apply and develop different machine learning algorithms for
analyzing biological data.

There are different approaches for hyperparameter optimization, some of
which you mentioned on the topics page:
* Sequential Model-Based Global Optimization (SMBO) ->
http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
* Gaussian Processes (GP) -> Spearmint;
https://github.com/JasperSnoek/spearmint
* Tree-structured Parzen Estimator Approach (TPE) -> Hyperopt:
http://hyperopt.github.io/hyperopt/

And more recent approaches based on neural networks:
* Deep Networks for Global Optimization (DNGO), from "Scalable Bayesian
Optimization Using Deep Neural Networks" -> http://arxiv.org/abs/1502.05700

The idea is to implement ONE of these approaches, right?

Do you prefer a particular approach due to theoretical or practical reasons?

Spearmint also supports distributing jobs on a cluster (SGE). I imagine
that this requires platform-specific code, which could be difficult to
maintain. What do you think?

Spearmint and hyperopt are already established Python packages. Another
sklearn implementation might be considered redundant and hard to
establish. Do you have a particular new feature in mind?


Cheers,
Christof
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Kyle Kastner
2015-03-07 14:06:57 UTC
I think finding one method is indeed the goal. Even if it is not the best
every time, a 90% solution for 10% of the complexity would be awesome. I
think GPs with parameter space warping are *probably* the best solution but
only a good implementation will show for sure.

Spearmint and hyperopt exist and work for more complex stuff, but with far
more moving parts and complexity. Having a tool that is as easy to use as the
grid search and random search modules currently are would be a big benefit.
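
For concreteness, the bar to clear is roughly the interface we already have.
Nothing below is new API; the import path is today's sklearn.grid_search
module, which may move in later releases:

from scipy.stats import expon
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.grid_search import GridSearchCV, RandomizedSearchCV

digits = load_digits()
X, y = digits.data, digits.target

# exhaustive grid over a fixed set of values
grid = GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [1e-3, 1e-2, 1e-1]}, cv=3)
grid.fit(X, y)

# random sampling from continuous distributions
rand = RandomizedSearchCV(SVC(), {"C": expon(scale=10), "gamma": expon(scale=0.01)},
                          n_iter=20, cv=3, random_state=0)
rand.fit(X, y)

print(grid.best_params_, rand.best_params_)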

My .02c

Kyle
Sturla Molden
2015-03-09 13:27:18 UTC
For Bayesian optimization with MCMC (which I believe spearmint also
does) I have found that emcee is very nice:

http://dan.iel.fm/emcee/current/

It is much faster than naïve MCMC methods, and all we need to do is
provide a callback that computes the log-likelihood given the parameter
set (which can just as well be hyperparameters).

To do this computation in parallel one can simply evaluate the walkers
in parallel and do a barrier synchronization after each step. The
contention due to the barrier can be reduced by increasing the number of
walkers as needed. Also, one should use something like DCMT for random
numbers to make sure there is no contention for the PRNG and to ensure
that each thread (or process) gets an independent stream of random numbers.

emcee implements this kind of parallelism using multiprocessing, but it
passes parameter sets around using pickle and is therefore not very
efficient compared to just storing the current parameters for each walker
in shared memory. So there is a lot of room for improvement here.
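
A minimal sketch of how emcee is driven (emcee 2.x API; the Gaussian below is
just a stand-in for a real log-likelihood over hyperparameters):

import numpy as np
import emcee

def log_prob(theta):
    # stand-in for the log-likelihood of a hyperparameter vector
    return -0.5 * np.sum(theta ** 2)

ndim, nwalkers = 3, 32
p0 = np.random.randn(nwalkers, ndim)  # initial walker positions
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob, threads=4)  # walkers evaluated in parallel
sampler.run_mcmc(p0, 1000)

# point estimate: the best sample seen so far
best = sampler.flatchain[np.argmax(sampler.flatlnprobability)]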


Sturla
Andreas Mueller
2015-03-09 14:07:54 UTC
Does emcee implement Bayesian optimization?
What is the distribution you assume? GPs?
I thought emcee was a sampler. I need to check in with Dan ;)
Jan Hendrik Metzen
2015-03-09 15:28:04 UTC
A combination of emcee with GPs (in this case the GPs from george) is
described here:
http://dan.iel.fm/george/current/user/hyper/#sampling-marginalization
As PR #4270 for sklearn also exposes a method
log_marginal_likelihood(theta) in GaussianProcessRegressor, it should be
straightforward to adapt this example to the PR. If we used emcee for the
marginalization of the hyperparameters in Bayesian optimization, we would
of course add an additional dependency. As this is probably not desirable,
hyperparameter marginalization would require implementing a separate MCMC
sampler in sklearn, which could go beyond the scope of the GSoC topic.
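
Roughly like this, assuming the estimator and kernel API from the PR (the
flat prior and the walker initialization are arbitrary choices for
illustration):

import numpy as np
import emcee
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 20)[:, np.newaxis]
y = np.sin(X).ravel() + 0.1 * rng.randn(20)

gp = GaussianProcessRegressor(kernel=1.0 * RBF(length_scale=1.0)).fit(X, y)

def log_prob(theta):
    # flat prior with hard bounds on the log-transformed hyperparameters
    if np.any(theta < -10) or np.any(theta > 10):
        return -np.inf
    return gp.log_marginal_likelihood(theta)

ndim = gp.kernel_.theta.shape[0]
nwalkers = 16
p0 = gp.kernel_.theta + 1e-3 * rng.randn(nwalkers, ndim)
sampler = emcee.EnsembleSampler(nwalkers, ndim, log_prob)
sampler.run_mcmc(p0, 500)
theta_samples = sampler.flatchain  # posterior samples of the kernel hyperparameters (log-space)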
--
Jan Hendrik Metzen, Dr.rer.nat.
Team Leader of Team "Sustained Learning"

Universität Bremen und DFKI GmbH, Robotics Innovation Center
FB 3 - Mathematik und Informatik
AG Robotik
Robert-Hooke-Straße 1
28359 Bremen, Germany


Tel.: +49 421 178 45-4123
Switchboard: +49 421 178 45-6611
Fax: +49 421 178 45-4150
E-Mail: ***@informatik.uni-bremen.de
Homepage: http://www.informatik.uni-bremen.de/~jhm/

Further information: http://www.informatik.uni-bremen.de/robotik
Andreas Mueller
2015-03-09 15:35:16 UTC
Yeah, I don't think we want to include that in the scope of the GSoC.
Using MLE parameters still works, just converges a bit slower.
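
For reference, a minimal sketch assuming the PR #4270 interface; there a
plain fit already gives the type-II ML hyperparameters:

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, 20)[:, np.newaxis]
y = np.sin(X).ravel()

# fit() maximizes the log marginal likelihood over the kernel hyperparameters
gp = GaussianProcessRegressor(kernel=1.0 * RBF(1.0), n_restarts_optimizer=5).fit(X, y)
print(gp.kernel_)                                    # kernel with ML hyperparameters
print(gp.log_marginal_likelihood(gp.kernel_.theta))  # value at the optimum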
Christof Angermueller
2015-03-09 21:24:30 UTC
I agree with Kyle: an efficient, easy-to-use hyperparameter optimization
module that is consistent with the sklearn framework would be an
advantage over existing packages. In terms of efficiency, I would start
with ML estimation or variational inference instead of (slower) sampling.

I will read more about the pros and cons of GP, SMBO, etc. over the next few days.

Is there currently an open issue, such that I can submit some patches?
#4354?

Christof
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Andreas Mueller
2015-03-09 22:26:37 UTC
Post by Christof Angermueller
Is there currently an open issue, such that I can submit some patches?
#4354?
ragv might be working on this, not sure. I promised work but didn't
deliver ;)
You mean like general open issues? We have 341 of those:
https://github.com/scikit-learn/scikit-learn/issues
A lot of the newer 'easy' issues are already being worked on, though.
How about this one:
https://github.com/scikit-learn/scikit-learn/issues/1963
Ronnie Ghose
2015-03-09 22:29:29 UTC
@andreas

I don't know if this is already a thing - but how about a (soft?)
requirement that new pull requests are PEP8-compliant (within reason)?
Andreas Mueller
2015-03-09 22:42:35 UTC
We wanted a bot that tells us about violations on PRs.
Not sure if landscape.io can provide that:
https://github.com/scikit-learn/scikit-learn/issues/3888#issuecomment-76037183

ragv also looked into this, I think.
Not necessarily a binary "fail/pass", but more like a report by a bot.
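
Something in this spirit, as a sketch only (the git range and the use of the
pep8 package are assumptions about how such a report could be wired up, not
an existing bot):

# "report, don't fail": run the pep8 checker only on the Python files a PR
# touches and post the totals as a comment.
import subprocess
import pep8

changed = subprocess.check_output(
    ["git", "diff", "--name-only", "master...HEAD"]).decode().splitlines()
py_files = [f for f in changed if f.endswith(".py")]

style = pep8.StyleGuide()
report = style.check_files(py_files)
print("%d PEP8 violations across %d changed files" % (report.total_errors, len(py_files)))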
Saket Choudhary
2015-03-09 23:11:03 UTC
Post by Andreas Mueller
We wanted a bot that tells us about violations on PRs.
Not sure if landscape.io can provide that:
https://github.com/scikit-learn/scikit-learn/issues/3888#issuecomment-76037183
ragv also looked into this, I think.
Not necessarily a binary "fail/pass", but more like a report by a bot.
It does.
http://blog.landscape.io/pull-request-comparisons-merge-with-confidence.html
Andy
2015-03-10 00:53:59 UTC
Post by Saket Choudhary
It does.
http://blog.landscape.io/pull-request-comparisons-merge-with-confidence.html
Sweet, I'll look into it.
Sturla Molden
2015-03-10 15:27:53 UTC
Post by Andreas Mueller
Does emcee implement Bayesian optimization?
What is the distribution you assume? GPs?
I thought emcee was a sampler. I need to check in with Dan ;)
Just pick the mode :-)

The distribution is whatever you want it to be.

Sturla
Christof Angermueller
2015-03-11 21:40:22 UTC
I will have a closer look at the different optimization approaches and
start to work on an outline for this topic.

Does anybody know of further optimization approaches that have not
already been mentioned and that we could consider?
Is there anybody else interested in this topic?

Christof
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Gael Varoquaux
2015-03-19 21:47:34 UTC
Post by Christof Angermueller
Does anybody know of further optimization approaches that were not
mentioned below and that we could consider?
Maybe parallel computing. A grid search is an embarrassingly parallel
problem; Bayesian optimization is not, and we only have the framework to
tackle embarrassingly parallel computations. Maybe the first 10 shots can
be done with random sampling (using code adapted from or subclassing
RandomizedSearchCV) before the Bayesian optimizer kicks in.
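
A rough sketch of that hybrid (hypothetical, not an existing scikit-learn
API; the UCB acquisition and the one-dimensional toy objective are arbitrary
stand-ins, and the GP estimator is the one from PR #4270):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(c):
    # stand-in for a cross-validated score of an estimator with parameter C=c
    return -(np.log10(c) - 1.0) ** 2

rng = np.random.RandomState(0)
candidates = np.logspace(-3, 3, 500)

# phase 1: random shots, embarrassingly parallel (joblib, like RandomizedSearchCV)
tried = list(rng.choice(candidates, size=10, replace=False))
scores = [objective(c) for c in tried]

# phase 2: sequential model-based proposals
for _ in range(20):
    gp = GaussianProcessRegressor().fit(np.log10(tried).reshape(-1, 1), scores)
    mean, std = gp.predict(np.log10(candidates).reshape(-1, 1), return_std=True)
    ucb = mean + 1.96 * std                      # upper-confidence-bound acquisition
    ucb[np.in1d(candidates, tried)] = -np.inf    # do not re-propose evaluated points
    c_next = candidates[np.argmax(ucb)]
    tried.append(c_next)
    scores.append(objective(c_next))

print("best C:", tried[int(np.argmax(scores))])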

Gaël
Christof Angermueller
2015-03-19 21:12:05 UTC
Hi All,

You can find my proposal for the hyperparameter optimization topic here:
* http://goo.gl/XHuav8
*
https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing

Please give feedback!

Cheers,
Christof
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Charles Martin
2015-03-19 21:16:38 UTC
Permalink
I would like to propose extending the LinearSVC package
by replacing the bundled liblinear version with a newer version that

1. allows setting instance weights
2. provides the dual variables / Lagrange multipliers

This would facilitate research and development of transductive SVMs
and related semi-supervised methods.

Charles H Martin, PhD
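
To make the two points concrete, here is a minimal sketch using the existing
kernel SVC, which already accepts per-instance weights and exposes its dual
variables; the LinearSVC behaviour described in the trailing comment is the
proposed one, not what the liblinear wrapper offers today, and the data and
weights are made up purely for illustration.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

# Toy data; up-weight the second half of the samples for illustration.
X, y = make_classification(n_samples=100, n_features=5, random_state=0)
sample_weight = np.ones(100)
sample_weight[50:] = 5.0

# The kernel SVC already takes per-instance weights at fit time and
# exposes the dual variables (y_i * alpha_i) of its support vectors.
clf = SVC(kernel="linear", C=1.0)
clf.fit(X, y, sample_weight=sample_weight)
print(clf.dual_coef_.shape)   # (n_classes - 1, n_support_vectors)
print(clf.support_.shape)     # indices of the support vectors

# The proposal: the liblinear-backed LinearSVC would gain the same
# sample_weight argument and expose its Lagrange multipliers, which
# transductive and semi-supervised SVM variants need to iterate on.
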



On Thu, Mar 19, 2015 at 2:12 PM, Christof Angermueller
Post by Christof Angermueller
Hi All,
you can find my proposal for the hyperparameter optimization topic:
* http://goo.gl/XHuav8
* https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing
Please give feedback!
Cheers,
Christof
Andreas Mueller
2015-03-19 21:36:24 UTC
Permalink
Hi Charles.
That is unrelated to the GSoC mail you responded to, right?

I think updating liblinear sounds like a good idea, if it doesn't end up
being too complicated.
Allowing instance weights is certainly something we'd like to have.
You should check how far our code diverged, but I think for liblinear it
is not as bad as for libsvm.
Feel free to submit a pull request after going through the contributor
guidelines:
http://scikit-learn.org/dev/developers/index.html#contributing-code
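
For whoever picks this up, a rough sketch of the kind of equivalence check
such a patch would want, shown here with the kernel SVC (which already takes
sample_weight) purely to illustrate the expected behaviour: weighting a
sample by two should match duplicating it. The toy data is made up.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=60, n_features=4, random_state=0)

# Duplicating the first sample...
X_dup = np.vstack([X, X[:1]])
y_dup = np.concatenate([y, y[:1]])

# ...should be equivalent to giving it an instance weight of 2.
w = np.ones(len(y))
w[0] = 2.0

a = SVC(kernel="linear", C=1.0).fit(X_dup, y_dup)
b = SVC(kernel="linear", C=1.0).fit(X, y, sample_weight=w)

# Both differences should be ~0 up to the solver's tolerance.
print(np.abs(a.coef_ - b.coef_).max())
print(np.abs(a.intercept_ - b.intercept_).max())
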

Cheers,
Andy
Post by Charles Martin
I would like to propose extending the linearSVC package
by replacing the liblinear version with a newer version that
1. allows setting instance weights
2. provides the dual variables /Lagrange multipliers
This would facilitate research and development of transductive SVMs
and related semi-supervised methods.
Charles H Martin, PhD
Charles Martin
2015-03-19 22:12:02 UTC
Permalink
Yes and thanks

Sent from my iPhone
Post by Andreas Mueller
Hi Charles.
That is unrelated to the GSoC mail you responded to, right?
I think updating liblinear sounds like a good idea, if it doesn't end up
being too complicated.
Allowing instance weights is certainly something we'd like to have.
You should check how far our code diverged, but I think for liblinear it
is not as bad as for libsvm.
Feel free to submit a pull request after going through the contributor
guidelines:
http://scikit-learn.org/dev/developers/index.html#contributing-code
Cheers,
Andy
Joel Nothman
2015-03-19 22:20:38 UTC
Permalink
I should have replied here. Liblinear with sample weights:
https://github.com/scikit-learn/scikit-learn/pull/2784
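
For context on why sample weights matter for the transductive use case
mentioned above, a rough self-training-style sketch; it assumes a
sample_weight argument on LinearSVC.fit, which is what that patch would add
and is not in the released liblinear wrapper at the time of writing. The
data, weight values, and the single refit step are only illustrative.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Split a toy problem into a labeled pool and an unlabeled pool.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)
X_lab, y_lab, X_unlab = X[:50], y[:50], X[50:]

clf = LinearSVC(C=1.0)
clf.fit(X_lab, y_lab)

# One self-training step: pseudo-label the unlabeled points and refit,
# trusting them less than the true labels via smaller instance weights.
pseudo = clf.predict(X_unlab)
X_all = np.vstack([X_lab, X_unlab])
y_all = np.concatenate([y_lab, pseudo])
w_all = np.concatenate([np.ones(len(y_lab)), 0.1 * np.ones(len(pseudo))])

# The sample_weight argument below is what the linked patch would add.
clf.fit(X_all, y_all, sample_weight=w_all)
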
Post by Charles Martin
Yes and thanks
Sent from my iPhone
Charles Martin
2015-03-19 22:46:44 UTC
Permalink
oh thanks
Post by Joel Nothman
https://github.com/scikit-learn/scikit-learn/pull/2784
Charles Martin
2015-03-24 00:05:13 UTC
Permalink
On liblinear -- can you clarify for me how you incorporate updates from
the upstream site?

Do you make an effort to stay up to date with the latest changes directly,
by recompiling liblinear each time a new release is made?
Andreas Mueller
2015-03-24 00:08:52 UTC
Permalink
I am not aware of anyone tracking liblinear.
There is certainly no automatic update.
Post by Charles Martin
On liblinear--can you clarify for me how you incorporate updates from
the main site?
Do you make an effort to stay up to date with latest changes directly
by recompiling liblinear each time a new release is made?
Joel Nothman
2015-03-19 22:20:05 UTC
Permalink
This is off-topic, but I should note that there is a patch at
https://github.com/scikit-learn/scikit-learn/pull/2784 awaiting review for
a while now...
Post by Charles Martin
I would like to propose extending the linearSVC package
by replacing the liblinear version with a newer version that
1. allows setting instance weights
2. provides the dual variables /Lagrange multipliers
This would facilitate research and development of transductive SVMs
and related semi-supervised methods.
Charles H Martin, PhD
On Thu, Mar 19, 2015 at 2:12 PM, Christof Angermueller
Post by Christof Angermueller
Hi All,
* http://goo.gl/XHuav8
*
https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing
Post by Christof Angermueller
Please give feedback!
Cheers,
Christof
Post by Sturla Molden
Post by Andreas Mueller
Does emcee implement Bayesian optimization?
What is the distribution you assume? GPs?
I thought emcee was a sampler. I need to check in with Dan ;)
Just pick the mode :-)
The distribution is whatever you want it to be.
Sturla
Post by Andreas Mueller
Post by Sturla Molden
For Bayesian optimization with MCMC (which I believe spearmint also
http://dan.iel.fm/emcee/current/
It is much faster than naïve MCMC methods and all we need to do is
compute a callback that computes the loglikelihood given the parameter
set (which can just as well be hyperparameters).
To do this computation in parallel one can simply evaluate the walkers
in parallel and do a barrier synchronization after each step. The
contention due to the barrier can be reduced by increasing the number
of
Post by Christof Angermueller
Post by Sturla Molden
Post by Andreas Mueller
Post by Sturla Molden
walkers as needed. Also one should use something like DCMT for random
numbers to make sure there are no contention for the PRNG and to
ensure
Post by Christof Angermueller
Post by Sturla Molden
Post by Andreas Mueller
Post by Sturla Molden
that each thread (or process) gets an independent stream of random
numbers.
Post by Christof Angermueller
Post by Sturla Molden
Post by Andreas Mueller
Post by Sturla Molden
emcee implements this kind of optimization using multiprocessing, but
it
Post by Christof Angermueller
Post by Sturla Molden
Post by Andreas Mueller
Post by Sturla Molden
passes parameter sets around using pickle and is therefore not very
efficient compared to just storing the current parameter for each
walker
Post by Christof Angermueller
Post by Sturla Molden
Post by Andreas Mueller
Post by Sturla Molden
in shared memory. So there is a lot of room for improvement here.
Sturla
Andreas Mueller
2015-03-23 20:40:44 UTC
Permalink
Hi Christof.
Can you please also post it on melange?
Reviews will be coming soon ;)
Andy
Andreas Mueller
2015-03-24 00:07:30 UTC
Permalink
Hi Christof.
I gave some comments on the google doc.

Andy
Christof Angermueller
2015-03-24 08:31:10 UTC
Permalink
Thanks Andy! I will revise my proposal and submit it to melange today!

Christof
Christof Angermueller
2015-03-24 20:38:57 UTC
Permalink
Thanks Andy! I replied to your comments:
https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing.

In summary,
* I will not mention parallelization as an extended feature,
* I will suggest concrete data sets for benchmarking,
* I will mention tasks for which I expect an improvement.

Any further ideas?
Where can I find the PR for gaussian_processes? I would like to know
what will be implemented and to which extent I can contribute.

I will upload the final version to melange tomorrow.

Cheers,
Christof
Michael Eickenberg
2015-03-24 20:45:59 UTC
Permalink
On Tue, Mar 24, 2015 at 9:38 PM, Christof Angermueller wrote:
Post by Christof Angermueller
Where can I find the PR for gaussian_processes? I would like to know
what will be implemented and to which extent I can contribute.
https://github.com/scikit-learn/scikit-learn/pull/4270/
Andy
2015-03-24 20:52:27 UTC
Permalink
Post by Christof Angermueller
* I will suggest concrete data sets for benchmarking,
* I will mention tasks for which I expect an improvement.
It is also important to have algorithms for which we expect improvements.
I'm not sure how much we want to focus on deep learning, as the MLP is
not merged.
Post by Christof Angermueller
Where can I find the PR for gaussian_processes? I would like to know
what will be implemented and to which extent I can contribute.
As much as you want ;)
Kyle Kastner
2015-03-24 21:01:35 UTC
Permalink
It might be nice to talk about optimizing runtime and/or training time
like SMAC did in their paper. I don't see any reason we couldn't do
this in sklearn, and it might be of value to users since we don't
really do deep learning as Andy said.
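Concretely, I am thinking of something like the expected improvement per second
from Snoek et al. 2012: divide EI by a predicted evaluation time. A rough
numpy/scipy sketch of just the acquisition part, not tied to any sklearn API;
the time estimate (predicted_seconds) is assumed to come from a second
regressor that is not shown here:

    # Sketch: expected improvement per second for a minimization problem.
    # mu/sigma are the surrogate's predictive mean/std of the validation loss
    # at a candidate point; predicted_seconds is an assumed time estimate.
    import numpy as np
    from scipy.stats import norm

    def expected_improvement(mu, sigma, best_loss):
        sigma = np.maximum(sigma, 1e-12)   # guard against zero variance
        z = (best_loss - mu) / sigma
        return (best_loss - mu) * norm.cdf(z) + sigma * norm.pdf(z)

    def ei_per_second(mu, sigma, best_loss, predicted_seconds):
        return expected_improvement(mu, sigma, best_loss) / np.maximum(predicted_seconds, 1e-12)

    # Two candidates with similar EI but very different predicted cost:
    print(ei_per_second(0.20, 0.05, best_loss=0.25, predicted_seconds=10.0))
    print(ei_per_second(0.19, 0.05, best_loss=0.25, predicted_seconds=600.0))

The cheap candidate wins here even though its raw EI is slightly lower.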
Kyle Kastner
2015-03-24 21:08:19 UTC
Permalink
That said, I would think random forests would get a lot of the
benefits that deep learning tasks might get, since they also have a
lot of hyperparameters. Boosting tasks would be interesting as well,
since swapping the estimator used could make a huge difference, though
that may be trickier to implement.
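For reference, here is the kind of search space a random forest already
exposes, written as a small sketch driven by the existing RandomizedSearchCV
(the import path depends on the scikit-learn version); a GP-based optimizer
would have to cope with the same mix of integer, categorical and boolean
parameters:

    # Sketch: a plausible random forest search space with the existing
    # randomized search as the driver.
    from scipy.stats import randint
    from sklearn.datasets import load_digits
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.grid_search import RandomizedSearchCV  # sklearn.model_selection in later versions

    param_distributions = {
        "n_estimators": randint(10, 200),
        "max_depth": [3, 5, 10, None],
        "max_features": ["sqrt", "log2", None],
        "min_samples_leaf": randint(1, 20),
        "bootstrap": [True, False],
    }

    digits = load_digits()
    search = RandomizedSearchCV(RandomForestClassifier(), param_distributions,
                                n_iter=20, cv=3)
    search.fit(digits.data, digits.target)
    print(search.best_params_, search.best_score_)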
Kyle Kastner
2015-03-24 21:11:01 UTC
Permalink
This paper (http://arxiv.org/pdf/1306.3476v1.pdf) might also give you
some ideas for things to try. Boosting an untrained "deep" model got a
lot of benefit from Bayesian optimization. Note that this model was
built prior to the release of the dataset! Weird but very interesting.
Andy
2015-03-24 21:25:12 UTC
Permalink
One thing that might also be interesting is "Bootstrapping" (in the
compiler sense, not the statistics sense) the optimizer.
In the latest Jasper Snoek paper, http://arxiv.org/abs/1502.05700, they used
a hyper-parameter optimizer to optimize the parameters
of a hyper-parameter optimizer on a set of optimization tasks.
http://youtu.be/BIizqZ0mvIo
So we could try to optimize the parameters of the GP using the GP :)
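To make that concrete, here is a toy sketch of the bootstrapping idea: run a
tiny GP-based optimization loop on a cheap 1-D benchmark function for several
values of one of its own settings (the kernel length scale) and keep the best
one. The GaussianProcessRegressor/RBF API is the style of the new GP PR and
the benchmark function is made up; this is only an illustration, not the real
implementation:

    # Toy sketch of "optimizing the optimizer": the outer loop searches over
    # the inner optimizer's own kernel length scale.
    import numpy as np
    from scipy.stats import norm
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import RBF

    def benchmark(x):
        # cheap 1-D stand-in for a real cross-validation loss
        return np.sin(3 * x) + x ** 2 - 0.7 * x

    def run_bo(length_scale, n_iter=15, seed=0):
        rng = np.random.RandomState(seed)
        X = list(rng.uniform(-2, 2, size=3))
        y = [benchmark(x) for x in X]
        grid = np.linspace(-2, 2, 200)
        for _ in range(n_iter):
            gp = GaussianProcessRegressor(kernel=RBF(length_scale), alpha=1e-6,
                                          optimizer=None, normalize_y=True)
            gp.fit(np.array(X)[:, None], y)
            mu, sigma = gp.predict(grid[:, None], return_std=True)
            z = (min(y) - mu) / np.maximum(sigma, 1e-9)
            ei = (min(y) - mu) * norm.cdf(z) + sigma * norm.pdf(z)
            x_next = float(grid[np.argmax(ei)])
            X.append(x_next)
            y.append(benchmark(x_next))
        return min(y)

    for ls in [0.1, 0.5, 1.0]:   # crude outer search over the optimizer's own parameter
        print(ls, run_bo(ls))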
Christof Angermueller
2015-03-24 22:01:00 UTC
Permalink
Don't you think that I could also benchmark models that are not
implemented in sklearn? For instance, I could write a wrapper
DeepNet(...) with fit() and predict(), which internally uses theano
to build an ANN. In this way, I could benchmark complex deep networks
beyond what will be possible with the new sklearn ANN module. This might
be interesting for the deep learning community.
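Roughly, the wrapper only has to satisfy the estimator contract: store its
hyperparameters under the names given in __init__ and expose fit()/predict().
A minimal sketch, where build_net/train_net/predict_net are purely
hypothetical placeholders for the theano-specific code:

    # Minimal sketch of an sklearn-compatible wrapper around an external network.
    from sklearn.base import BaseEstimator, ClassifierMixin

    class DeepNet(BaseEstimator, ClassifierMixin):
        """Wraps an external network behind fit()/predict() so that any
        hyperparameter search in sklearn can drive it."""

        def __init__(self, n_hidden=100, learning_rate=0.01, n_epochs=10):
            # store parameters under the names they were passed with, so that
            # get_params()/set_params() (and hence any *SearchCV) work
            self.n_hidden = n_hidden
            self.learning_rate = learning_rate
            self.n_epochs = n_epochs

        def fit(self, X, y):
            # self.net_ = build_net(X.shape[1], self.n_hidden)               # hypothetical
            # train_net(self.net_, X, y, self.learning_rate, self.n_epochs)  # hypothetical
            return self

        def predict(self, X):
            # return predict_net(self.net_, X)                               # hypothetical
            raise NotImplementedError("placeholder for the external network")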

Obvious sklearn modules to benchmark are:
* RandomForestClassifier
* SVC
* GaussianProcess
* Perceptron

As benchmark data sets, I would use those that were used before (see
Snoek et al. 2012, Bergstra et al. 2011) to evaluate optimizers like
spearmint. Candidates for classification are
* MNIST
* CIFAR-10

and for regression:
* Boston housing prices

@Andy, @Kyle, and @Matthias: thanks for your references! I will have a
closer look at them tomorrow!

Christof
Gael Varoquaux
2015-03-24 22:09:46 UTC
Permalink
Post by Christof Angermueller
Don't you think that I could also benchmark models that are not
implemented in sklearn? […]
I am personally less interested in that. We have already a lot in
scikit-learn and more than enough to test the model selection code. The
focus should be on providing code that is readily usable.

I am worried that such a task will be very time consuming and will not move
us much closer to code that improves model selection in scikit-learn.

Gaël
Olivier Grisel
2015-03-25 00:08:26 UTC
Permalink
Christof, don't forget to put your proposal on melange by Thursday
(the earlier the better). Please put "scikit-learn" in the title to
make it easy to find.
--
Olivier
Vlad Niculae
2015-03-25 00:24:43 UTC
Permalink
Hi Christof, Gael, hi everyone,
Post by Gael Varoquaux
Post by Christof Angermueller
Don't you think that I could also benchmark models that are not
implemented in sklearn? […]
I am personally less interested in that. We have already a lot in
scikit-learn and more than enough to test the model selection code.
On top of this, people have already been using dedicated hyperparameter optimizer toolkits for Theano deep nets. I don’t think we should aim to compete with hyperopt/spearmint from day 0 (or ever), but, just like Gael said,
Post by Gael Varoquaux
The focus should be on providing code that is readily-usable.
As for your proposal, I have a few comments.

1. in 3.1 you say “It will have same interfaces as GridSearchCV and RandomizedSearchCV”. Even the use of plural “interfaces” here points at a problem: those two object do not have identical interfaces. Which interface will GPSearchCV have? Will it take (prior) distributions over hyperparameters? (In the same format as RandomizedSearchCV?) Ranges and assume a fixed prior? I think a more detailed discussion of the user-facing API would be useful.

2. Ideally this module would fully reuse the GP module. We should have no code redundancy, but the way your proposal is written, it does not focus much on the interaction of your changes with the GP module. (For example, will slice sampling be a contribution to the GP module?) Change sets that reach deeper will take longer to review and merge.

3. Your point in 4.4 about optimizing improvement *per second* seems desirable; where does it fit in the timeline? Will everything be done with this in mind from the start?

4. Parallelization is interesting and seems non-trivial. I’m a bit dense but I managed to understand Gael’s seeding suggestion earlier. The paragraph in your proposal confused me though, especially the part “I will use all completed, and integrate over pending evaluations to approximate the expected acquisition function.” Could you clarify?

5. (Timeline stuff.) I’m not sure what the relationship between “Build optimizing pipeline from parts implemented so far” and “First working prototype” is. Testing features shouldn’t come so late, it should be done at the same time. In general, the timeline would benefit from a slight shift of perspective: when would you like to have the PR on X functionally complete (this includes tests)? Overall complete (includes docs, examples and at least some review)?
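To make point 1 concrete, here is a minimal, runnable sketch of the interface RandomizedSearchCV already exposes (estimator and ranges are just placeholders); the open question is whether GPSearchCV would accept the same dict of scipy.stats distributions as priors, or plain ranges with a fixed prior:

from scipy.stats import uniform
from sklearn.datasets import load_digits
from sklearn.grid_search import RandomizedSearchCV  # sklearn.model_selection in later releases
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

# Priors over hyperparameters, in RandomizedSearchCV's current format.
param_distributions = {
    "C": uniform(loc=1e-3, scale=1e3),
    "gamma": uniform(loc=1e-4, scale=1e-1),
}

search = RandomizedSearchCV(SVC(), param_distributions, n_iter=20, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)

Whatever format GPSearchCV ends up accepting, spelling it out at this level of detail in the proposal would answer most of the questions above.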

Hope my comments are helpful,

Yours,
Vlad
Post by Gael Varoquaux
I am worried that such task will be very time consuming and will not move
us much closer to code that improves model selection in scikit-learn.
Gaël
Kyle Kastner
2015-03-25 00:53:16 UTC
Permalink
I would focus on the API of this functionality and how/what users will
be allowed to specify. To me, this is a particularly tricky bit of the
PR. As Vlad said, take a close look at GridSearchCV and
RandomizedSearchCV and see how they interact with the codebase. Do you
plan to find good defaults for existing estimators? Or use simple
ones? Even setting simple hyperparameter ranges for estimators will
take some work. Is there a way to do this automagically?
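For what it is worth, one low-tech way to get something "automagic" is a hand-maintained registry of per-estimator default ranges; the sketch below is purely illustrative (the ranges are guesses, not vetted defaults):

import numpy as np

# Illustrative only: hand-picked ranges, not vetted defaults.
DEFAULT_SEARCH_SPACES = {
    "SVC": {
        "C": ("log-uniform", 1e-3, 1e3),
        "gamma": ("log-uniform", 1e-5, 1e1),
    },
    "RandomForestClassifier": {
        "n_estimators": ("int-uniform", 10, 500),
        "max_features": ("uniform", 0.1, 1.0),
    },
}

def sample_params(space, rng=np.random):
    """Draw one random configuration from a search-space description."""
    params = {}
    for name, (kind, low, high) in space.items():
        if kind == "log-uniform":
            params[name] = float(np.exp(rng.uniform(np.log(low), np.log(high))))
        elif kind == "int-uniform":
            params[name] = int(rng.randint(low, high + 1))
        else:  # plain uniform
            params[name] = float(rng.uniform(low, high))
    return params

print(sample_params(DEFAULT_SEARCH_SPACES["SVC"]))

Whether something like this belongs in scikit-learn itself or stays in an example is another API question worth settling early.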

Slice sampling and parallelization - is it necessary to have these so
early in the timeline? I would move benchmarking, profiling, and
documentation up. Those things tend to take more time than expected,
and good documentation will be key for this work. Parallelization and
slice sampling are both useful, but are pretty much internal facing -
and I would expect you would need benchmark code to prove that
parallelization and slice sampling are useful. The docs you write
should apply 99% unchanged to the code before and after adding
parallelization and slice sampling.

I think it is also key that you take a look at the new GP interface.
The PR code is fairly mature but being *very* familiar with how it
works will be a key part of success in this task.

It is good to think about external compatibility (I am especially
interested in this for selfish reasons), but it is most important to
get something that works well for sklearn alone. I don't think testing
on deep networks is especially useful for sklearn, especially since
spearmint, hyperopt, whetlab, and many other packages all try to do
this. IMO, random forests or GBRT are great candidates for examples.
Focusing on a simple, well thought out CV object with *great*
documentation and examples is most important, and will have the
largest benefit for users.

Overall, like Vlad said, the more you can break this into smaller
changes the better it is. I am not really sure how to do this, beyond
one PR with base GPSearchCV and associated code, then optimizations
like parallelization/slice sampling/EIperS in smaller following PRs,
but it is very important to think about.
Post by Vlad Niculae
Hi Cristoph, Gael, hi everyone,
Post by Gael Varoquaux
Post by Christof Angermueller
Don't you think that I could also benchmark models that are not
implemented in sklearn? […]
I am personally less interested in that. We have already a lot in
scikit-learn and more than enough to test the model selection code.
On top of this, people have already been using dedicated hyperparameter optimizer toolkits for Theano deep nets. I don’t think we should aim to compete with hyperopt/spearmint from day 0 (or ever), but, just like Gael said,
Post by Gael Varoquaux
The focus should be on providing code that is readily-usable.
As for your proposal, I have a few comments.
1. in 3.1 you say “It will have same interfaces as GridSearchCV and RandomizedSearchCV”. Even the use of plural “interfaces” here points at a problem: those two object do not have identical interfaces. Which interface will GPSearchCV have? Will it take (prior) distributions over hyperparameters? (In the same format as RandomizedSearchCV?) Ranges and assume a fixed prior? I think a more detailed discussion of the user-facing API would be useful.
2. Ideally this module would fully reuse the GP module. We should have no code redundancy, but the way your proposal is written, it does not focus much on the interaction of your changes with the GP module. (For example, will slice sampling be a contribution to the GP module?) Change sets that reach deeper will take longer to review and merge.
3. Your point in 4.4 about optimizing improvement *per second* seems desirable, where does it fit in the timeline? Will everything be done with this in mind from the start?
4. Parallelization is interesting and seems non-trivial. I’m a bit dense but I managed to understand Gael’s seeding suggestion earlier. The paragraph in your proposal confused me though, especially the part “I will use all completed, and integrate over pending evaluations to approximate the expected acquisition function.” Could you clarify?
5. (Timeline stuff.) I’m not sure what the relationship between “Build optimizing pipeline from parts implemented so far” and “First working prototype” is. Testing features shouldn’t come so late, it should be done at the same time. In general, the timeline would benefit from a slight shift of perspective: when would you like to have the PR on X functionally complete (this includes tests)? Overall complete (includes docs, examples and at least some review)?
Hope my comments are helpful,
Yours,
Vlad
Post by Gael Varoquaux
I am worried that such task will be very time consuming and will not move
us much closer to code that improves model selection in scikit-learn.
Gaël
Andreas Mueller
2015-03-25 18:47:28 UTC
Permalink
I think you could bench on other problems, but maybe focus on the ones
in scikit-learn.
Deep learning people might be happy with using external tools for
optimizing.
I'd also recommend benchmarking just the global optimization part on
global optimization datasets as they were used in Jasper's work.
Post by Christof Angermueller
Don't you think that I could also benchmark models that are not
implemented in sklearn? For instance, I could write a wrapper
DeepNet(...) with fit() and predict(), and which uses internally theano
to build a ANN? In this way, I could benchmark complex deep networks
beyond what will be possible with the new sklearn ANN module. This might
be interesting for the deep learning community.
* RandomForestClassifier
* SVC
* GaussianProcess
* Perceptron
As benchmark data sets, I would use those that were used before (see
Snoek at al 2012, Bergstra et at 2011) to evaluate optimizer like
spearmint. For classification, I candidates are
* MNIST
* CIFAR-10
* Bosting housing precises
@Andy, @Kyle, and @Matthias: thanks for your references! I will have a
closer look at them tomorrow!
Christof
Post by Andy
One thing that might also be interesting is "Bootstrapping" (in the
compiler sense, not the statistics sense) the optimizer.
The latest Jasper Snoek paper http://arxiv.org/abs/1502.05700 they used
a hyper-parameter optimizer to optimize the parameter
of a hyper-parameter optimizer on a set of optimization tasks.
http://youtu.be/BIizqZ0mvIo
So we could try to optimize the parameters of the GP using the GP :)
Gael Varoquaux
2015-03-25 19:15:39 UTC
Permalink
I am very afraid of the time sink that this will be.

Sent from my phone. Please forgive brevity and mis spelling
Post by Andreas Mueller
I think you could bench on other problems, but maybe focus on the ones
in scikit-learn.
Deep learning people might be happy with using external tools for
optimizing.
I'd also recommend benchmarking just the global optimization part on
global optimization datasets as they were used in Jasper's work.
Andreas Mueller
2015-03-25 19:42:43 UTC
Permalink
Testing on the global optimization problems directly will actually be a
time saver, as they can be evaluated cheaply, without needing to fit an
estimator on MNIST for each point.
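As a concrete example, here is the Branin(-Hoo) function, one of the standard synthetic benchmarks in the Bayesian optimization literature (a minimal sketch; which synthetic functions the project actually uses is of course up to Christof):

import numpy as np

def branin(x1, x2):
    # Classic 2-D global-optimization benchmark; global minimum ~0.397887,
    # attained e.g. at (pi, 2.275). Domain: x1 in [-5, 10], x2 in [0, 15].
    a, b, c = 1.0, 5.1 / (4 * np.pi ** 2), 5.0 / np.pi
    r, s, t = 6.0, 10.0, 1.0 / (8 * np.pi)
    return a * (x2 - b * x1 ** 2 + c * x1 - r) ** 2 + s * (1 - t) * np.cos(x1) + s

rng = np.random.RandomState(0)
x1 = rng.uniform(-5, 10, size=1000)
x2 = rng.uniform(0, 15, size=1000)
print(branin(np.pi, 2.275))   # ~0.3979, the known optimum
print(branin(x1, x2).min())   # best of 1000 random probes, evaluated in microseconds

An optimizer can be compared against random search on such a function in seconds, long before running MNIST-scale benchmarks.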
Post by Gael Varoquaux
I am very afraid of the time sink that this will be.
Sent from my phone. Please forgive brevity and mis spelling
I think you could bench on other problems, but maybe focus on the ones
in scikit-learn.
Deep learning people might be happy with using external tools for
optimizing.
I'd also recommend benchmarking just the global optimization part on
global optimization datasets as they were used in Jasper's work.
Christof Angermueller
2015-03-25 20:50:00 UTC
Permalink
I decided to benchmark only scikit-learn models. Doing this properly and
summarizing the results in a user-friendly rst document will take some
time and should be sufficient for a GSoC project. More sophisticated
benchmarks could be carried out afterwards.

I plan to benchmark the following models:
* RandomForestClassifier
* SVC
* the MLP

Or is there another model which I should include?

I will use some of the datasets described in the spearmint publication,
including
* MNIST,
* CIFAR-10, and
* Boston housing prices.

Christof
Post by Andreas Mueller
Testing on the global optimization problems directly will actually be
a time saver,
as they can be evaluated directly, without needing to compute an
estimator on MNIST for each point.
Post by Gael Varoquaux
I am very afraid of the time sink that this will be.
Sent from my phone. Please forgive brevity and mis spelling
I think you could bench on other problems, but maybe focus on the ones
in scikit-learn.
Deep learning people might be happy with using external tools for
optimizing.
I'd also recommend benchmarking just the global optimization part on
global optimization datasets as they were used in Jasper's work.
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Andreas Mueller
2015-03-25 20:53:33 UTC
Permalink
As I said, at least for development purposes I think it might help you
to also compare on the global optimization problems that Jasper reports
on in the deep neural net paper. That is probably not for the docs,
though.
I think the list below is good. Having some pipelines might also be
interesting, say using text feature extraction, but these often have
discrete choices, and I don't think the GP would work on them that well.
Post by Christof Angermueller
I decided to only benchmark scikit-learn models. Doing this properly
and summarizing the results in a user-friendly rst document will take
some time and should be sufficient for a GSoC project. More
sophistacted benchmarks could be carried out afterwards.
* RandomForestClassifer
* SVC
* the the MLP
Or is there another model which I should include?
I will use some of the datasets described in the spearmint
publication, including
* MNIST,
* CIFAR-10, and
* Bosting housing prices.
Christof
Christof Angermueller
2015-03-24 22:07:29 UTC
Permalink
Post by Andy
One thing that might also be interesting is "Bootstrapping" (in the
compiler sense, not the statistics sense) the optimizer.
The latest Jasper Snoek paper http://arxiv.org/abs/1502.05700 they used
a hyper-parameter optimizer to optimize the parameter
of a hyper-parameter optimizer on a set of optimization tasks.
http://youtu.be/BIizqZ0mvIo
So we could try to optimize the parameters of the GP using the GP :)
That's why I mentioned GP in my email before ;-)
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Christof Angermueller
2015-03-25 20:22:02 UTC
Permalink
Which SMAC paper are you referring to?
What do you mean by optimizing runtime/training time? The optimizer
should find good parameters within a short time. Do you mean comparing
the best result within a predefined time frame? For this, the 'expected
improvement per second' acquisition function, which is mentioned in my
proposal, might achieve good results.
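For reference, here is roughly what I mean, as a minimal numpy sketch of plain EI and the per-second variant (the cost model supplying predicted_seconds is assumed, e.g. a second GP over training time as in Snoek et al. 2012):

import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, f_best):
    # EI for minimization, given GP posterior mean mu and std sigma.
    sigma = np.maximum(sigma, 1e-12)   # guard against zero predictive variance
    z = (f_best - mu) / sigma
    return (f_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def expected_improvement_per_second(mu, sigma, f_best, predicted_seconds):
    # Prefer candidates that improve the objective quickly.
    return expected_improvement(mu, sigma, f_best) / np.maximum(predicted_seconds, 1e-12)

# Toy numbers: the cheap candidate wins per second despite lower raw EI.
mu = np.array([0.10, 0.08])
sigma = np.array([0.02, 0.02])
cost = np.array([5.0, 300.0])          # predicted seconds per evaluation
print(expected_improvement(mu, sigma, f_best=0.12))
print(expected_improvement_per_second(mu, sigma, 0.12, cost))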

Christof
Post by Kyle Kastner
It might be nice to talk about optimizing runtime and/or training time
like SMAC did in their paper. I don't see any reason we couldn't do
this in sklearn, and it might be of value to users since we don't
really do deep learning as Andy said.
Post by Andy
Post by Christof Angermueller
https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing.
I summary,
* I will not mentioned parallelization as an extended features,
* suggest concrete data sets for benchmarking,
* mentioned tasks for which I expect an improvement.
It is also important to have algorithms for which we expect improvements.
I'm not sure how much we want to focus on deep learning, as the MLP is
not merged.
Post by Christof Angermueller
Any further ideas?
Where can I find the PR for gaussian_processes? I would like to know
about what will be implemented and to which extend I can contribute.
As much as you want ;)
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Kyle Kastner
2015-03-25 20:45:31 UTC
Permalink
See figure 5 of this paper:
http://www.cs.ubc.ca/~hutter/papers/ICML14-HyperparameterAssessment.pdf
for an example.

There is a better paper that exclusively tackles this but I cannot
find it at the moment.

I was referring to the optimizer preferring algorithms which are both
fast and give good performance - EI per S tackles this, and is what I
meant in my earlier email, though Hutter et al. may have had an
alternate metric.

On Wed, Mar 25, 2015 at 4:22 PM, Christof Angermueller
Post by Christof Angermueller
To which SMAC paper are you referring to?
What do you mean about optimizing runtime/training time? The optimizer
should find good parameters with in a short time. Do you mean comparing
the best result in a predefined time frame? For this, the 'expected
improvement per second' acquisition function, which is mentioned in my
proposal, might achieve good results.
Christof
Post by Kyle Kastner
It might be nice to talk about optimizing runtime and/or training time
like SMAC did in their paper. I don't see any reason we couldn't do
this in sklearn, and it might be of value to users since we don't
really do deep learning as Andy said.
Post by Andy
Post by Christof Angermueller
https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing.
I summary,
* I will not mentioned parallelization as an extended features,
* suggest concrete data sets for benchmarking,
* mentioned tasks for which I expect an improvement.
It is also important to have algorithms for which we expect improvements.
I'm not sure how much we want to focus on deep learning, as the MLP is
not merged.
Post by Christof Angermueller
Any further ideas?
Where can I find the PR for gaussian_processes? I would like to know
about what will be implemented and to which extend I can contribute.
As much as you want ;)
--
Christof Angermueller
http://cangermueller.com
Andy
2015-03-09 13:13:02 UTC
Permalink
Hi Christof.

I think implementing either the GP or SMAC approach would be good.
I talked to Jasper Snoek on Friday; possibly the trickiest part for
the GP is the optimization of the resulting function.
Spearmint also marginalizes out the hyperparameters, which our upcoming
GP implementation doesn't support afaik.
I haven't looked into SMAC too deeply yet, but the main issue there is

The idea behind this project is as Kyle says to have something that is
easily accessible and integrates with scikit-learn,
as a replacement for GridSearchCV or RandomizedSearchCV. Btw, "old"
Spearmint is GPL,
"new" spearmint is under a non-commercial license.


Best,
Andy
Post by Christof Angermueller
Hi Andreas (and others),
I am a PhD student in Bioinformatics at the University of Cambridge,
(EBI/EMBL), supervised by Oliver Stegle and Zoubin Ghahramani. In my
PhD, I apply and develop different machine learning algorithms for
analyzing biological data.
There are different approaches for hyperparameter optimization, some
* Sequential Model-Based Global Optimization (SMBO) ->
http://www.cs.ubc.ca/labs/beta/Projects/SMAC/
* Gaussian Processes (GP) -> Spearmint;
https://github.com/JasperSnoek/spearmint
http://hyperopt.github.io/hyperopt/
* Scalable Bayesian Optimization Using Deep Neural Networks Deep
Networks for Global Optimization (DNGO) -> http://arxiv.org/abs/1502.05700
The idea is to implement ONE of this approaches, right?
Do you prefer a particular approach due to theoretical or practical reasons?
Spearmint also supports distributing jobs on a cluster (SGE). I
imagine that this requires platform specific code, which could be
difficult to maintain. What do you think?
Spearmint and hyperopt are already established python packages.
Another sklearn implementation might be considered as redundant, are
hard to establish. Do you have a particular new feature in mind?
Cheers,
Christof
--
Christof Angermueller
http://cangermueller.com
Matthias Feurer
2015-03-26 15:17:21 UTC
Permalink
Dear Christof, dear scikit-learn team,

This is a great idea; I highly encourage integrating Bayesian
optimization into scikit-learn, since automatically configuring
scikit-learn is quite powerful. It was done by the three winning teams
of the first automated machine learning competition:
https://sites.google.com/a/chalearn.org/automl/

I am writing this e-mail because our research group on learning,
optimization and automated algorithm design
(http://aad.informatik.uni-freiburg.de/) is working on very similar
things which might be useful in this context. Some people in our lab
(together with some people from other universities) developed a
framework for robust Bayesian optimization with minimal external
dependencies. It currently depends on GPy, but this dependency could be
easily replaced by the scikit-learn GP. It is probably not as
lightweight as you want to have it for scikit-learn, but you might want
to have a look at the source code. I will provide a link as soon as the
project is public (which is soon). In the meantime, I can grant
read-access to those who are interested. It might be helpful for you to
have a look at the structure of the module.

Besides these remarks, I think that using a GP is a good way to tune the
few hyperparameters of a single model. Another remark: Instead of
comparing GPSearchCV to spearmint only, you should also consider the TPE
algorithm implemented in hyperopt
(https://github.com/hyperopt/hyperopt). You could consider the following
benchmarks:

1. Together with a fellow student I implemented a library called HPOlib,
which provides a few benchmarks for hyperparameter optimization (for
example some from the 2012 spearmint paper):
https://github.com/automl/HPOlib It is further described in this paper:
http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use
sklearn.feature_selection.SelectPercentile with a fixed scoring function
together with a classification algorithm. It adds a single
hyperparameter which should be a good fit for the GP.
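A runnable sketch of that small pipeline from point 2, with plain grid search standing in for the future GP-based search (dataset and ranges are just placeholders):

from sklearn.datasets import load_digits
from sklearn.feature_selection import SelectPercentile, f_classif
from sklearn.grid_search import GridSearchCV   # sklearn.model_selection in later releases
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

pipe = Pipeline([
    ("anova", SelectPercentile(score_func=f_classif)),   # fixed scoring function
    ("svc", SVC()),
])

param_grid = {
    "anova__percentile": [10, 30, 50, 70, 90],   # the single extra hyperparameter
    "svc__C": [0.1, 1.0, 10.0],
}

search = GridSearchCV(pipe, param_grid, cv=3)
search.fit(X, y)
print(search.best_params_, search.best_score_)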

Best regards,
Matthias
Andreas Mueller
2015-03-26 16:08:32 UTC
Permalink
Hi Matthias.
As far as I know, the main goal for TPE was to support tree-structured
parameter spaces. I am not sure we want to go there yet because of the
more complex API.
On non-tree structured spaces, I think TPE performed worse than SMAC and GP.

With regard to your code: There might be touchy legal issues involved if
you didn't publish your code and we base our implementation on it.
If your code is public and BSD / MIT licensed, it would probably be much
safer. Why don't you just push your code under a permissive license?

Thank you for providing your benchmarks, they might be quite helpful.

Cheers,
Andy
Post by Matthias Feurer
Dear Christof, dear scikit-learn team,
This is a great idea, I highly encourage your idea to integrate
Bayesian Optimization into scikit-learn since automatically
configuring scikit-learn is quite powerful. It was done by the three
https://sites.google.com/a/chalearn.org/automl/
I am writing this e-mail because our research group on learning,
optimization and automated algorithm design
(http://aad.informatik.uni-freiburg.de/) is working on very similar
things which might be useful in this context. Some people in our lab
(together with some people from other universities)developed a
framework for robust Bayesian optimization with minimal external
dependencies. It currently depends on GPy, but this dependency could
be easily replaced by the scikit-learn GP. It is probably not as
leightweight as you want to have it for scikit-learn, but you might
want to have a look at the source code. I will provide a link as soon
as the project is public (which is soon). In the meantime, I can grant
read-access to those who are interested. It might be helpful for you
to have look at the structure of the module.
Besides these remarks, I think that using a GP is a good way to tune
the few hyperparameters of a single model. Another remark: Instead of
comparing GPSearchCV to spearmint only, you should also consider the
TPE algorithm implemented in hyperopt
(https://github.com/hyperopt/hyperopt). You could consider the
1. Together with a fellow student I implemented a library called
HPOlib, which provides a few benchmarks for hyperparameter
https://github.com/automl/HPOlib It is further described in this
paper: http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use
sklearn.feature_selection.SelectPercentile with a fixed scoring
function together with a classification algorithm. It adds a single
hyperparameter which should be a good fit for the GP.
Best regards,
Matthias
Christof Angermueller
2015-03-26 20:02:25 UTC
Permalink
Hi Andy and others,

I revised my proposal
(https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing)
and submitted it to melange. Could you have a look and check whether
any essential (formal) things are missing?
I will submit the final version tomorrow.

Cheers,
Christof
Post by Andreas Mueller
Hi Matthias.
As far as I know, the main goal for TPE was to support tree-structured
parameter spaces. I am not sure we want to go there yet because of the
more complex API.
On non-tree structured spaces, I think TPE performed worse than SMAC and GP.
With regard to your code: There might be touchy legal issues involved
if you didn't publish your code and we base our implementation on it.
If your code is public and BSD / MIT licensed, it would probably be
much safer. Why don't you just push your code under a permissive license?
Thank you for providing your benchmarks, they might be quite helpful.
Cheers,
Andy
Post by Matthias Feurer
Dear Christof, dear scikit-learn team,
This is a great idea, I highly encourage your idea to integrate
Bayesian Optimization into scikit-learn since automatically
configuring scikit-learn is quite powerful. It was done by the three
https://sites.google.com/a/chalearn.org/automl/
I am writing this e-mail because our research group on learning,
optimization and automated algorithm design
(http://aad.informatik.uni-freiburg.de/) is working on very similar
things which might be useful in this context. Some people in our lab
(together with some people from other universities)developed a
framework for robust Bayesian optimization with minimal external
dependencies. It currently depends on GPy, but this dependency could
be easily replaced by the scikit-learn GP. It is probably not as
leightweight as you want to have it for scikit-learn, but you might
want to have a look at the source code. I will provide a link as soon
as the project is public (which is soon). In the meantime, I can
grant read-access to those who are interested. It might be helpful
for you to have look at the structure of the module.
Besides these remarks, I think that using a GP is a good way to tune
the few hyperparameters of a single model. Another remark: Instead of
comparing GPSearchCV to spearmint only, you should also consider the
TPE algorithm implemented in hyperopt
(https://github.com/hyperopt/hyperopt). You could consider the
1. Together with a fellow student I implemented a library called
HPOlib, which provides a few benchmarks for hyperparameter
https://github.com/automl/HPOlib It is further described in this
paper: http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use
sklearn.feature_selection.SelectPercentile with a fixed scoring
function together with a classification algorithm. It adds a single
hyperparameter which should be a good fit for the GP.
Best regards,
Matthias
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Christof Angermueller
2015-03-26 21:07:53 UTC
Permalink
GridSearchCV and RandomizedSearchCV inherit from BaseSearchCV and
require an estimator object with fit() and predict() as the first
constructor argument. Hence, the estimator must follow the sklearn
convention with fit() and predict(). Alternatively, the estimator might
also be implemented as a black-box function f(x) that takes some
arguments and returns a value, as is done in spearmint. This makes it
easier to optimize any algorithm, not just those implemented in sklearn.

For consistency, GPSearchCV should also inherit from BaseSearchCV. But
what do you think about extending the current interface to make it
easier to optimize any learner?
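Roughly what I have in mind, as a minimal sketch (the split into a low-level function optimizer and a sklearn-facing wrapper is only a suggestion):

from sklearn.cross_validation import cross_val_score   # sklearn.model_selection in later releases
from sklearn.datasets import load_digits
from sklearn.svm import SVC

digits = load_digits()
X, y = digits.data, digits.target

def objective(params):
    # Black-box view: hyperparameters in, mean CV accuracy out.
    model = SVC(C=params["C"], gamma=params["gamma"])
    return cross_val_score(model, X, y, cv=3).mean()

# Any optimizer that understands plain functions could drive this search,
# whether the model underneath is a sklearn estimator or something else;
# GPSearchCV would then be a thin wrapper that builds such a function.
print(objective({"C": 1.0, "gamma": 0.001}))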

Christof
Post by Christof Angermueller
Hi Andy and others,
I revised my proposal
(https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing)
and submitted it to melange. Can you have a look if any essential
(formal) things are missing?
I will submit the final version tomorrow.
Cheers,
Christof
Post by Andreas Mueller
Hi Matthias.
As far as I know, the main goal for TPE was to support
tree-structured parameter spaces. I am not sure we want to go there
yet because of the more complex API.
On non-tree structured spaces, I think TPE performed worse than SMAC and GP.
With regard to your code: There might be touchy legal issues involved
if you didn't publish your code and we base our implementation on it.
If your code is public and BSD / MIT licensed, it would probably be
much safer. Why don't you just push your code under a permissive license?
Thank you for providing your benchmarks, they might be quite helpful.
Cheers,
Andy
Post by Matthias Feurer
Dear Christof, dear scikit-learn team,
This is a great idea, I highly encourage your idea to integrate
Bayesian Optimization into scikit-learn since automatically
configuring scikit-learn is quite powerful. It was done by the three
https://sites.google.com/a/chalearn.org/automl/
I am writing this e-mail because our research group on learning,
optimization and automated algorithm design
(http://aad.informatik.uni-freiburg.de/) is working on very similar
things which might be useful in this context. Some people in our lab
(together with some people from other universities)developed a
framework for robust Bayesian optimization with minimal external
dependencies. It currently depends on GPy, but this dependency could
be easily replaced by the scikit-learn GP. It is probably not as
leightweight as you want to have it for scikit-learn, but you might
want to have a look at the source code. I will provide a link as
soon as the project is public (which is soon). In the meantime, I
can grant read-access to those who are interested. It might be
helpful for you to have look at the structure of the module.
Besides these remarks, I think that using a GP is a good way to tune
the few hyperparameters of a single model. Another remark: Instead
of comparing GPSearchCV to spearmint only, you should also consider
the TPE algorithm implemented in hyperopt
(https://github.com/hyperopt/hyperopt). You could consider the
1. Together with a fellow student I implemented a library called
HPOlib, which provides a few benchmarks for hyperparameter
https://github.com/automl/HPOlib It is further described in this
paper: http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use
sklearn.feature_selection.SelectPercentile with a fixed scoring
function together with a classification algorithm. It adds a single
hyperparameter which should be a good fit for the GP.
Best regards,
Matthias
--
Christof Angermueller
http://cangermueller.com
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Andreas Mueller
2015-03-26 21:12:45 UTC
Permalink
I think the class that you introduce should really be geared towards
scikit-learn estimators.
But there could be a "lower level" function that just optimizes a
black-box function.
That is probably desirable from a modularity standpoint and for testing
anyhow.
Post by Christof Angermueller
GridSearchCV and RandomizedSearchCV inherit from BaseCV and require
and an estimator object with fit() and predict() as first constructor
argument. Hence, the estimator must follow the sklearn convention with
fit() and predict(). Instead, the estimator might also be implemented
as a black-box function f(x) that takes some arguments and returns a
value, as it is done in spearmint. This makes it easier to optimize
any algorithms, not just those implemented in sklearn.
For consistency, GPSearchCV should also inherit from BaseCV. But what
to you think about extending the current interface to make it easier
to optimize any learner?
Christof
Post by Christof Angermueller
Hi Andy and others,
I revised my proposal
(https://docs.google.com/document/d/1bAWdiu6hZ6-FhSOlhgH-7x3weTluxRfouw9op9bHBxs/edit?usp=sharing)
and submitted it to melange. Can you have a look if any essential
(formal) things are missing?
I will submit the final version tomorrow.
Cheers,
Christof
Post by Andreas Mueller
Hi Matthias.
As far as I know, the main goal for TPE was to support
tree-structured parameter spaces. I am not sure we want to go there
yet because of the more complex API.
On non-tree structured spaces, I think TPE performed worse than SMAC and GP.
With regard to your code: There might be touchy legal issues
involved if you didn't publish your code and we base our
implementation on it.
If your code is public and BSD / MIT licensed, it would probably be
much safer. Why don't you just push your code under a permissive license?
Thank you for providing your benchmarks, they might be quite helpful.
Cheers,
Andy
Post by Matthias Feurer
Dear Christof, dear scikit-learn team,
This is a great idea, I highly encourage your idea to integrate
Bayesian Optimization into scikit-learn since automatically
configuring scikit-learn is quite powerful. It was done by the
three winning teams of the first automated machine learning
competition: https://sites.google.com/a/chalearn.org/automl/
I am writing this e-mail because our research group on learning,
optimization and automated algorithm design
(http://aad.informatik.uni-freiburg.de/) is working on very similar
things which might be useful in this context. Some people in our
lab (together with some people from other universities)developed a
framework for robust Bayesian optimization with minimal external
dependencies. It currently depends on GPy, but this dependency
could be easily replaced by the scikit-learn GP. It is probably not
as leightweight as you want to have it for scikit-learn, but you
might want to have a look at the source code. I will provide a link
as soon as the project is public (which is soon). In the meantime,
I can grant read-access to those who are interested. It might be
helpful for you to have look at the structure of the module.
Besides these remarks, I think that using a GP is a good way to
Instead of comparing GPSearchCV to spearmint only, you should also
consider the TPE algorithm implemented in hyperopt
(https://github.com/hyperopt/hyperopt). You could consider the
1. Together with a fellow student I implemented a library called
HPOlib, which provides a few benchmarks for hyperparameter
https://github.com/automl/HPOlib It is further described in this
paper: http://automl.org/papers/13-BayesOpt_EmpiricalFoundation.pdf
2. If you are looking for a small pipeline, you can use
sklearn.feature_selection.SelectPercentile with a fixed scoring
function together with a classification algorithm. It adds a single
hyperparameter which should be a good fit for the GP.
Best regards,
Matthias
Christof Angermueller
2015-03-26 22:19:43 UTC
Permalink
Hi Matthias,

using HPOlib to benchmark GPSearchCV on the same datasets that were used
to benchmark spearmint, TPE, and SMAC is a good idea, and I will
include it in my proposal. However, I plan to primarily compare
GPSearchCV with GridSearchCV and RandomizedSearchCV, with spearmint
as the only external optimizer. Including TPE and SMAC is optional; I
could do that after GSoC or in the unlikely case that time is left at the end.
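As a rough sketch of that comparison (GPSearchCV is only the proposed name and does not exist in scikit-learn; its signature below is an assumption modelled on RandomizedSearchCV):

from scipy.stats import expon
from sklearn.datasets import load_digits
from sklearn.svm import SVC
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV  # sklearn.grid_search in 0.16-era releases

digits = load_digits()
X, y = digits.data, digits.target

space = {"C": expon(scale=10), "gamma": expon(scale=0.01)}

searchers = {
    "grid": GridSearchCV(SVC(), {"C": [0.1, 1, 10], "gamma": [0.001, 0.01]}, cv=3),
    "random": RandomizedSearchCV(SVC(), space, n_iter=30, cv=3, random_state=0),
    # "gp": GPSearchCV(SVC(), space, n_iter=30, cv=3),  # hypothetical class proposed in this thread
}
for name, search in searchers.items():
    search.fit(X, y)
    print(name, search.best_score_)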

At the current stage, I cannot tell you whether I will use ParamSklearn to
define hyperparameters. Maybe I will come back to you once I have thought
more carefully about how to define parameters.

Thanks for your suggestions,
Christof
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Andy
2015-03-26 22:51:45 UTC
Permalink
I think you should focus on first creating a prototype without ParamSklearn.
Christof Angermueller
2015-03-27 16:56:54 UTC
Permalink
I submitted my final proposal to Melange.

Thanks everybody for your suggestions!
Christof
--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Issam Laradji
2015-03-27 17:22:11 UTC
Permalink
Hi all,

Is it possible to return the number of comparisons made by, say, a ball
tree (a nearest-neighbor object) when searching for the closest query?

Scikit-learn's ball trees are implemented in Cython, so it might require some
work to add a counter to the ball tree code and recompile the file.

So I am wondering if there is a method on the nearest neighbor object
that I could call that would give me the number of comparisons.

If not, wouldn't it be interesting to have such a method added in a future
release?

This is because a ball tree is a branch-and-bound algorithm with a time
complexity of approximately O(d log n), but the complexity can degrade to
O(dn) if nothing is pruned.
So it would be interesting to observe empirically how likely that
worst-case complexity is in practice.
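As far as I can tell there is no built-in counter, but one crude workaround sketch (assuming BallTree accepts a user-defined Python callable as its metric via the 'pyfunc' mechanism; this counts only distance evaluations, not node-bound checks, and the Python callback makes queries much slower) would be:

import numpy as np
from sklearn.neighbors import BallTree

counter = {"n_dist": 0}

def counting_euclidean(x, y):
    # Every distance evaluation the tree performs goes through this function.
    counter["n_dist"] += 1
    return np.sqrt(np.sum((x - y) ** 2))

rng = np.random.RandomState(0)
X = rng.rand(2000, 10)
query = rng.rand(1, 10)

tree = BallTree(X, leaf_size=30, metric=counting_euclidean)
counter["n_dist"] = 0            # ignore distances computed while building the tree
tree.query(query, k=5)

print("distance evaluations during query:", counter["n_dist"])
print("brute force would need:", X.shape[0])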

Cheers!
--Issam
