Discussion: [Scikit-learn-general] grid search, joblib
Satrajit Ghosh
2011-11-10 14:20:19 UTC
hi,

is there a way to implement a coarse->fine grid search with a pipeline?
essentially i want to link up two or more gridsearchcv modules where the
range of parameters for a given search is set around the best parameters
estimated by the previous gridsearchcv.

also, when i use grid_search in a permutation test setting, i have to reset
the number of grid-search jobs to 1, because i set the number of
permutation jobs to -1. would it be useful to do this automatically? does
joblib have any load balancing capability?

cheers,

satra
Gael Varoquaux
2011-11-10 14:40:04 UTC
Post by Satrajit Ghosh
is there a way to implement a coarse->fine grid search with a pipeline?
essentially i want to link up two or more gridsearchcv modules where the
range of parameters for a given search is set around the best parameters
estimated by the previous gridsearchcv.
I don't really see how this could be done with the current objects. I have
recently used such a strategy in GraphLassoCV, where it was necessary
because of grid points where the estimator does not converge.

I think a new object would be necessary. Something like
'RefinedGridSearchCV', which would either derive from GridSearchCV or from
a common base class.

I think that it might be an interesting addition. I say 'might' because I
have given such ideas a try on general problems, and they often do not
work well: the score as a function of the parameters is often a nasty
landscape. First, it has a lot of noise. Second, it may have multiple
local maxima. Third, it often has plateaus, and simplex-like optimizers
get stuck there. There is probably a fair amount of work in implementing
something like this so that it actually works on more than a handful of
estimators. This is the kind of project that should live in a long-running
experimental branch before we consider merging it.

As a side note, part of the code may be factored out from the
GraphLassoCV code, even though I coded specificities of the GraphLasso
problem in there, like where the plateau is and what to do when
estimators do not converge.

Note that the new object would not implement the same contract as
GridSearchCV, as GridSearchCV is usable on parameters that are not
defined on a compact space (integers, or categorical variables like a
kernel name: {'rbf', 'linear'}).
Post by Satrajit Ghosh
also, when i use grid_search in a permutation test setting, i have to reset
the number of grid-search jobs to 1, because i set the number of
permutation jobs to -1. would it be useful to do this automatically?
I am a bit wary of the kind of magic that may arise here. The reason why
we cannot have nested joblib.Parallel is that daemonic processes cannot
fork under Unix. Basically, we could, in joblib, capture the fact that we
are in a daemonic process, and set n_jobs to 1 after a warning. AFAIK,
this can be tested with the following code:

# current_process().daemon is True inside a worker spawned by
# multiprocessing; the main process is not daemonic
from multiprocessing import current_process
assert not current_process().daemon

If you want to do a pull request in joblib implementing this, I would
take it.
Post by Satrajit Ghosh
does joblib have any load balancing capability?
No. This is quite challenging and, unless someone finds a simple way of
implementing it, I believe it is beyond the scope of joblib.

Gaël
Kenneth C. Arnold
2011-11-10 15:44:34 UTC
Post by Gael Varoquaux
I think that it might be an interesting addition. I say 'might' because I
have given such ideas a try on general problems, and they often do not
work well: the score as a function of the parameters is often a nasty
landscape. First, it has a lot of noise. Second, it may have multiple
local maxima. Third, it often has plateaus, and simplex-like optimizers
get stuck there. There is probably a fair amount of work in implementing
something like this so that it actually works on more than a handful of
estimators. This is the kind of project that should live in a long-running
experimental branch before we consider merging it.
Perhaps try random points? See "Random Search for Hyper-Parameter
Optimization" by James Bergstra and Yoshua Bengio (I'm hoping their full
paper will go into more depth!)

-Ken
Gael Varoquaux
2011-11-10 17:13:23 UTC
Post by Kenneth C. Arnold
Perhaps try random points? See "Random Search for Hyper-Parameter
Optimization" by James Bergstra and Yoshua Bengio (I'm hoping their full
paper will go into more depth!)
I was talking to James about this paper a month ago, and he said that it
would be useful only when there are many parameters to optimize, say 8 or 10.

G
Mathieu Blondel
2011-11-10 19:13:23 UTC
Somewhat related to the coarse-to-fine search, there is also the idea of
binary search. As Gael said, there may be many local optima, but if
you're optimizing only one parameter, there are times when a binary
search can make sense.

Another idea for speeding up grid search is to optimize one parameter at a
time while keeping all the others fixed. We would then need to let the
user define the order in which the parameters are optimized.

Mathieu
Gael Varoquaux
2011-11-10 21:04:31 UTC
Post by Mathieu Blondel
Somewhat related to the coarse-to-fine search, there is also the idea of
binary search. As Gael said, there may be many local optima, but if
you're optimizing only one parameter, there are times when a binary
search can make sense.
This is the general setting of simplex optimizers, in particular the
Nelder-Mead method, which is reasonably robust to noise if the
underlying function is bell-shaped. In 1D, it gives the so-called
dichotomy method.

I gave it a try a while ago, in our private codebase, implementing what I
called a 'DichotomyCV' estimator. To make this work at all well, I had to
modify the Nelder-Mead implementation from scipy a bit to get more
control over its behavior in the presence of noise.

I wanted to test it on real applications and contribute it to the scikit
in the long run. I gave up: on real applications it kept failing, and I
realized that I didn't trust it and would fall back to a grid search
whenever I didn't get good scores.

The number one challenge was dealing with plateaus compounded by noise.
Simplex methods get stuck there.

The code is available if people want it, but it will need significant
rework to be usable on real problems, IMHO.

Gaël
