Discussion:
GridSearch
Mathias Verbeke
2012-02-03 09:54:36 UTC
Hi all,

I'm currently looking at the GridSearch example
(http://scikit-learn.org/0.9/auto_examples/grid_search_digits.html), and I
don't completely get the point of using cross-validation twice. Why aren't
the parameters and the classifier selected in one cross-validation step?

Furthermore, I was wondering: if I do a refit at the end of the GridSearch
procedure, will it train the model on the complete dataset, so that it can
be applied to the test set afterwards?

Best and thanks,

Mathias
Andreas
2012-02-03 10:03:51 UTC
Hi Mathias.
First, please note that you are looking at an "old" version of the docs.
We are in the process of including a warning.
Please refer to
http://scikit-learn.org/stable/auto_examples/grid_search_digits.html
instead.

For your first question:
I didn't write the example, but this is how I understand it:

Usually, when evaluating a machine learning method, you are given a
"training" set and a "test" set; you train on the training set and
evaluate on the test set.
If you want to adjust the hyperparameters of the method, a common way
is to do cross-validation on the training set.
You then still need to evaluate on an independent test set, to see how
well your parameters generalize to unseen data.
As the digits dataset does not come split into a training and a test
part, the StratifiedKFold split is used to simulate this.
Does this answer your question?
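In code, the scheme described above looks roughly like this. A minimal sketch using current scikit-learn module paths (not the 0.9 API the example was written against); the parameter grid here is chosen for illustration:

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hold out an independent test set to simulate a train/test split,
# since digits does not ship pre-split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0, stratify=y)

# Inner cross-validation on the training set picks the hyperparameters...
grid = GridSearchCV(SVC(), {'C': [1, 10, 100], 'gamma': [1e-3, 1e-4]}, cv=5)
grid.fit(X_train, y_train)

# ...and the held-out test set estimates how well they generalize.
test_score = grid.score(X_test, y_test)
print(grid.best_params_, round(test_score, 3))
```

The key point is that the test set never influences the parameter choice; it is touched exactly once, at the end.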


For your second question:
There is a "refit" parameter on GridSearchCV (see the reference:
http://scikit-learn.org/stable/modules/generated/sklearn.grid_search.GridSearchCV.html#sklearn.grid_search.GridSearchCV)
that controls exactly that.
It is True by default.
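Concretely, with the default refit=True the best parameter setting is refit on all the data passed to fit, and the resulting model is exposed as best_estimator_; the GridSearchCV object itself then delegates predict/score to it. A small sketch (current API):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# refit=True (the default): after the search, the best setting is
# refit on ALL the data passed to fit, so the result can be applied
# to new data directly.
grid = GridSearchCV(SVC(), {'C': [1, 10]}, cv=3, refit=True)
grid.fit(X[:1000], y[:1000])

# best_estimator_ holds that refit model; predict/score delegate to it.
print(grid.best_estimator_.get_params()['C'])
print(grid.predict(X[1000:1005]))
```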

Cheers,
Andy
Mathias Verbeke
2012-02-03 11:06:31 UTC
Hi Andreas,

Thanks a lot; that answers my questions. Just a quick check to be sure I
understand it correctly: the results in the classification report for the
best classifier are the ones on the test set, right?

And another small question: could you tell me how/where I need to set the
class_weight parameter, since this doesn't seem to work in the usual way
via the fit method? And besides 'auto', would it be possible to tune it
with GridSearch as well?

Thanks,

Mathias
Olivier Grisel
2012-02-03 11:19:21 UTC
Post by Mathias Verbeke
Just a quick check to be sure I understand it correctly: the results in
the classification report for the best classifier are the ones on the
test set, right?
It prints the performance of the best classifier, as found on the
training set (also known as the development set), measured on the test
set (also known as the evaluation set).

If you do the parameter selection and the evaluation on the same dataset,
you are likely to overfit the hyperparameter settings, and hence your
performance estimate will overstate the true generalization performance.
Post by Mathias Verbeke
And another small question: could you tell me how/where I need to set the
class_weight parameter, since this doesn't seem to work in the regular way
in the fit method? Would it furthermore be possible to - besides 'auto' -
tune this as well with GridSearch?
You can extend the grid search as follows (this will double the running time):

tuned_parameters = [
    {'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
     'C': [1, 10, 100, 1000], 'class_weight': [None, 'auto']},
    {'kernel': ['linear'], 'C': [1, 10, 100, 1000],
     'class_weight': [None, 'auto']},
]
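For readers on newer scikit-learn versions: the 'auto' heuristic was renamed 'balanced', and class_weight is set in the SVC constructor, so the same idea works out of the box. A runnable sketch on a subset of digits (grid values chosen for illustration):

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# 'balanced' replaces the old 'auto' option in current releases.
tuned_parameters = [
    {'kernel': ['rbf'], 'gamma': [1e-3], 'C': [1, 10],
     'class_weight': [None, 'balanced']},
    {'kernel': ['linear'], 'C': [1, 10],
     'class_weight': [None, 'balanced']},
]
grid = GridSearchCV(SVC(), tuned_parameters, cv=3)
grid.fit(X[:500], y[:500])
print(grid.best_params_)
```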
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathias Verbeke
2012-02-03 12:23:12 UTC
Hi Olivier,

That's something I tried already, but then I get:

AssertionError: Invalid parameter class_weight for estimator SVC

Any idea what could be wrong?

Thanks,

Mathias
Andreas
2012-02-03 12:30:15 UTC
Hi Mathias.
As far as I know, using class weights in a grid search is not possible
with SVC at the moment. class_weight can be passed as a parameter to fit,
but that prevents using it in grid searches.
This is a known issue, and class_weight should be moved to
the initialization of SVC.
I am (somewhat) working on this.

Cheers,
Andy
Mathias Verbeke
2012-02-03 12:47:36 UTC
Hi Andreas,

Thanks for the answer. Hm, that's a pity. When I add it as a parameter to
fit, I get

AssertionError: Invalid parameter class_weight for estimator GridSearchCV

Does this mean class weighting isn't possible at all with GridSearch?

Thanks,

Mathias
Andreas
2012-02-03 12:50:54 UTC
Post by Mathias Verbeke
When I add it as a parameter to fit, I get
AssertionError: Invalid parameter class_weight for estimator GridSearchCV
You would have to add it to the "fit" method of SVC, not GridSearchCV.
Post by Mathias Verbeke
Does this mean class weighting isn't possible at all with GridSearch?
At the moment, yes.

If this is very important to you, I might be able to fix it this weekend.
No promise, though.
You would have to use the github version then.
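(Note for later readers: this fix did land. In current scikit-learn, class_weight is a constructor parameter of SVC, which is exactly what makes it reachable from GridSearchCV. A tiny sketch on hypothetical toy data:)

```python
import numpy as np
from sklearn.svm import SVC

# Toy 1-D data: class 0 near 0.0, class 1 near 1.0 (illustration only).
X = np.array([[0.0], [0.2], [0.9], [1.0], [1.1], [1.2]])
y = np.array([0, 0, 1, 1, 1, 1])

# class_weight is set at construction time in current releases,
# either as a per-class dict or as the string 'balanced'.
clf = SVC(kernel='linear', class_weight={0: 3.0, 1: 1.0})
clf.fit(X, y)
print(clf.predict([[0.1]]))
```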

Cheers,
Andy
Mathias Verbeke
2012-02-03 12:59:01 UTC
Hi Andreas,

Post by Andreas
You would have to add it to the "fit" method of SVC, not GridSearchCV.
How can this be done in the digits example, since there's only one fit
there, namely the one of GridSearch?
Post by Andreas
Post by Mathias Verbeke
Does this mean class weighting isn't possible at all with GridSearch?
At the moment, yes.
If this is very important to you, I might be able to fix it this weekend.
No promise, though.
You would have to use the github version then.
That would be great! I'm already using the github version, so no problem.
Thanks a lot,

Mathias
Gilles Louppe
2012-02-03 13:04:39 UTC
Hi,

You can inject your fit params using the `fit_params` parameter in GridSearchCV.
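(In recent scikit-learn releases the fit_params constructor argument was removed; per-call fit parameters are instead forwarded through GridSearchCV.fit itself, and array-like parameters of length n_samples are sliced per fold. A sketch passing sample_weight, with hypothetical weights that upweight one class; class_weight itself is a constructor parameter nowadays:)

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_digits(return_X_y=True)

# Hypothetical per-sample weights: upweight examples of class 0.
sample_weight = np.where(y == 0, 2.0, 1.0)

grid = GridSearchCV(SVC(), {'C': [1, 10]}, cv=3)
# Fit parameters are forwarded to the underlying estimator's fit,
# sliced to match each cross-validation fold.
grid.fit(X[:600], y[:600], sample_weight=sample_weight[:600])
print(grid.best_params_)
```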

Gilles
Andreas
2012-02-06 23:26:52 UTC
Post by Mathias Verbeke
That would be great! I'm already using the github version, so no problem.
This should do the trick:
https://github.com/scikit-learn/scikit-learn/pull/610

Cheers,
Andy
