2012-10-22 21:43:14 UTC
Here is a long email. Hold on: we are having a hard time thinking a
design problem through. Andy and I have been discussing how to design the
score function API with regard to the cross-validation framework of the
scikit. The discussion stemmed from the fact that using AUC in GridSearchCV,
which is a common use case, is way too hard with our current code. However,
it does reveal more general limitations of our design. Currently, the
score method of an estimator defines the way that it is evaluated when
doing cross-validation. It can be overridden using the 'score_func'
argument of GridSearchCV and cross_val_score, which can be a callable with
(y_true, y_pred) as input parameters. This immediately breaks when trying
to compute e.g. AUC, which needs non-thresholded decision values.
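To make the failure concrete, here is a toy illustration in plain Python
(not scikit code; it uses the rank-based, Mann-Whitney formulation of
AUC): thresholding the decision values before scoring destroys the
ranking information that AUC measures.

```python
def rank_auc(y_true, scores):
    # ROC AUC as the probability that a random positive example
    # outranks a random negative one (ties count half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

y_true = [0, 1, 0, 1]
decision_values = [0.1, 0.4, 0.35, 0.8]           # what AUC actually needs
y_pred = [int(s > 0.5) for s in decision_values]  # what (y_true, y_pred) gives

print(rank_auc(y_true, decision_values))  # 1.0: the ranking is perfect
print(rank_auc(y_true, y_pred))           # 0.75: information was lost
```

A score function that only ever sees y_pred can therefore not compute
the quantity that AUC is meant to measure.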
Andy has proposed several pull requests that add logic to GridSearchCV to
deal with these use cases. I haven't been convinced so far, and we are
trying to see what the right design is.
I think that Andy's approach can be fairly summarized as looking for a
way to extend the score_func in GridSearchCV to be able to deal with
richer scores. Among the options proposed are:
a. having the score method of the estimator accept a score_func argument
that would then be used inside the score method;
b. having score_func's signature be 'estimator, X, y' (discussed below).
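As a sketch of what option b could look like (all names here are
hypothetical, none of this is existing scikit code): because the score
function receives the fitted estimator itself, it can ask for
unthresholded decisions when it needs them.

```python
class StubClassifier:
    # Hypothetical fitted estimator exposing unthresholded decisions.
    def decision_function(self, X):
        return [x[0] - x[1] for x in X]  # toy linear score

def rank_auc(y_true, scores):
    # Mann-Whitney formulation of ROC AUC (ties count half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_score_func(estimator, X, y):
    # The (estimator, X, y) signature lets the score function decide
    # what it needs from the estimator -- here, continuous decisions.
    return rank_auc(y, estimator.decision_function(X))

X = [[2, 0], [0, 2], [3, 1], [1, 3]]
y = [1, 0, 1, 0]
print(auc_score_func(StubClassifier(), X, y))  # 1.0
```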
The drawback that I see to option a is that the requirements of score
functions are not homogeneous: they cannot all apply to every estimator.
We are already seeing in the PRs that we need to define a
'requires_threshold' decoration. For some of my personal use cases, I can
already see other score-function signatures being required. I really don't
like this, because it embeds custom code in parts of the scikit that are
general purpose. This pattern, in my experience, tends to create tight
coupling and eventually leads to code that is harder to extend.
I must say that defining a meta-language describing the capabilities of
score functions really raises warning signs as far as I am concerned. I
find that contracts based on imperative code are much easier to maintain
and extend than contracts based on declarative interfaces.
Option b seems fairly reasonable from the design point of view; I think
that it is very versatile. The main drawback that I see is that it does
not make it easy for the user to use the various score functions existing
in the metrics module, as their signature is 'y_true, y_pred'.
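That drawback could perhaps be softened with a small adapter that lifts
an existing '(y_true, y_pred)' metric into the '(estimator, X, y)'
signature. A hypothetical sketch (the adapter name, the metric, and the
stub model are all made up for illustration):

```python
def from_pred_metric(metric):
    # Hypothetical adapter: wrap a (y_true, y_pred) metric, such as
    # those in the metrics module, into the (estimator, X, y)
    # signature of option b.
    def score_func(estimator, X, y):
        return metric(y, estimator.predict(X))
    return score_func

def accuracy(y_true, y_pred):
    # Stand-in for a metrics-module function with the old signature.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

class StubModel:
    def predict(self, X):
        return [int(x[0] > 0) for x in X]

score_accuracy = from_pred_metric(accuracy)
print(score_accuracy(StubModel(), [[1], [-1], [2], [-2]], [1, 0, 1, 1]))  # 0.75
```

This keeps the metrics module untouched while giving the cross-validation
machinery a single calling convention.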
However, option b really has the look and smell of a method to me.
Combined with the fact that some score functions need estimator-specific
information (e.g. how to retrieve unthresholded decisions), it led me
to think that my favorite option would be to put as much as possible in
the estimator. The option that I am championing would be to add an
argument to estimators to be able to switch the score function. This
argument could either be a string, say 'auc', or a score_func with a
given signature (estimator-specific, but we would try to have as few
of these as possible).
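A minimal sketch of what this could look like (everything here is
hypothetical, including the names; a toy model standing in for a real
estimator):

```python
class ToyModel:
    # Hypothetical estimator whose scoring is switchable at
    # construction time: a string naming a known score, or a callable.
    def __init__(self, score="accuracy"):
        self.score_choice = score

    def fit(self, X, y):
        # Toy rule: threshold the first feature at its mean.
        self.threshold_ = sum(x[0] for x in X) / len(X)
        return self

    def decision_function(self, X):
        return [x[0] - self.threshold_ for x in X]

    def predict(self, X):
        return [int(d > 0) for d in self.decision_function(X)]

    def score(self, X, y):
        if callable(self.score_choice):
            # Estimator-specific callable signature: (estimator, X, y).
            return self.score_choice(self, X, y)
        if self.score_choice == "accuracy":
            return sum(p == t for p, t in zip(self.predict(X), y)) / len(y)
        raise ValueError("unknown score: %r" % self.score_choice)

X, y = [[0], [1], [2], [3]], [0, 0, 1, 1]
print(ToyModel().fit(X, y).score(X, y))                           # 1.0
print(ToyModel(score=lambda e, X, y: 0.5).fit(X, y).score(X, y))  # 0.5
```

GridSearchCV would then not need any scoring logic of its own: it would
simply call estimator.score(X, y).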
This option has the drawback of adding an argument to most, or all, of
the estimators. It also bundles together an estimator and a scoring
logic, which can be seen as a pro or a con, depending on the point of
view. One clear pro that I see is that a CV object, such as LassoCV,
would be able to use the same logic in its inner cross-validation.
An aspect that came up during the discussion with Andy is that scoring
functions are a problem that is reasonably well-defined in the case of
classification and regression. However, for unsupervised models, the
choice of scores is much wider and, more importantly, these scores most
often need access to some model parameters. Thus, the only way that I
see to do scoring in these settings is as a method. For supervised
settings, the problem is usually a bit simpler, as it boils down to
comparing predictions to a ground truth. Andy has been telling me that
model selection with unsupervised estimation is a corner case that is not
officially supported by the scikit. I feel quite bad about this, because
it is a significant fraction of my work, and because we originally
designed the scikit's API to cater for supervised and unsupervised models
together. I think that original choice really paid off, and I personally
really enjoy combining different estimators as black boxes.
For me, the estimators are really where the intelligence of the scikit
lies. I frown at the idea of putting intelligence elsewhere, because it
makes things like implementing a grid-computing grid-search, or providing
your own estimator, harder. In addition, in my eyes, it is perfectly
valid to cook up an estimator that solves an atypical problem, because at
the end of the day, real problems don't fall into boxes like
classification or regression.
Sitting back, I must confess that I am not really convinced by any of the
options I listed above. Maybe a good option, that I don't know how to
implement, would be:
* Support only score_func for supervised problems (removes the mess that
unsupervised models draw in for part of the API)
* Figure out a way to have a uniform signature for score_func that would
work for all main use cases (removes the specification language in the
score functions that freaks me out :}). The difficulties are:
 - unthresholded decisions
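For the unthresholded-decisions difficulty, one could imagine resolving
it inside the score function itself with plain imperative duck-typing,
rather than via declared capabilities. A hypothetical sketch (none of
these names are real scikit code):

```python
def rank_auc(y_true, scores):
    # Mann-Whitney formulation of ROC AUC (ties count half).
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def uniform_auc(estimator, X, y):
    # Uniform (estimator, X, y) signature; imperative probing of the
    # estimator replaces a declarative 'requires_threshold' annotation.
    if hasattr(estimator, "decision_function"):
        scores = estimator.decision_function(X)
    elif hasattr(estimator, "predict_proba"):
        scores = [p[1] for p in estimator.predict_proba(X)]
    else:
        raise TypeError("estimator cannot produce continuous scores")
    return rank_auc(y, scores)

class ProbaOnlyStub:
    # Hypothetical estimator that only exposes probabilities.
    def predict_proba(self, X):
        return [(1 - x[0], x[0]) for x in X]

print(uniform_auc(ProbaOnlyStub(), [[0.9], [0.2], [0.7], [0.1]], [1, 0, 1, 0]))  # 1.0
```

The contract stays in code, where it can be read and extended, rather
than in a declarative meta-language.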
The goal of this email is to try to have a sane discussion of what the
best choices are in terms of simplicity of code, simplicity for the user,
and versatility of the scoring API. As I am writing this email, I get to
defend my point of view, but I hope that Andy will correct any false or
incomplete picture that I have given of his point of view.
Thanks for reading!
See http://sourceforge.net/mailarchive/message.php?msg_id=25077802
for one of the original discussions.