Discussion:
API change suggestions for metrics module
Mathieu Blondel
2010-10-12 04:54:51 UTC
Hello,

In the metrics module, the function precision_recall(y, probas_)
currently outputs precision/recall pairs for different probability
thresholds. Thresholding probabilities is only one possible way of
sweeping out a precision/recall curve, so I suggest renaming it to
precision_recall_curve(y, probas_) and adding the function
precision_recall(y_true, y_pred). The latter will be a useful utility
for plotting precision/recall against other criteria (e.g., the value
of a hyperparameter). See the patch below for what I'm proposing. The
new function also works in the multi-label setting; see the test below.
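
For instance (just a sketch with made-up data, assuming the two
functions defined in the patch below):

import numpy as np

# Toy data, only to illustrate the two call signatures.
y_true = np.array([1, 1, 0, 0])
probas_ = np.array([0.9, 0.4, 0.6, 0.1])
y_pred = (probas_ >= 0.5).astype(int)

# One precision/recall point for a fixed prediction
# (e.g. one hyperparameter setting):
p, r = precision_recall(y_true, y_pred)

# The whole curve obtained by sweeping the probability threshold
# (the renamed function):
precision_recall_curve(y_true, probas_)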

A similar change can be done for the ROC curve function.

Also, I noticed an inconsistency in the code: some metrics use
function(y_true, y_pred) while others use function(y_pred, y_true).
Some metrics aren't symmetric, so it would be nice to make this
consistent.

The changes I propose will break existing programs, but I think it's
better to do this sooner rather than later. If that's OK, I will make
the necessary modifications and commit.

Mathieu

PATCH:

diff --git a/scikits/learn/metrics.py b/scikits/learn/metrics.py
index 393b411..f1c389a 100644
--- a/scikits/learn/metrics.py
+++ b/scikits/learn/metrics.py
@@ -115,7 +115,15 @@ def auc(x, y):
     return area


-def precision_recall(y, probas_):
+def precision_recall(y_true, y_pred):
+    true_pos = np.sum(y_true[y_pred == 1]==1)
+    false_pos = np.sum(y_true[y_pred == 1]==0)
+    false_neg = np.sum(y_true[y_pred == 0]==1)
+    precision = true_pos / float(true_pos + false_pos)
+    recall = true_pos / float(true_pos + false_neg)
+    return precision, recall
+
+def precision_recall_curve(y, probas_):
     """compute Precision-Recall

     Parameters
@@ -149,11 +157,9 @@ def precision_recall(y, probas_):
     precision = np.empty(n_thresholds)
     recall = np.empty(n_thresholds)
     for i, t in enumerate(thresholds):
-        true_pos = np.sum(y[probas_>=t]==1)
-        false_pos = np.sum(y[probas_>=t]==0)
-        false_neg = np.sum(y[probas_<t]==1)
-        precision[i] = true_pos / float(true_pos + false_pos)
-        recall[i] = true_pos / float(true_pos + false_neg)
+        y_pred = np.ones(len(y))
+        y_pred[probas_ < t] = 0
+        precision[i], recall[i] = precision_recall(y, y_pred)

TEST:

def test_precision_recall_multilabel():
    Y_true = np.array([[1, 0, 1, 0],
                       [1, 0, 0, 0],
                       [0, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 1, 1, 1]])

    Y_pred = np.array([[1, 1, 1, 0],
                       [1, 0, 0, 0],
                       [0, 1, 0, 0],
                       [0, 1, 0, 0],
                       [0, 0, 1, 1]])

    n_pred = 8.0
    n_corr_pred = 6.0
    n_labeled = 7.0
    precision = n_corr_pred / n_pred
    recall = n_corr_pred / n_labeled

    assert_equal((precision, recall),
                 precision_recall(Y_true, Y_pred))
Olivier Grisel
2010-10-12 07:25:57 UTC
+1
Alexandre Gramfort
2010-10-12 19:32:13 UTC
+1 for:
precision_recall -> precision_recall_curve
roc -> roc_curve

Also make sure to patch the examples.

However, I would use two separate functions for precision and recall,
as they are two different metrics.

See:
http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/eval/index.html

It would be great to have examples of cross_val_score based on such
metrics and also have GridSearchCV be able to work with these metrics.
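
For what it's worth, the two separate functions could be thin wrappers
around the precision_recall proposed in Mathieu's patch; a rough sketch
(the names are only a suggestion):

def precision(y_true, y_pred):
    # Fraction of predicted positives that are actually positive.
    p, _ = precision_recall(y_true, y_pred)
    return p

def recall(y_true, y_pred):
    # Fraction of actual positives that were predicted positive.
    _, r = precision_recall(y_true, y_pred)
    return r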

Alex

Olivier Grisel
2010-10-12 19:35:35 UTC
And a third one for the f1-score, which is the harmonic mean of the two.
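
Something along these lines (a sketch built on the precision/recall
pair from the patch; the guard against a zero denominator is my
addition):

def f1_score(y_true, y_pred):
    # Harmonic mean of precision and recall: 2 * p * r / (p + r).
    p, r = precision_recall(y_true, y_pred)
    if p + r == 0.0:
        return 0.0
    return 2.0 * p * r / (p + r)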
Mathieu Blondel
2010-10-13 04:40:04 UTC
On Wed, Oct 13, 2010 at 4:32 AM, Alexandre Gramfort
Post by Alexandre Gramfort
precision_recall -> precision_recall_curve
roc -> roc_curve
make sure also to patch the examples.
I've made the modifications. I also fixed the consistency problem
regarding the order of the parameters, because the order does seem to
be meaningful in some cases. For example, when I call precision(y_pred,
y_true) instead of precision(y_true, y_pred), it returns the recall,
and vice versa. I added a unit test regarding this symmetry problem.
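
A tiny illustration of the asymmetry (toy data, assuming the new
precision/recall functions):

import numpy as np

y_true = np.array([1, 1, 0, 0])
y_pred = np.array([1, 0, 0, 0])

precision(y_true, y_pred)  # 1.0: the single predicted positive is correct
recall(y_true, y_pred)     # 0.5: only one of the two positives is found
precision(y_pred, y_true)  # 0.5: swapping the arguments yields the recall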
Post by Alexandre Gramfort
However I would use two functions for precision and recall as they
are 2 different metrics.
I did that, but precision and recall are not good metrics on their own,
so as Olivier suggested I also added an f1_score metric.
Post by Alexandre Gramfort
It would be great to have examples of cross_val_score based on such
metrics and also have GridSearchCV be able to work with these metrics.
I find it annoying that I cannot use a score function directly. E.g.

GridSearchCV(clf, params, score_func=f1_score)

instead of

GridSearchCV(clf, params, loss_func=lambda a, b: -f1_score(a, b))

Besides, for a loss function, smaller is better, while for a score
function, larger is better. This can be confusing and error-prone.

Shall we allow the user to define either score_func or loss_func?
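
To make the question concrete, the branching inside GridSearchCV could
look roughly like this (purely a sketch; the attribute names are made
up):

def _objective_to_minimize(self, y_true, y_pred):
    # Grid search minimizes internally, so a user-supplied score
    # function is negated while a loss function is used as-is.
    if self.score_func is not None:
        return -self.score_func(y_true, y_pred)
    return self.loss_func(y_true, y_pred)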

Mathieu
Alexandre Gramfort
2010-10-13 07:05:58 UTC
Post by Mathieu Blondel
I've made the modifications. I also fixed the consistency problem
regarding the order of the parameters because it does seem to be
meaningful in some cases. For example, when I call precision(y_pred,
y_true) instead of precision(y_true, y_pred), it returns the recall,
and vice-versa. I added a unit test regarding this symmetry problem.
I just read your commits and fully approve them.
Post by Mathieu Blondel
Post by Alexandre Gramfort
However I would use two functions for precision and recall as they
are 2 different metrics.
I did that but precision and recall alone are not good metrics so as
Olivier suggested, I also added a f1 score metric.
great
Post by Mathieu Blondel
Post by Alexandre Gramfort
It would be great to have examples of cross_val_score based on such
metrics and also have GridSearchCV be able to work with these metrics.
I find it annoying that I cannot use a score function directly. E.g.
GridSearchCV(clf, params, score_func=f1_score)
instead of
GridSearchCV(clf, params, loss_func=lambda a,b: -f1_score(a,b))
Besides, for a loss function the smaller the better while for a score
function the greater the better. This can be confusing and
error-prone.
Shall we allow the user to define either one of score_func or loss_func ?
OK, go for loss_func -> score_func in GridSearchCV and add the
zero_one_score function that counts the number of correct predictions.
By doing so, all metrics in metrics.py are score functions.
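
A minimal sketch of such a zero_one_score (whether it should return a
count or a fraction is open):

import numpy as np

def zero_one_score(y_true, y_pred):
    # Number of correct predictions; higher is better, like the
    # other score functions.
    return np.sum(np.asarray(y_true) == np.asarray(y_pred))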

does this sound reasonable?

Alex
Mathieu Blondel
2010-10-13 10:07:59 UTC
On Wed, Oct 13, 2010 at 4:05 PM, Alexandre Gramfort
Post by Alexandre Gramfort
OK, go for loss_func -> score_func in GridSearchCV and add the
zero_one_score function that counts the number of correct predictions.
By doing so, all metrics in metrics.py are score functions.
does this sound reasonable?
If using loss functions this way is not too cumbersome, why not! Maybe
you can post example code to show what it looks like.
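
Something along these lines, perhaps (hypothetical: it assumes
GridSearchCV grows the score_func parameter discussed above):

import numpy as np
from scikits.learn.svm import SVC
from scikits.learn.grid_search import GridSearchCV
from scikits.learn.metrics import f1_score

# Toy binary classification problem.
X = np.random.randn(40, 5)
y = (X[:, 0] > 0).astype(int)

params = {'C': [0.1, 1.0, 10.0]}
clf = GridSearchCV(SVC(kernel='linear'), params, score_func=f1_score)
clf.fit(X, y)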

Another possibility is to let the user choose between score_func and
loss_func, meaning that __init__ has both.

Mathieu
Alexandre Gramfort
2010-10-13 10:35:37 UTC
Post by Mathieu Blondel
If using loss functions this way is not too cumbersome, why not! Maybe
you can post example code to show what it looks like.
I'll do this.
Post by Mathieu Blondel
Another possibility is to let the user choose between score_func and
loss_func, meaning that __init__ has both.
I'm not too fond of this, as it makes for more branching, and I don't
like it when a function ignores some parameters depending on others.

Alex
Gael Varoquaux
2010-10-17 14:56:01 UTC
Post by Alexandre Gramfort
Post by Mathieu Blondel
Another possibility is to let the user choose between score_func and
loss_func, meaning that __init__ has both.
I'm not too fond of this, as it makes for more branching, and I don't
like it when a function ignores some parameters depending on others.
I have the same gut feeling as Alex here.

Apart from that small remark, I am +1 on the whole thread.

Gaël
