You mean TP / N, not TP / TN.
And I think the average per-class accuracy does some weird things. Like:
I don't think that's very useful.
Post by Joel Nothman
Post by Joel Nothman
Firstly, balanced accuracy is a different thing, and yes, it should be
supported.
Post by Joel Nothman
Secondly, I am correct in thinking you're talking about multiclass (not
multilabel).
Sorry for the confusion, and yes, you are right. I think I have mixed up the
terms "average per-class accuracy" and "balanced accuracy" then.
Maybe to clarify, a corrected example to describe what I meant. Given the confusion matrix

               predicted label
       [  3   0   0 ]
true   [  7  50  12 ]
label  [  0   0  18 ]
I'd compute the accuracy as TP / TN = (3 + 50 + 18) / 90 = 0.79
and the "average per-class accuracy" as
(83/90 + 71/90 + 78/90) / 3 = (83 + 71 + 78) / (3 * 90) = 0.86
(I hope I got it right this time!)
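For what it's worth, here is a small plain-Python sketch of both computations on the confusion matrix above (rows = true label, columns = predicted label; the variable names are just for illustration):

```python
# Confusion matrix from the example above: rows = true label, columns = predicted.
cm = [
    [3,  0,  0],
    [7, 50, 12],
    [0,  0, 18],
]
n = sum(sum(row) for row in cm)  # 90 samples in total

# Plain accuracy: correctly classified samples (the diagonal) over all samples.
accuracy = sum(cm[i][i] for i in range(3)) / n  # (3 + 50 + 18) / 90 ~ 0.79

# "Average per-class accuracy": binarise each class one-vs-rest,
# count TP + TN for that class, divide by n, then average over classes.
per_class = []
for k in range(3):
    tp = cm[k][k]
    fp = sum(cm[i][k] for i in range(3)) - tp  # column sum minus diagonal
    fn = sum(cm[k]) - tp                       # row sum minus diagonal
    tn = n - tp - fp - fn
    per_class.append((tp + tn) / n)            # 83/90, 71/90, 78/90

avg_per_class = sum(per_class) / len(per_class)  # ~ 0.86
```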
In any case, I am not finding any literature describing this, and I am
also not proposing to add it to scikit-learn; I just wanted to get some info
on whether this is implemented or not. Thanks! :)
Post by Joel Nothman
Firstly, balanced accuracy is a different thing, and yes, it should be
supported.
Post by Joel Nothman
Secondly, I am correct in thinking you're talking about multiclass (not
multilabel).
Post by Joel Nothman
However, what you're describing isn't accuracy. It's actually
micro-averaged recall, except that your dataset is impossible because
you're allowing there to be fewer predictions than instances. If we assume
that we're allowed to predict some negative class, that's fine; we can
nowadays exclude it from micro-averaged recall with the labels parameter to
recall_score. (If all labels are included in a multiclass problem,
micro-averaged recall = precision = fscore = accuracy.)
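As a dependency-free illustration of what micro-averaged recall over a restricted label set computes (mirroring what a call like scikit-learn's recall_score(..., labels=..., average='micro') should give; the toy data below is made up):

```python
def micro_recall(y_true, y_pred, labels):
    # Micro-averaged recall over the given labels: total true positives
    # divided by the total number of true instances of those labels.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p and t in labels)
    support = sum(1 for t in y_true if t in labels)
    return tp / support

# Toy example with a "negative" class 0 that we exclude from the average.
y_true = [0, 0, 1, 1, 2, 2, 2]
y_pred = [0, 1, 1, 0, 2, 2, 1]
print(micro_recall(y_true, y_pred, labels={1, 2}))  # 3 of 5 positives hit -> 0.6
```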
Post by Joel Nothman
I had assumed you meant binarised accuracy, which would add together
both true positives and true negatives for each class.
Post by Joel Nothman
Either way, if there's no literature on this, I think we'd really best
not support it.
Post by Joel Nothman
I haven't seen this in practice yet, either. A colleague was looking
for this in scikit-learn recently, and he asked me if I know whether this
is implemented or not. I couldn't find anything in the docs and was just
curious about your opinion. However, I just found this entry here on
https://en.wikipedia.org/wiki/Accuracy_and_precision
Another useful performance measure is the balanced accuracy[10] which
avoids inflated performance estimates on imbalanced datasets. It is defined
as the arithmetic mean of sensitivity and specificity, or the average
Post by Joel Nothman
Am I right in thinking that in the binary case, this is identical to
accuracy?
Post by Joel Nothman
I think it would only be equal to the "accuracy" if the class labels are
uniformly distributed.
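A quick numeric sanity check of that point, with made-up binary data where both classes have the same number of samples:

```python
# Uniform binary class distribution: 4 samples of each class.
y_true = [0, 0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 0, 1, 1, 1, 0, 0]  # 3/4 right on class 0, 2/4 on class 1

# Plain accuracy over all 8 samples.
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

# Balanced accuracy: average the two per-class recalls.
recall_0 = sum(t == p == 0 for t, p in zip(y_true, y_pred)) / 4
recall_1 = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / 4
balanced = (recall_0 + recall_1) / 2

print(accuracy, balanced)  # both 0.625 -- identical under a uniform distribution
```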
Post by Joel Nothman
I'm not sure what this metric is getting at.
I have to think about this more, but I think it may be useful for
imbalanced datasets where you want to emphasize the minority class. E.g.,
let's say we have a dataset of 120 samples and three class labels 1, 2, 3.
And the classes are distributed like this
10 x 1
50 x 2
60 x 3
Now, let's assume we have a model that makes the following predictions
- it gets 0 out of 10 from class 1 right
- 45 out of 50 from class 2
- 55 out of 60 from class 3
So, the accuracy would then be computed as
(0 + 45 + 55) / 120 = 0.833
But the "balanced accuracy" would be much lower, because the model did
really badly on class 1, i.e.,
(0/10 + 45/50 + 55/60) / 3 = 0.61
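The two numbers above can be reproduced in a couple of lines (the "balanced accuracy" here is just the macro-average of the per-class recalls):

```python
# Correct predictions per class from the example: 0/10, 45/50, 55/60.
recalls = [0 / 10, 45 / 50, 55 / 60]

# Plain accuracy pools all 120 samples, so the tiny class 1 barely matters.
accuracy = (0 + 45 + 55) / 120           # ~ 0.833

# Macro-averaged recall ("balanced accuracy") weights each class equally,
# so failing completely on class 1 drags the score down.
balanced = sum(recalls) / len(recalls)   # ~ 0.61
```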
Hm, if I see this correctly, this is actually very similar to the F1
score. But instead of computing the harmonic mean between precision and
the true positive rate, we compute the harmonic mean between precision
and the true negative rate.
Post by Joel Nothman
I've not seen this metric used (references?). Am I right in thinking
that in the binary case, this is identical to accuracy? If I predict all
elements to be the majority class, then adding more minority classes into
the problem increases my score. I'm not sure what this metric is getting at.
Post by Joel Nothman
Hi,
I was just wondering why there's no support for the average per-class
accuracy in the scorer functions (if I am not overlooking something).
Post by Joel Nothman
E.g., we have 'f1_macro', 'f1_micro', 'f1_samples', 'f1_weighted', but
I didn't see an 'accuracy_macro', i.e.,
(acc.class_1 + acc.class_2 + … + acc.class_n) / n
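A sketch of what such an 'accuracy_macro' could compute: one-vs-rest accuracy per class, then the unweighted mean. (macro_accuracy is an invented name for illustration, not an existing scikit-learn function.)

```python
def macro_accuracy(y_true, y_pred):
    # Hypothetical "accuracy_macro": binarise each class one-vs-rest,
    # compute its accuracy (TP + TN) / n, then average over classes.
    classes = sorted(set(y_true) | set(y_pred))
    n = len(y_true)
    per_class = []
    for c in classes:
        correct = sum((t == c) == (p == c) for t, p in zip(y_true, y_pred))
        per_class.append(correct / n)
    return sum(per_class) / len(per_class)

# Example: one sample of class 1 is misclassified as class 2.
print(macro_accuracy([1, 1, 2, 2, 3], [1, 2, 2, 2, 3]))  # (4/5 + 4/5 + 5/5) / 3
```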
Would you discourage its usage (in favor of other metrics for
imbalanced class problems), or was it simply not implemented yet?
------------------------------------------------------------------------------
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://makebettercode.com/inteldaal-eval
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general