Discussion:
[Scikit-learn-general] Binary Classifier Evaluation Metrics
Enise Basaran
2016-03-22 10:06:11 UTC
Hi everyone,

I would like to learn about evaluation metrics for binary classifiers. I
implemented the "Binary Relevance" method [1] for multilabel classification.
Each of my classifiers says "Yes" or "No". How can I calculate the accuracy
score on my dataset?


*[1] Binary Relevance (BR)* is one of the most popular transformation
methods. It creates k datasets (k = |L|, the total number of class labels),
one per class label, and trains a classifier on each of these datasets. Each
dataset contains the same number of instances as the original data, but
dataset $D_{\lambda_j}$, $1 \le j \le k$, labels an instance as positive if it
belongs to class $\lambda_j$ and as negative otherwise.
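For illustration, here is a minimal plain-Python sketch of the BR transformation described in [1]. The helper name `binary_relevance_datasets`, the representation of each instance's labels as a Python set, and the toy pages are all hypothetical; none of this is part of scikit-learn.

```python
# Hypothetical helper illustrating the Binary Relevance (BR) transformation:
# one binary dataset per label, each containing ALL instances of the data.

def binary_relevance_datasets(X, label_sets, labels):
    """Return {label: (X, y_binary)}, one binary dataset per label.

    y_binary[i] is 1 (positive) if instance i belongs to `label`,
    otherwise 0 (negative).
    """
    return {
        label: (X, [1 if label in ls else 0 for ls in label_sets])
        for label in labels
    }


# Toy example (hypothetical pages and label sets).
X = ["stock market news", "primary school homepage", "business school courses"]
label_sets = [{"Business&Economy"}, {"Education"}, {"Business&Economy", "Education"}]
labels = ["Business&Economy", "Education"]

for label, (_, y) in binary_relevance_datasets(X, label_sets, labels).items():
    print(label, y)
# Business&Economy [1, 0, 1]
# Education [0, 1, 1]
```

Each resulting (X, y_binary) pair can then be handed to any ordinary binary classifier.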
Enise Basaran
2016-03-24 13:13:31 UTC
Hi everyone,

I would like to learn about evaluation metrics for binary classifiers. I
implemented the "Binary Relevance" method [1] for multilabel classification.
Each of my classifiers says "Yes" or "No". How can I calculate the accuracy
score on my dataset, and what metrics can I use for my binary classifiers?
Thanks in advance.
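As a rough sketch of the usual per-classifier metrics, assuming "Yes" is encoded as 1 and "No" as 0, the four confusion-matrix counts give accuracy, precision, recall and F1. The function `binary_metrics` and the toy label lists are hypothetical; `accuracy_score`, `precision_score`, `recall_score` and `f1_score` in `sklearn.metrics` compute the same quantities for a single binary classifier. With many categories most answers are typically "No", so accuracy alone can look high even for a classifier that rarely says "Yes" correctly; precision, recall and F1 are usually more informative.

```python
# Hypothetical helper: standard metrics for ONE yes/no classifier,
# computed from its confusion-matrix counts (1 = "Yes", 0 = "No").

def binary_metrics(y_true, y_pred):
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    # Guard against division by zero when nothing is predicted / present.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 0, 1, 1, 0, 0]  # hypothetical gold labels for one category
y_pred = [1, 0, 0, 1, 1, 0]  # hypothetical answers of that category's classifier
print(binary_metrics(y_true, y_pred))
# accuracy = 4/6; precision = recall = f1 = 2/3
```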


*[1] Binary Relevance (BR)* is one of the most popular transformation
methods. It creates k datasets (k = |L|, the total number of class labels),
one per class label, and trains a classifier on each of these datasets. Each
dataset contains the same number of instances as the original data, but
dataset $D_{\lambda_j}$, $1 \le j \le k$, labels an instance as positive if it
belongs to class $\lambda_j$ and as negative otherwise.

Sincerely,
Joel Nothman
2016-03-24 20:26:58 UTC
OneVsRestClassifier already implements Binary Relevance. What is unclear
about our documentation on model evaluation and metrics?
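A minimal sketch of that point, assuming a reasonably recent scikit-learn: given a multilabel indicator matrix as `y`, `OneVsRestClassifier` fits one binary estimator per label column, which is the Binary Relevance scheme. The toy `X` and `Y` and the choice of `LogisticRegression` as the base estimator are illustrative assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier

# Toy data (hypothetical): 2 numeric features per sample.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0],
              [0.1, 0.1], [0.9, 0.1], [0.1, 0.9], [0.9, 0.9]])
# Multilabel indicator matrix: column j is 1 when the sample has label j.
Y = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 0],
              [0, 0, 1], [1, 0, 0], [0, 1, 0], [1, 1, 0]])

clf = OneVsRestClassifier(LogisticRegression())
clf.fit(X, Y)

# One binary estimator per label column, as in Binary Relevance.
print(len(clf.estimators_))   # 3
# Predictions come back as an indicator matrix with one column per label.
print(clf.predict(X).shape)   # (8, 3)
```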
------------------------------------------------------------------------------
Transform Data into Opportunity.
Accelerate data analysis in your applications with
Intel Data Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Enise Basaran
2016-03-25 07:42:43 UTC
Hi,

I'm working on web page classification, and I have 32 categories such as
'Adult', 'Business&Economy', 'Education', etc.

Below is the OneVsRestClassifier example:

import numpy as np

X_train = np.array(["new york is a hell of a town",
                    "new york was originally dutch",
                    "the big apple is great",
                    "new york is also called the big apple",
                    "nyc is nice",
                    "people abbreviate new york city as nyc",
                    "the capital of great britain is london",
                    "london is in the uk",
                    "london is in england",
                    "london is in great britain",
                    "it rains a lot in london",
                    "london hosts the british museum",
                    "new york is great and so is london",
                    "i like london better than new york"])
y_train = [[0], [0], [0], [0], [0], [0], [1], [1], [1], [1], [1], [1],
           [0, 1], [0, 1]]

But I don't want to label data as above ([0, 1]) because, as you know,
*it's very difficult to find multilabelled data*. So I generated 32 binary
datasets, one for each category. When a test document comes in for
prediction, it is sent to all 32 classifiers and I keep only the categories
whose classifiers return 'Yes'. This way I can do multilabel classification
with my own datasets.
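The scheme described above could be sketched as follows. Everything here is hypothetical: `classifiers` maps each category name to any callable returning True ("Yes") or False ("No"), and the keyword rules merely stand in for trained models. Fixing the category order turns each predicted label set into one row of a 0/1 indicator matrix, which is the format multilabel metrics expect.

```python
# Hypothetical sketch: every binary classifier votes independently, and the
# predicted label set is every category whose classifier answered "Yes".

def predict_label_set(document, classifiers):
    return {category for category, says_yes in classifiers.items() if says_yes(document)}


def to_indicator_row(label_set, categories):
    # Fixed category order so rows from different documents line up.
    return [1 if c in label_set else 0 for c in categories]


# Stand-ins for trained per-category models (hypothetical keyword rules).
classifiers = {
    "Business&Economy": lambda doc: "stock" in doc,
    "Education": lambda doc: "school" in doc,
    "Adult": lambda doc: False,
}
categories = sorted(classifiers)  # ['Adult', 'Business&Economy', 'Education']

doc = "business school stock market course"
labels = predict_label_set(doc, classifiers)
print(sorted(labels))                        # ['Business&Economy', 'Education']
print(to_indicator_row(labels, categories))  # [0, 1, 1]
```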

I can compute precision, recall and F-measure for each classifier (i.e., for
each category), but how can I evaluate all classifiers together on my whole
dataset? Thanks in advance for your help.
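One common way to evaluate all classifiers together is to stack their 0/1 outputs column by column into an n_documents x n_categories matrix and compare it with the true matrix. The sketch below (hypothetical helper names, toy data) computes subset accuracy (whole label set exactly right), Hamming loss (fraction of wrong cells), and micro- and macro-averaged F1. In scikit-learn, `accuracy_score`, `hamming_loss` and `f1_score(..., average='micro')` / `f1_score(..., average='macro')` accept such indicator matrices and should give matching numbers, although scikit-learn may emit a warning where this sketch silently uses 0.0 for an undefined F1.

```python
# Hypothetical helpers: evaluate all binary classifiers together on 0/1
# indicator matrices (rows = documents, columns = categories).

def f1_from_counts(tp, fp, fn):
    # F1 = 2TP / (2TP + FP + FN); returns 0.0 when the category appears in
    # neither Y_true nor Y_pred (a convention; F1 is undefined there).
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def multilabel_report(Y_true, Y_pred):
    n_docs, n_cats = len(Y_true), len(Y_true[0])
    tp, fp, fn = [0] * n_cats, [0] * n_cats, [0] * n_cats
    wrong_cells = 0
    exact_rows = 0
    for t_row, p_row in zip(Y_true, Y_pred):
        exact_rows += int(list(t_row) == list(p_row))  # whole label set correct?
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            tp[j] += int(t == 1 and p == 1)
            fp[j] += int(t == 0 and p == 1)
            fn[j] += int(t == 1 and p == 0)
            wrong_cells += int(t != p)
    per_category_f1 = [f1_from_counts(tp[j], fp[j], fn[j]) for j in range(n_cats)]
    return {
        # Fraction of documents whose predicted label set is exactly right.
        "subset_accuracy": exact_rows / n_docs,
        # Fraction of (document, category) decisions that are wrong.
        "hamming_loss": wrong_cells / (n_docs * n_cats),
        # Pool TP/FP/FN over all categories, then compute one F1.
        "micro_f1": f1_from_counts(sum(tp), sum(fp), sum(fn)),
        # Compute F1 per category, then average with equal weight.
        "macro_f1": sum(per_category_f1) / n_cats,
        "per_category_f1": per_category_f1,
    }


# Toy example: 4 documents, 3 categories (hypothetical values).
Y_true = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
Y_pred = [[1, 0, 0], [0, 1, 0], [1, 0, 0], [0, 0, 1]]
print(multilabel_report(Y_true, Y_pred))
# subset_accuracy 0.5, hamming_loss 1/6, micro_f1 0.8, macro_f1 7/9
```

Micro-averaging is dominated by frequent categories, while macro-averaging gives each of the 32 categories equal weight, so rare categories with poor classifiers pull the macro score down; reporting both is often informative.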
--
*Enise Başaran*
*Software Developer*
Joel Nothman
2016-03-26 10:09:56 UTC
It looks like you should use sklearn.preprocessing.MultiLabelBinarizer
(http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MultiLabelBinarizer.html)
to convert y_train into the binary indicator matrix format that scikit-learn
can work with.
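For example, applied to the `y_train` from the earlier message, `MultiLabelBinarizer` produces one 0/1 column per label. This is a minimal sketch assuming scikit-learn is installed; the same call works with category names such as 'Education' instead of integers, and the resulting matrix can be passed to `OneVsRestClassifier.fit` and to the multilabel metrics.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# y_train from the message above: each entry lists the label(s) of one sentence.
y_train = [[0], [0], [0], [0], [0], [0],
           [1], [1], [1], [1], [1], [1],
           [0, 1], [0, 1]]

mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(y_train)  # one 0/1 column per label

print(mlb.classes_)  # [0 1]
print(Y.shape)       # (14, 2)
print(Y[-1])         # [1 1]  (the last two sentences carry both labels)

# inverse_transform maps indicator rows back to label tuples.
print(mlb.inverse_transform(Y[-2:]))
```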