Sicco van Sas
2012-07-20 15:08:06 UTC
Hi all,
I use LinearSVC for multi-class multi-label text classification, but the
learned classifier doesn't always output a label when I try to classify
a test sample.
Here is the code:
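from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import LinearSVC

# Token counts -> TF-IDF weights -> one binary LinearSVC per label
classifier = Pipeline([
    ('vectorizer', CountVectorizer()),
    ('tfidf', TfidfTransformer()),
    ('clf', OneVsRestClassifier(LinearSVC()))])
classifier.fit(train_txt, train_labels)
print(classifier.predict(example_txt))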
Sometimes I get fine results, e.g. [(u'dogs',)], but sometimes it
returns nothing: [()]
It seems that the more samples I train the classifier on, the less often
it outputs nothing, but even after training on 20k samples it still
sometimes returns no labels. For example, training on 10k samples results
in approximately 80 classifiers (one per label). I tested on 5k samples,
and approximately 30% of them were given no label, while the ones that
did get one or more labels were classified quite well.
Is there a way to force the classifier to always predict at least 1 label?
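One possible workaround (a sketch, not something from the original post): OneVsRestClassifier trains one binary LinearSVC per label, and a label is predicted only when that classifier's decision score is positive, so a sample for which every score is zero or negative ends up with no label at all. The sketch reads the raw scores through decision_function and, for such samples only, falls back to the single highest-scoring label. It assumes a recent scikit-learn, which expects multi-label targets as a binary indicator matrix (built here with MultiLabelBinarizer) rather than a list of tuples. The helper predict_at_least_one and the toy data are hypothetical, for illustration only.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfTransformer
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MultiLabelBinarizer
from sklearn.svm import LinearSVC


def predict_at_least_one(model, texts):
    """Hypothetical helper: indicator matrix with at least one label per sample.

    A label is kept wherever its one-vs-rest decision score is positive
    (the same cut-off OneVsRestClassifier applies to LinearSVC scores);
    samples with no positive score get their highest-scoring label instead.
    """
    scores = np.asarray(model.decision_function(texts))  # (n_samples, n_labels)
    indicator = (scores > 0).astype(int)
    rows = np.flatnonzero(indicator.sum(axis=1) == 0)  # samples with no label
    indicator[rows, scores[rows].argmax(axis=1)] = 1
    return indicator


# Hypothetical toy data; every label both appears and is absent somewhere.
train_txt = [
    "the dog barks at the mailman",
    "a cat meows for food",
    "dogs and cats play in the yard",
    "the bird sings in the tree",
    "a bird watches the dog",
    "the cat sleeps on the sofa",
]
train_labels = [("dogs",), ("cats",), ("dogs", "cats"), ("birds",),
                ("birds", "dogs"), ("cats",)]

# Convert the label tuples into a binary indicator matrix (one column per label).
mlb = MultiLabelBinarizer()
Y = mlb.fit_transform(train_labels)

classifier = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("tfidf", TfidfTransformer()),
    ("clf", OneVsRestClassifier(LinearSVC())),
])
classifier.fit(train_txt, Y)

test_txt = ["my dog barks loudly", "zebras and giraffes"]
forced = predict_at_least_one(classifier, test_txt)
print(mlb.inverse_transform(forced))  # list of label tuples, none empty
```

Samples that already receive one or more labels are left exactly as predict would return them; only the empty rows change. Whether the top-scoring label is a sensible fallback for a given data set is a judgment call; an alternative would be to lower the threshold instead.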
Cheers,
Sicco