Abhi
2012-11-06 00:33:06 UTC
Hello,
I have been reading and testing examples from the sklearn documentation. I
am not too clear on a few things and would appreciate any help with the
following questions:
1) What would be the advantage of training LogisticRegression vs
OneVsRestClassifier(LogisticRegression()) for multiclass? (I understand
the latter would basically train n_classes classifiers.)
2) Isn't SGDClassifier(loss='log') better than LogisticRegression for large
sparse datasets? If so, why?
3) If I need predict_proba for just the best class match from the multiclass
classifier, can I use OneVsRestClassifier(SGDClassifier())?
I tested on empirical data, but got approximately similar results with
LogisticRegression, SGDClassifier and LinearSVC. (For now, data.shape = (10000,
400000).) However, in the future the data might grow to a much larger number of
training samples and features, so I wanted to get a clearer idea of which
approach to choose.
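
To make question 3 concrete, here is a minimal sketch of that setup. It is an
illustration only: the small random sparse matrix, the three classes and the
variable names are assumptions standing in for the real data. The logistic
loss is written loss='log' above; newer scikit-learn releases spell it
'log_loss', so the sketch picks the name from the installed version.

```python
import re

import numpy as np
import sklearn
from scipy import sparse
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# The logistic loss is named "log" in older scikit-learn releases and
# "log_loss" from 1.1 onwards; choose whichever the installed version accepts.
major, minor = (int(g) for g in re.match(r"(\d+)\.(\d+)", sklearn.__version__).groups())
LOG_LOSS = "log_loss" if (major, minor) >= (1, 1) else "log"

# Synthetic stand-in for the real data (assumption): 300 x 1000 sparse, 3 classes.
rng = np.random.RandomState(0)
X = sparse.random(300, 1000, density=0.01, format="csr", random_state=rng)
y = rng.randint(0, 3, size=300)

# One binary SGD logistic model per class.
clf = OneVsRestClassifier(SGDClassifier(loss=LOG_LOSS, random_state=0))
clf.fit(X, y)

proba = clf.predict_proba(X)         # shape (n_samples, n_classes)
best_idx = proba.argmax(axis=1)      # column of the top-scoring class per row
best_class = clf.classes_[best_idx]  # label of that class
best_proba = proba[np.arange(proba.shape[0]), best_idx]  # its probability
```

Here best_class and best_proba hold the top class and its probability for each
row. For single-label multiclass targets, OneVsRestClassifier rescales the
per-class probabilities so each row sums to 1, so these values come from
independent binary models and are not a jointly fitted multinomial estimate.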
Thanks,
A