Lucas Wiman

2011-05-18 00:04:40 UTC

Hello,

I'm new to the scikits.learn mailing list, but have been using the library
for several months. I'm interested in contributing an estimator that uses
Platt's method of fitting a sigmoid function to learn probability
estimates from the outputs of SVMs. The original method is described here:

http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1639

and Lin et al.'s numerical improvement is described here:

http://www.csie.ntu.edu.tw/~htlin/paper/doc/plattprob.pdf

This method is also implemented in LibSVM (indeed, it is included in the
svm.cpp file in scikits.learn, in the function sigmoid_train). I'm thinking
of something along the lines of the following (where each *_X is an array of
feature vectors and each *_Y the corresponding labels; note that the sigmoid
is fit on test_X and test_Y, not on the data used to train the SVM):

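from scikits.learn.svm import LinearSVC

svc = LinearSVC()
svc.fit(train_X, train_Y)

# SigmoidProbabilityEstimator is the proposed new class; it does not exist yet
platt_estimator = SigmoidProbabilityEstimator()
platt_estimator.fit(svc.decision_function(test_X), test_Y)

# Outputs an array of estimated probabilities for new samples X
platt_estimator.predict(svc.decision_function(X))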
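
To make the proposal more concrete, here is a minimal pure-Python sketch of what such an estimator might look like for the binary case. It follows the Newton iteration with backtracking line search from the Lin et al. note linked above. The constructor parameters, the helper functions, and the A_ and B_ attribute names are assumptions of mine, not an existing scikits.learn API, and a real version would presumably use NumPy or wrap sigmoid_train rather than loop in Python.

```python
import math


def _prob(z):
    """Numerically stable 1 / (1 + exp(z))."""
    if z >= 0:
        e = math.exp(-z)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(z))


def _objective(f, t, A, B):
    """Negative log-likelihood of the sigmoid for smoothed targets t."""
    total = 0.0
    for fi, ti in zip(f, t):
        z = A * fi + B
        # Equal to ti*z + log(1 + exp(-z)), split by sign to avoid overflow.
        if z >= 0:
            total += ti * z + math.log1p(math.exp(-z))
        else:
            total += (ti - 1.0) * z + math.log1p(math.exp(z))
    return total


class SigmoidProbabilityEstimator:
    """Hypothetical sketch: fits P(y=1 | f) = 1 / (1 + exp(A*f + B))."""

    def __init__(self, max_iter=100, min_step=1e-10, sigma=1e-12, tol=1e-5):
        self.max_iter = max_iter
        self.min_step = min_step
        self.sigma = sigma  # small ridge that keeps the Hessian invertible
        self.tol = tol

    def fit(self, decision_values, labels):
        f = [float(v) for v in decision_values]
        n_pos = sum(1 for y in labels if y > 0)
        n_neg = len(f) - n_pos
        # Smoothed targets, as in Platt's paper, instead of hard 0/1 labels.
        hi = (n_pos + 1.0) / (n_pos + 2.0)
        lo = 1.0 / (n_neg + 2.0)
        t = [hi if y > 0 else lo for y in labels]

        A, B = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
        fval = _objective(f, t, A, B)
        for _ in range(self.max_iter):
            # Gradient (g1, g2) and Hessian (h11, h21, h22) w.r.t. (A, B).
            h11 = h22 = self.sigma
            h21 = g1 = g2 = 0.0
            for fi, ti in zip(f, t):
                p = _prob(A * fi + B)
                d2 = p * (1.0 - p)
                h11 += fi * fi * d2
                h22 += d2
                h21 += fi * d2
                d1 = ti - p
                g1 += fi * d1
                g2 += d1
            if abs(g1) < self.tol and abs(g2) < self.tol:
                break
            # Newton direction: solve the 2x2 system H * d = -g.
            det = h11 * h22 - h21 * h21
            dA = -(h22 * g1 - h21 * g2) / det
            dB = -(-h21 * g1 + h11 * g2) / det
            gd = g1 * dA + g2 * dB
            # Backtracking line search with a sufficient-decrease test.
            step = 1.0
            while step >= self.min_step:
                new_A, new_B = A + step * dA, B + step * dB
                new_f = _objective(f, t, new_A, new_B)
                if new_f < fval + 1e-4 * step * gd:
                    A, B, fval = new_A, new_B, new_f
                    break
                step /= 2.0
            else:
                break  # line search failed; keep the current A and B
        self.A_, self.B_ = A, B
        return self

    def predict(self, decision_values):
        """Return the estimated P(y=1) for each decision value."""
        return [_prob(self.A_ * v + self.B_) for v in decision_values]
```

With the usual sign convention (larger decision values for the positive class), the fitted A_ should come out negative, so the estimated probability increases with the decision value.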

We could also add methods to the LinearSVC and SVC classes which take such an
estimator as input and set a prob_estimator field on the classifier. When
this field is defined, predict_proba will return probabilities rather than
raising NotImplementedError, as it currently does.
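
As a rough illustration of that second idea, the delegation could look something like the sketch below. Only prob_estimator, predict_proba and decision_function come from the proposal; the mixin, the set_prob_estimator method, and the two stand-in classes are hypothetical names used to keep the example self-contained.

```python
import math


class ProbabilityMixin:
    """Hypothetical mixin adding an optional prob_estimator to a classifier."""

    prob_estimator = None

    def set_prob_estimator(self, prob_estimator):
        self.prob_estimator = prob_estimator
        return self

    def predict_proba(self, X):
        # Without a probability estimator, keep the current behavior.
        if self.prob_estimator is None:
            raise NotImplementedError("no prob_estimator has been set")
        return self.prob_estimator.predict(self.decision_function(X))


class ToyLinearClassifier(ProbabilityMixin):
    """Stand-in for LinearSVC: a 1-D decision function w*x + b."""

    def __init__(self, w=2.0, b=-1.0):
        self.w, self.b = w, b

    def decision_function(self, X):
        return [self.w * x + self.b for x in X]


class FixedSigmoid:
    """Stand-in for an already fitted sigmoid estimator."""

    def __init__(self, A=-1.0, B=0.0):
        self.A, self.B = A, B

    def predict(self, decision_values):
        return [1.0 / (1.0 + math.exp(self.A * f + self.B))
                for f in decision_values]


clf = ToyLinearClassifier()
clf.set_prob_estimator(FixedSigmoid())
probs = clf.predict_proba([0.0, 0.5, 1.0])  # decision values -1, 0, 1
```

The classifier itself stays unchanged apart from the extra field, so any object with a predict method that maps decision values to probabilities could be plugged in.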

Thoughts?

Additionally, I'm not particularly familiar with Cython (or indeed with
C/C++ in general), so any pointers on how to wrap C functionality in
scikits.learn would be greatly appreciated.

Thanks and best wishes,

Lucas Wiman
