Lucas Wiman

2011-05-18 00:04:40 UTC

Hello,

I'm new to the scikits.learn mailing list, but have been using the library
for several months. I'm interested in contributing an estimator using
Platt's method of generating a sigmoid function to learn probability
estimates from the outputs of SVMs. The original method is described here:

http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.41.1639

and Lin et al.'s numerical improvement is described here:

http://www.csie.ntu.edu.tw/~htlin/paper/doc/plattprob.pdf

This method is also implemented in LibSVM (indeed, it is included in the
svm.cpp file in scikits.learn as the function sigmoid_train). I'm thinking
of something along the lines of the following (where train_X and train_Y
are the training feature vectors and labels respectively, and test_X and
test_Y are a separate labelled set used to fit the sigmoid):

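from scikits.learn.svm import LinearSVC

# Train the SVM on the training set
svc = LinearSVC()
svc.fit(train_X, train_Y)

# Fit the sigmoid to decision values from a separate labelled set
# (SigmoidProbabilityEstimator is the proposed class; it does not exist yet)
platt_estimator = SigmoidProbabilityEstimator()
platt_estimator.fit(svc.decision_function(test_X), test_Y)

# Outputs an array of estimated probabilities
platt_estimator.predict(svc.decision_function(X))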
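
For concreteness, here is a minimal pure-Python sketch of what the proposed
SigmoidProbabilityEstimator might look like. It fits P(y=1 | f) =
1 / (1 + exp(A*f + B)) with Newton's method and a backtracking line search,
roughly in the spirit of Lin et al.'s note. It is not the LibSVM
sigmoid_train code; the constructor parameters (max_iter, min_step, sigma)
and the assumption that labels are +1/-1 are mine, for illustration only.

```python
import math


class SigmoidProbabilityEstimator:
    """Illustrative sketch: fits P(y=1 | f) = 1 / (1 + exp(A*f + B))."""

    def __init__(self, max_iter=100, min_step=1e-10, sigma=1e-12):
        self.max_iter = max_iter   # assumed default, not taken from LibSVM
        self.min_step = min_step   # smallest step tried in the line search
        self.sigma = sigma         # small ridge keeping the Hessian invertible
        self.A = 0.0
        self.B = 0.0

    @staticmethod
    def _objective(A, B, scores, targets):
        # Negative log-likelihood, written so that exp() never overflows.
        total = 0.0
        for f, t in zip(scores, targets):
            z = A * f + B
            if z >= 0:
                total += t * z + math.log1p(math.exp(-z))
            else:
                total += (t - 1.0) * z + math.log1p(math.exp(z))
        return total

    @staticmethod
    def _prob(z):
        # Stable evaluation of 1 / (1 + exp(z)).
        if z >= 0:
            e = math.exp(-z)
            return e / (1.0 + e)
        return 1.0 / (1.0 + math.exp(z))

    def fit(self, scores, labels):
        # Assumes labels are +1 / -1 (anything > 0 counts as positive).
        n_pos = sum(1 for y in labels if y > 0)
        n_neg = len(labels) - n_pos
        # Platt's smoothed targets instead of hard 0/1 labels.
        hi = (n_pos + 1.0) / (n_pos + 2.0)
        lo = 1.0 / (n_neg + 2.0)
        targets = [hi if y > 0 else lo for y in labels]

        A, B = 0.0, math.log((n_neg + 1.0) / (n_pos + 1.0))
        fval = self._objective(A, B, scores, targets)
        for _ in range(self.max_iter):
            # Gradient and Hessian of the objective in (A, B).
            h11 = h22 = self.sigma
            h21 = g1 = g2 = 0.0
            for f, t in zip(scores, targets):
                p = self._prob(A * f + B)
                w = p * (1.0 - p)
                h11 += f * f * w
                h22 += w
                h21 += f * w
                g1 += f * (t - p)
                g2 += t - p
            if abs(g1) < 1e-5 and abs(g2) < 1e-5:
                break
            # Newton direction: solve H d = -g for the 2x2 system.
            det = h11 * h22 - h21 * h21
            dA = -(h22 * g1 - h21 * g2) / det
            dB = -(-h21 * g1 + h11 * g2) / det
            gd = g1 * dA + g2 * dB
            # Backtracking line search for sufficient decrease.
            step = 1.0
            while step >= self.min_step:
                new_A, new_B = A + step * dA, B + step * dB
                new_f = self._objective(new_A, new_B, scores, targets)
                if new_f < fval + 1e-4 * step * gd:
                    A, B, fval = new_A, new_B, new_f
                    break
                step /= 2.0
            else:
                break  # line search failed; keep the current estimate
        self.A, self.B = A, B
        return self

    def predict(self, scores):
        # Returns a list of estimated P(y=1 | f), one per decision value.
        return [self._prob(self.A * f + self.B) for f in scores]


# Toy decision values with some overlap between the two classes.
scores = [-2.0, -1.5, -1.0, -0.5, 0.3, -0.2, 0.5, 1.0, 1.5, 2.0]
labels = [-1, -1, -1, -1, -1, 1, 1, 1, 1, 1]
estimator = SigmoidProbabilityEstimator().fit(scores, labels)
probs = estimator.predict([-2.0, 0.0, 2.0])
print(probs)
```

A real implementation would presumably work on numpy arrays and reuse the
existing C code rather than looping in Python; the sketch is only meant to
pin down the fit/predict interface.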

We could also add a method to the LinearSVC and SVC classes that takes such
an estimator as input and stores it in a prob_estimator field on the SVM
object. When this field is set, predict_proba would return probabilities
instead of raising NotImplementedError, which is its current behavior.
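
A rough sketch of how that delegation might work is below. Everything in it
is hypothetical: ProbabilityMixin, set_prob_estimator, ToyLinearClassifier
and FixedSigmoid are made-up names used as stand-ins so the example runs
without scikits.learn, and only the prob_estimator field and the
NotImplementedError behavior come from the proposal above.

```python
import math


class ProbabilityMixin:
    """Hypothetical mixin: delegates predict_proba to an attached estimator."""

    prob_estimator = None

    def set_prob_estimator(self, estimator):
        self.prob_estimator = estimator
        return self

    def predict_proba(self, X):
        if self.prob_estimator is None:
            raise NotImplementedError("no prob_estimator has been set")
        return self.prob_estimator.predict(self.decision_function(X))


class ToyLinearClassifier(ProbabilityMixin):
    """Stand-in for LinearSVC: a fixed 1-D linear decision function."""

    def decision_function(self, X):
        return [2.0 * x - 1.0 for x in X]


class FixedSigmoid:
    """Stand-in for a fitted sigmoid estimator with known A and B."""

    def __init__(self, A, B):
        self.A, self.B = A, B

    def predict(self, scores):
        return [1.0 / (1.0 + math.exp(self.A * s + self.B)) for s in scores]


clf = ToyLinearClassifier()

# Without a prob_estimator, predict_proba keeps raising NotImplementedError.
try:
    clf.predict_proba([0.0])
    raised = False
except NotImplementedError:
    raised = True

# Once the field is set, predict_proba returns probabilities.
clf.set_prob_estimator(FixedSigmoid(A=-1.0, B=0.0))
probs = clf.predict_proba([0.0, 0.5, 1.0])
print(raised, probs)
```

Keeping the sigmoid as a separate object means the same estimator could be
attached to either LinearSVC or SVC without duplicating the fitting code.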

Thoughts?

Additionally, I'm not particularly familiar with using Cython (or indeed
C/C++ in general), so any pointers about how to wrap C functionality in
scikits.learn would be greatly appreciated.

Thanks and best wishes,

Lucas Wiman

