Richard Cubek

2013-04-28 18:06:11 UTC

Hello everyone,

I'm new to the list, so first of all: thanks a lot for your work on this lib!

I need libsvm probability estimates as well as logistic regression (LR) for a three-class problem with a training set of about 5,000-6,000 samples and 20-50 features. I am familiar with Python and Octave (regarding the math, even more with Octave), but I would prefer Python, since I need a lot of general programming, which can be tedious in Octave...

Reading a lot of posts in discussions, scikit seems to offer the most advanced and best-documented Python binding for libsvm, but I also found the following site:

http://fseoane.net/blog/2010/fast-bindings-for-libsvm-in-scikitslearn

He writes that his bindings are implemented in scikit, but he also writes that the code is in alpha status; that was three years ago.

Well, I started with a simple problem: 65 data points with 2 features each.

Questions:

1) Playing around with SVM probability estimates, it "seems" to work nicely (see the linked plot). I just wanted to ask how stable the Python binding is, regarding the issue from the website mentioned above.
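For reference, this is roughly what I'm doing to get the probability estimates — a minimal sketch on made-up data (the toy dataset here is hypothetical, not my actual data):

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic three-class toy data: three Gaussian blobs in 2D.
rng = np.random.RandomState(0)
X = np.vstack([rng.randn(20, 2) + c for c in ([0, 0], [4, 0], [0, 4])])
y = np.repeat([0, 1, 2], 20)

# probability=True enables libsvm's Platt-scaling probability estimates.
clf = SVC(kernel="rbf", probability=True)
clf.fit(X, y)

proba = clf.predict_proba(X[:1])  # one row per sample, one column per class
```

Each row of `proba` sums to 1, so the three columns can be read as class probabilities.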

2) Playing around with LR, the results "look interesting" (see the linked plot), but I was not able to reproduce a model adapting to/"overfitting" every single data point, as in the SVM example plot (I tried very large C). I took the first ML online class with Andrew Ng, where we implemented LR ourselves, but the feature creation from the raw features was ad hoc (from x and y to x^2, y^2, x*y, x*y^2, and so on). I followed the same feature mapping here, in the end getting 28 features out of 2. It takes about 15-17 seconds to fit the model (on my simple example).
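The mapping I use is the same polynomial expansion as in the course, which scikit can generate directly — a sketch with synthetic stand-in data (my real data and labels are different):

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression

# Hypothetical stand-in for my 65-point, 2-feature toy problem.
rng = np.random.RandomState(0)
X = rng.randn(65, 2)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# Degree-6 polynomial mapping of (x, y): 1, x, y, x^2, x*y, ..., y^6
# -> 28 features out of 2, as described above.
poly = PolynomialFeatures(degree=6)
X_poly = poly.fit_transform(X)

# Large C = weak regularization, trying to force the fit toward
# every single data point.
clf = LogisticRegression(C=1e6, max_iter=10000)
clf.fit(X_poly, y)
```

With 2 inputs, `degree=6` gives exactly the 28 columns (including the bias term) from the course's mapping.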

I know feature selection/extraction is a big research topic in itself, but maybe scikit can help me here without my needing to read a dozen papers, or maybe there are some rules of thumb. So is there any method within scikit that could help me find a feature mapping? I guess RandomizedLogisticRegression could help me somehow, but I didn't really get the point. I think here I again have to provide the features myself, and it will just help me find the best ones by trying randomly? On my real data set, mapping the 20-50 features to higher-dimensional spaces and trying things out would probably take too long, considering the 15 seconds needed for a single model on the simple example (and here we are not yet talking about searching for the optimal regularization C). Any suggestions?
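For concreteness, this is the kind of automated search I have in mind — a sketch only, assuming a scikit Pipeline with a cross-validated grid search over the mapping degree and C jointly (again on hypothetical stand-in data):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical stand-in data.
rng = np.random.RandomState(0)
X = rng.randn(65, 2)
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)

# Chain the feature mapping and the classifier, then cross-validate
# over the polynomial degree and the regularization C together.
pipe = Pipeline([
    ("poly", PolynomialFeatures()),
    ("lr", LogisticRegression(max_iter=10000)),
])
grid = GridSearchCV(
    pipe,
    {"poly__degree": [2, 4, 6], "lr__C": [0.1, 1.0, 10.0, 100.0]},
    cv=3,
)
grid.fit(X, y)
best = grid.best_params_  # e.g. the winning degree and C
```

My worry is exactly that each cell of such a grid costs one full fit, which on my real 20-50 features could become expensive.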

Cheers!

Richard
