Discussion:
[Scikit-learn-general] How to interpret the 'score' of a support vector regression
John Richey
2013-05-03 14:41:49 UTC
Hi all -

I am relatively new to the world of machine learning, and I am having a little difficulty interpreting the output of a support vector regression problem. For simplicity, let's say I have 2 variables and 100 subjects. Both variables in my model are continuous.

To make matters a little more complicated, I have four "sites" at which data were collected, and I want to "leave one label out", where labels correspond to sites, in order to assess whether site has an influence on the predictive model.

Here is the code so far.



from sklearn import svm, cross_validation
from sklearn.cross_validation import LeaveOneLabelOut

lolo = LeaveOneLabelOut(labels)

for train_index, test_index in lolo:
    # Hold out one site; train on the remaining three.
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]

    clf = svm.SVR()
    clf = clf.fit(X_train, y_train)
    s = clf.score(X_test, y_test)
    print s

    # Default 3-fold cross-validation within the held-out site's data only.
    scores = cross_validation.cross_val_score(clf, X_test, y_test)
    print "Accuracy: %0.2f (+/- %0.2f)" % (scores.mean(), scores.std() / 2)


It produces the following output:
0.0343889480748
Accuracy: -0.05 (+/- 0.05)
-0.0786771792262
Accuracy: -0.25 (+/- 0.07)
-0.0871562121791
Accuracy: -0.12 (+/- 0.05)
-0.0496675695436
Accuracy: -0.16 (+/- 0.03)






Could someone help me understand the substantive meaning of the 'score' in an SVR problem? Thanks in advance.
Andreas Mueller
2013-05-05 22:12:37 UTC
Hi John.
For regression problems, the score is the R^2, the coefficient of
determination:
https://en.wikipedia.org/wiki/Coefficient_of_determination
as is explained in the documentation
http://scikit-learn.org/dev/modules/generated/sklearn.svm.SVR.html#sklearn.svm.SVR.score
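
To make that concrete, here is a small sketch of what the regression score computes (the y_true/y_pred values are just made-up toy numbers); sklearn.metrics.r2_score returns the same value as clf.score(X_test, y_test) would on real data:

import numpy as np
from sklearn.metrics import r2_score

# Toy values, purely for illustration.
y_true = np.array([3.0, 1.5, 2.0, 4.5])
y_pred = np.array([2.8, 1.7, 2.1, 4.0])

# R^2 = 1 - SS_res / SS_tot, where SS_tot is the error of always
# predicting the mean of y_true.
ss_res = np.sum((y_true - y_pred) ** 2)
ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
print 1 - ss_res / ss_tot       # manual R^2
print r2_score(y_true, y_pred)  # same value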

Negative scores basically mean the model learned nothing: it does worse on the test data than simply predicting the mean of the target.

You can also use MSE for regression problems (using
sklearn.metrics.mean_squared_error or scoring='mse' in the newest version),
which I find a bit easier to interpret.
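
For example, something like this inside your loop (a minimal sketch; it assumes you keep the fitted clf and the fold's X_test/y_test around):

from sklearn.metrics import mean_squared_error

y_pred = clf.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print "MSE: %0.3f" % mse            # in squared units of y
print "RMSE: %0.3f" % (mse ** 0.5)  # back in the original units of y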

Hth,
Andy
Joel Nothman
2013-05-05 22:30:00 UTC
Apart from the scores, I'm not sure what you are trying to calculate with
cross_val_score here: you're passing it a single LOLO fold's test data,
i.e. corresponding to one label from labels. It is reporting the results of
three-fold cross validation over that sample. Is that what you intended? -
Joel
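
If the intent was one held-out score per site, something along these lines is probably closer to it (a sketch, assuming X, y and labels are the full arrays for all 100 subjects):

from sklearn import svm, cross_validation

lolo = cross_validation.LeaveOneLabelOut(labels)

# One R^2 per fold: train on three sites, test on the held-out site.
scores = cross_validation.cross_val_score(svm.SVR(), X, y, cv=lolo)
print scores
print "Mean R^2 across sites: %0.2f" % scores.mean()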
John Richey
2013-05-06 13:34:23 UTC
Hmm, perhaps I am going about this all wrong.

What I'd like to do is assess whether "site" impacts the predictability of the data. Wouldn't a LOLO approach (with labels corresponding to sites) be a viable way to assess this? A lower R^2 for a given held-out site would indicate that its data are not consistent with the data from the other sites?
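
Something like this (a sketch, assuming labels is a NumPy array holding each subject's site) would give one held-out R^2 per site, which could then be compared across sites:

import numpy as np
from sklearn import svm, cross_validation

lolo = cross_validation.LeaveOneLabelOut(labels)

# Train on three sites, test on the held-out one; a clearly lower (or
# negative) R^2 for a site suggests its data are not predicted well by
# models trained on the other sites.
for train_index, test_index in lolo:
    held_out_site = np.unique(labels[test_index])[0]
    clf = svm.SVR().fit(X[train_index], y[train_index])
    print "site %s: R^2 = %0.3f" % (held_out_site,
                                    clf.score(X[test_index], y[test_index]))

With only four sites, though, differences between folds may also just reflect sampling noise.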