Discussion:
libsvm data support
Hueseyin Hakan Pekmezci
2013-07-12 14:35:55 UTC
Hi scikit-learn members,

The 0.13.1 documentation states that individual datasets can
be loaded in svmlight / libsvm format, so I fed in the
"iris.scale" libSVM dataset; however, some erroneous
behaviour happens. I am just trying to reproduce
"plot_iris_exercise.py" with iris.scale
(http://www.csie.ntu.edu.tw/~cjlin/libsvmtools/datasets/multiclass/iris.scale).

The problem in CLI:

$ python2.7 linsvm.py iris.scale
Accuracy 0.973333333333
Traceback (most recent call last):
  File "linsvm.py", line 28, in <module>
    n_sample = len(X)
  File "/usr/lib/python2.7/dist-packages/scipy/sparse/base.py", line 175, in __len__
    raise TypeError("sparse matrix length is ambiguous; use getnnz()"
TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]

The code below is very simple:

import numpy as np
import pylab as pl
from sklearn import datasets, svm
from sklearn.datasets import load_svmlight_file
import sys

#iris = datasets.load_iris()
#X = iris.data
#y = iris.target

##import iris.scale dataset in LibSVM
X_in, y_in=load_svmlight_file(sys.argv[1])

X_train=X_in
y_train=y_in
X_test=X_in
y_test=y_in

svc=svm.SVC(kernel='linear')
clf=svc.fit(X_train, y_train)
y_pred =clf.predict(X_test)
print "Accuracy", np.mean(y_pred == y_test)

X = X_in[y_in != 0, :2]
y = y_in[y_in != 0]

##The following code is from plot_iris_exercise.py
n_sample = len(X)

np.random.seed(0)
order = np.random.permutation(n_sample)
X = X[order]
y = y[order].astype(np.float)

X_train = X[:.9 * n_sample]
y_train = y[:.9 * n_sample]
X_test = X[.9 * n_sample:]
y_test = y[.9 * n_sample:]

# fit the model
for fig_num, kernel in enumerate(('linear', 'rbf', 'poly')):
    clf = svm.SVC(kernel=kernel, gamma=10)
    clf.fit(X_train, y_train)

    pl.figure(fig_num)
    pl.clf()
    pl.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=pl.cm.Paired)

    # Circle out the test data
    pl.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none', zorder=10)

    pl.axis('tight')
    x_min = X[:, 0].min()
    x_max = X[:, 0].max()
    y_min = X[:, 1].min()
    y_max = X[:, 1].max()

    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
    Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])

    # Put the result into a color plot
    Z = Z.reshape(XX.shape)
    pl.pcolormesh(XX, YY, Z > 0, cmap=pl.cm.Paired)
    pl.contour(XX, YY, Z, colors=['k', 'k', 'k'],
               linestyles=['--', '-', '--'],
               levels=[-.5, 0, .5])

    pl.title(kernel)
pl.show()


Many thanks,

Hakan
Mathieu Blondel
2013-07-12 14:59:57 UTC
Well, the error message says it all: you cannot use len() on a sparse matrix.
Instead of len(X), use X.shape[0].
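
For example, a minimal sketch of the fix, assuming X comes from load_svmlight_file as in your script:

from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file("iris.scale")
# X is a scipy.sparse CSR matrix: len(X) raises the TypeError above,
# but the number of samples is available as shape[0]
n_sample = X.shape[0]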

Mathieu

Hakan
2013-07-12 15:11:36 UTC
Initially I tried the fix you mentioned, but then I hit the barrier below. That made me start to reconsider whether there might be a problem with the libSVM reading...
Traceback (most recent call last):
  File "linsvm.py", line 48, in <module>
    pl.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=pl.cm.Paired)
  File "/usr/lib/pymodules/python2.7/matplotlib/pyplot.py", line 2557, in scatter
    ret = ax.scatter(x, y, s, c, marker, cmap, norm, vmin, vmax, alpha, linewidths, faceted, verts, **kwargs)
  File "/usr/lib/pymodules/python2.7/matplotlib/axes.py", line 5817, in scatter
    colors = mcolors.colorConverter.to_rgba_array(c, alpha)
  File "/usr/lib/pymodules/python2.7/matplotlib/colors.py", line 380, in to_rgba_array
    raise ValueError("Color array must be two-dimensional")
ValueError: Color array must be two-dimensional

Thanks,
Hakan

Andreas Mueller
2013-07-12 15:00:58 UTC
Hi.
If you just want the iris dataset, you can get it using
"datasets.load_iris()" (and scale it with StandardScaler).
The problem in your code is that load_svmlight_file returns X as a sparse matrix. You need to convert it to an ndarray with X.toarray() if you want to use the example. (I think it is only needed for the n_sample line, but I'm not entirely sure about the slicing.)
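
A minimal sketch of that conversion, assuming the file path is passed on the command line as in your script:

import sys
from sklearn.datasets import load_svmlight_file

X_in, y_in = load_svmlight_file(sys.argv[1])
X_in = X_in.toarray()  # dense ndarray: len(), slicing and plotting
                       # now behave like in the numpy-based examples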

Hth,
Andy
Hakan
2013-07-12 15:14:19 UTC
As you can see, initially I was loading the iris data exactly like the example. But since I need to be able to work with individual datasets, I wanted to give libSVM files a try. Is there any piece of code or example that shows their smooth integration with scikit-learn? I mean an SVM classifier example using load_svmlight_file.

Thanks,
Hakan

Andreas Mueller
2013-07-12 15:44:48 UTC
You can use any example if you just use
X = X.toarray()
after the loading.
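
For instance, a small self-contained sketch of an SVM classifier fed from load_svmlight_file (the accuracy on the training data is only a sanity check, not a proper evaluation):

import sys
import numpy as np
from sklearn import svm
from sklearn.datasets import load_svmlight_file

# load any svmlight / libsvm formatted file given on the command line
X, y = load_svmlight_file(sys.argv[1])
X = X.toarray()  # densify so the numpy-based example code works unchanged

clf = svm.SVC(kernel='linear')
clf.fit(X, y)
y_pred = clf.predict(X)
print("Training accuracy: %.3f" % np.mean(y_pred == y))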
Hakan
2013-07-12 16:06:33 UTC
Unfortunately it is not as straightforward as you said... I have made the changes Mathieu and you mentioned, but converting the feature set to an array with "X = X.toarray()" is not enough to run any example with libSVM datasets. Please have a look at the following code; the decision boundary line is complaining about the shape:

  File "linsvm.py", line 68, in <module>
    Z = Z.reshape(XX.shape)  # "total size of new array must be unchanged" error
ValueError: total size of new array must be unchanged

code:

import numpy as np
import pylab as pl
from sklearn import datasets, svm
from sklearn.datasets import load_svmlight_file
import sys

##import iris.scale dataset in LibSVM
X_in, y_in=load_svmlight_file(sys.argv[1])

X_train=X_in
y_train=y_in
X_test=X_in
y_test=y_in
print "before X_in:",X_in

X = X_in[:150, :2]
y = y_in[:150]

##The following code is from plot_iris_exercise.py
#n_sample = len(X)
print "before X:",X
X = X.toarray()
print "after X:",X
n_sample = len(X)
print n_sample

np.random.seed(0)
order = np.random.permutation(n_sample)
X = X[order]
y = y[order].astype(np.float)

X_train = X[:.9 * n_sample]
y_train = y[:.9 * n_sample]
X_test = X[.9 * n_sample:]
y_test = y[.9 * n_sample:]

# fit the model
for fig_num, kernel in enumerate(('linear', 'rbf', 'poly')):
    clf = svm.SVC(kernel=kernel, gamma=10)
    clf.fit(X_train, y_train)

    pl.figure(fig_num)
    pl.clf()
    pl.scatter(X[:, 0], X[:, 1], c=y, zorder=10, cmap=pl.cm.Paired)

    # Circle out the test data
    pl.scatter(X_test[:, 0], X_test[:, 1], s=80, facecolors='none', zorder=10)

    pl.axis('tight')
    x_min = X[:, 0].min()
    x_max = X[:, 0].max()
    y_min = X[:, 1].min()
    y_max = X[:, 1].max()

    XX, YY = np.mgrid[x_min:x_max:200j, y_min:y_max:200j]
    print "XX:", XX
    Z = clf.decision_function(np.c_[XX.ravel(), YY.ravel()])
    print "Z:", Z
    # Put the result into a color plot
    Z = Z.reshape(XX.shape)  # "total size of new array must be unchanged" error
    pl.pcolormesh(XX, YY, Z > 0, cmap=pl.cm.Paired)
    pl.contour(XX, YY, Z, colors=['k', 'k', 'k'],
               linestyles=['--', '-', '--'],
               levels=[-.5, 0, .5])

    pl.title(kernel)
pl.show()
Olivier Grisel
2013-07-12 16:32:11 UTC
Post by Hakan
Unfortunately it is not as straightforward as you said...
The error message was:

TypeError: sparse matrix length is ambiguous; use getnnz() or shape[0]

It is completely straightforward. It says that the object you are dealing with is a sparse matrix, as stated in the documentation:

http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html

Hence examples that use numpy arrays cannot be copied and pasted blindly.
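
One way to adapt such examples, as a rough sketch, is to check for sparseness explicitly before reusing numpy-only idioms:

import scipy.sparse as sp

if sp.issparse(X):
    X = X.toarray()    # densify so numpy-style slicing and plotting work unchanged
n_sample = X.shape[0]  # shape[0] is valid for both sparse matrices and ndarrays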
Post by Hakan
I have made the changes Mathieu and you mentioned, but converting the feature set to an array with "X = X.toarray()" is not enough to run any example with libSVM datasets. Please have a look at the following code; the decision boundary line is complaining about the shape:
  File "linsvm.py", line 68, in <module>
    Z = Z.reshape(XX.shape)  # "total size of new array must be unchanged" error
ValueError: total size of new array must be unchanged
Well that is a completely unrelated error not caused by the
libsvm/svmlight format.

I don't really know how numpy.mgrid works; you should read the documentation to understand what's wrong:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.mgrid.html

Also print the shapes of Z, XX and X, both in the original example and in your modified script, so as to understand what's going on.
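
A minimal sketch of such a check, placed right before the reshape (X, XX and Z are the variables already defined in your script):

print("X shape: %s" % (X.shape,))    # e.g. (150, 2) after keeping two features
print("XX shape: %s" % (XX.shape,))  # (200, 200) grid built by np.mgrid
print("Z shape: %s" % (Z.shape,))    # Z must hold exactly 200 * 200 values to be
                                     # reshaped to XX.shape; with all three iris
                                     # classes kept, SVC.decision_function returns
                                     # one column per pair of classes, so Z likely
                                     # holds three times too many values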

--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel

Olivier Grisel
2013-07-12 15:59:29 UTC
Post by Hueseyin Hakan Pekmezci
##import iris.scale dataset in LibSVM
X_in, y_in=load_svmlight_file(sys.argv[1])
X_train=X_in
y_train=y_in
X_test=X_in
y_test=y_in
This is a methodological mistake: you should never use the same data
for training and testing a model. Instead use:

from sklearn.cross_validation import train_test_split

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

random_state is a seed integer to control the randomness of the split and make it reproducible across runs (easier for debugging).
Post by Hueseyin Hakan Pekmezci
svc=svm.SVC(kernel='linear')
clf=svc.fit(X_train, y_train)
y_pred =clf.predict(X_test)
print "Accuracy", np.mean(y_pred == y_test)
X = X_in[y_in != 0, :2]
y = y_in[y_in != 0]
???
Post by Hueseyin Hakan Pekmezci
##The following code is from plot_iris_exercise.py
n_sample = len(X)
Don't do that on a sparse matrix. Either do:

n_samples = X.shape[0]

or

X = X.toarray()
n_samples = len(X)
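
Putting the two points together, a rough end-to-end sketch that keeps X sparse (SVC accepts sparse input) and evaluates on a held-out split:

import sys
from sklearn import svm
from sklearn.cross_validation import train_test_split
from sklearn.datasets import load_svmlight_file

X, y = load_svmlight_file(sys.argv[1])     # X stays a sparse CSR matrix
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)  # reproducible held-out split

clf = svm.SVC(kernel='linear')
clf.fit(X_train, y_train)
print("Test accuracy: %.3f" % clf.score(X_test, y_test))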

--
Olivier
Hakan
2013-07-12 16:15:23 UTC
On Fri, 12 Jul 2013 17:59:29 +0200
Post by Olivier Grisel
Post by Hueseyin Hakan Pekmezci
X_train=X_in
y_train=y_in
X_test=X_in
y_test=y_in
This is a methodological mistake: you should never use the same data for training and testing a model. Instead use:
from sklearn.cross_validation import train_test_split
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)
Thanks, Olivier, for reminding me of that. It was just a trial run in which I missed that point; I have now applied it.
Post by Olivier Grisel
Post by Hueseyin Hakan Pekmezci
X = X_in[y_in != 0, :2]
y = y_in[y_in != 0]
I have changed that to take the first 150 samples of X and y.
Post by Olivier Grisel
Post by Hueseyin Hakan Pekmezci
n_sample = len(X)
n_samples = X.shape[0]
or
X = X.toarray()
n_samples = len(X)
Now it has these changes, but the decision boundary lines still require some alterations because of the shape...

Cheers,
Hakan