Tom Kenter
2013-10-04 14:42:15 UTC
Dear all,
I am trying to run a linear_model.SGDClassifier() and have it update after
every example it classifies.
My code works for a small feature file (10 features), but when I give it a
bigger feature file (some 80000 features, but very sparse) it raises an
error straight away, the first time partial_fit() is called.
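This is what I do, in simplified form:

    import numpy as np
    from scipy.sparse import csr_matrix
    from sklearn import linear_model
    from sklearn.datasets import load_svmlight_file

    # train_file, test_file and getFeatures() are my own
    X, y = load_svmlight_file(train_file)
    classifier = linear_model.SGDClassifier()
    classifier.fit(X, y)

    for test_line in open(test_file):
        # getFeatures() gives me a Python list for X
        # and an integer label for y
        test_X, test_y = getFeatures(test_line)
        print("prediction: %s" % classifier.predict([test_X])[0])
        # update the model with the true label of this example
        classifier.partial_fit(csr_matrix([test_X]), [test_y],
                               classes=np.unique(y))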
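The post doesn't show getFeatures(). Purely as a hypothetical sketch, assuming each test line uses the same svmlight format as the training file (1-based feature indices, no qid field), it could turn one line into the dense Python list and integer label described above:

```python
# Hypothetical stand-in for the getFeatures() helper, which the post does not
# show. Assumes svmlight-format lines: "<label> <index>:<value> ..." with
# 1-based indices and no "qid:" field.
def get_features(line, n_features):
    """Return (dense feature list, integer label) for one svmlight line."""
    parts = line.split("#", 1)[0].split()  # drop any trailing comment
    label = int(float(parts[0]))
    features = [0.0] * n_features
    for token in parts[1:]:
        index, value = token.split(":")
        features[int(index) - 1] = float(value)  # shift to 0-based position
    return features, label


feats, label = get_features("1 2:0.5 4:1.0", 5)
print(feats, label)  # [0.0, 0.5, 0.0, 1.0, 0.0] 1
```

With some 80000 features, each call builds an 80000-element list that is almost entirely zeros. Keeping only the (index, value) pairs would avoid that, but this sketch mirrors the dense list the post describes.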
The error I keep getting for the partial_fit() line is:
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 487, in partial_fit
      coef_init=None, intercept_init=None)
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 371, in _partial_fit
      sample_weight=sample_weight, n_iter=n_iter)
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 451, in _fit_multiclass
      for i in range(len(self.classes_)))
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 517, in __call__
      self.dispatch(function, args, kwargs)
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 312, in dispatch
      job = ImmediateApply(func, args, kwargs)
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/externals/joblib/parallel.py", line 136, in __init__
      self.results = func(*args, **kwargs)
    File "/datastore/tkenter1/epd/lib/python2.7/site-packages/sklearn/linear_model/stochastic_gradient.py", line 284, in fit_binary
      est.power_t, est.t_, intercept_decay)
    File "sgd_fast.pyx", line 327, in sklearn.linear_model.sgd_fast.plain_sgd (sklearn/linear_model/sgd_fast.c:7568)
    ValueError: ndarray is not C-contiguous
I also tried passing partial_fit() Python lists or NumPy arrays (which I
thought are C-contiguous, i.e. order='C', by default), but I get the same
error.
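To check the assumption about NumPy's default memory layout, a small sketch (the values are arbitrary) shows that an array built from a nested list is C-ordered, that a transpose is only an F-ordered view, and that np.ascontiguousarray() produces a C-ordered copy:

```python
import numpy as np

# An array built from a nested list is row-major (C-ordered) by default.
m = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
print(m.flags["C_CONTIGUOUS"], m.flags["F_CONTIGUOUS"])  # True False

# Transposing only swaps the strides, so the view is column-major (F-ordered).
t = m.T
print(t.flags["C_CONTIGUOUS"], t.flags["F_CONTIGUOUS"])  # False True

# ascontiguousarray copies the data into C order when it isn't already.
c = np.ascontiguousarray(t)
print(c.flags["C_CONTIGUOUS"])  # True
```

So the input arrays are C-contiguous by default, but arrays derived from them (transposes, slices) need not be.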
I don't think the classes argument is the problem: the same error appears
if I leave it out or if I hard-code the correct classes.
I do notice that when I print the flags of the classifier's coef_ array,
I get:
    Flags of coef_ array:
      C_CONTIGUOUS : False
      F_CONTIGUOUS : True
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      UPDATEIFCOPY : False
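The flags above describe a column-major (Fortran-ordered) coef_. A minimal NumPy sketch with illustrative sizes (3 classes, 5 features, not taken from the post) shows that a single row of such a matrix is a strided view and therefore not C-contiguous. It is an assumption, not something the traceback proves, that the multiclass path (_fit_multiclass calling fit_binary once per class) hands plain_sgd a per-class row like this:

```python
import numpy as np

n_classes, n_features = 3, 5  # illustrative sizes only
# An F-ordered weight matrix reports the same flags as the coef_ printout.
coef = np.zeros((n_classes, n_features), order="F")
print(coef.flags["C_CONTIGUOUS"], coef.flags["F_CONTIGUOUS"])  # False True

# One row of an F-ordered matrix skips n_classes elements per step,
# so it is not a contiguous block of memory.
row = coef[0]
print(row.flags["C_CONTIGUOUS"])  # False

# After a C-ordered copy, every row is contiguous again.
coef_c = np.ascontiguousarray(coef)
print(coef_c[0].flags["C_CONTIGUOUS"])  # True
```

With a single class (n_classes == 1) the row would be contiguous either way, so the layout only matters once there are several classes.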
I am sure I am doing something wrong, but really, I don't see what...
Any help appreciated!
Cheers,
Tom