Lars Buitinck
2011-06-01 15:35:35 UTC
Hi all,
I'm not sure I'm even supposed to try this, but I did it anyway:
pipe = Pipeline([
    ('vect', CountVectorizer()),
    ('bin', Binarizer()),
    ('clf', BernoulliNB()),
])
should, I thought, count term occurrences and then transform them to
binary features to be used in a Bernoulli naive Bayes classifier.[1]
However, fitting this pipeline fails:
Traceback (most recent call last):
  File "examples/bernoulli_naive_bayes.py", line 34, in <module>
    bnb.fit(docs_train, data_train.target)
  File "/scratch/apps/src/scikit-learn/scikits/learn/pipeline.py", line 141, in fit
    Xt = self._pre_transform(X, y, **params)
  File "/scratch/apps/src/scikit-learn/scikits/learn/pipeline.py", line 137, in _pre_transform
    Xt = transform.fit(Xt, y).transform(Xt)
  File "/scratch/apps/src/scikit-learn/scikits/learn/preprocessing/__init__.py", line 125, in transform
    X[cond] = 1
TypeError: 'coo_matrix' object does not support item assignment
So the question is: is this a bug in Binarizer, is this a bug in
CountVectorizer or did I do something immoral/illegal/invalid?
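For reference, here is a minimal sketch of how the binarization step might be done on sparse input without item assignment. It assumes only NumPy and SciPy; the helper name binarize_sparse is hypothetical and not part of scikit-learn, and it only makes sense for a non-negative threshold, since implicit zeros stay zero.

```python
import numpy as np
from scipy import sparse


def binarize_sparse(X, threshold=0.0):
    """Hypothetical helper: return a CSR copy of X with 0/1 entries.

    Stored values greater than `threshold` become 1; all others are dropped.
    Assumes threshold >= 0, so that implicit zeros correctly map to 0.
    """
    # Converting to CSR gives direct access to the stored values in X.data
    # and leaves the caller's matrix untouched.
    X = sparse.csr_matrix(X, copy=True)
    # Rewrite the stored values in bulk instead of using X[cond] = 1.
    X.data = (X.data > threshold).astype(X.dtype)
    # Remove entries that are now explicit zeros.
    X.eliminate_zeros()
    return X


# Small term-count matrix in COO format, the format named in the traceback.
counts = sparse.coo_matrix(np.array([[0, 2, 1], [3, 0, 0]]))
binary = binarize_sparse(counts)
```

This works on the array of stored values rather than indexing into the matrix, so it does not depend on the sparse format supporting assignment.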
Regards,
Lars
[1] https://github.com/larsmans/scikit-learn/commit/5f87e43cb462d4df4b982254746b4ce3dc79a1b4
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam