Discussion:
Perceptrons for scikit-learn
Lars Buitinck
2011-05-02 12:40:31 UTC
Permalink
Hi,

I have the intention of adding perceptrons to scikit-learn and was
urged by Gael to first discuss the addition on this mailing list
because of plans to add online learning. So, here goes.

I found a very clean, numpy-based library for (averaged) perceptrons
at http://code.google.com/p/python-perceptron/ and thought this would
make a good candidate for scikit-learn after some cleanup. I contacted
the author about this and he said he wasn't maintaining it anymore and
that incorporation into scikit-learn would be the best thing. This
library seems to do 1-vs-all classification when facing >2 classes.
I'm currently rewriting parts of it to conform with the scikit-learn
interface.

Now, as for online learning: this is not my prime interest, but
python-perceptron does already have such an interface: labeled
examples must be fed to it one by one.

I'm currently modifying it to have a fit method that does multiple
iterations over its dataset, as recommended by Freund and Schapire in
*Large margin classification using the perceptron algorithm*. I can of
course stipulate in the interface that learning and prediction
may be interleaved. I can also add a method that learns from a single
instance, but I don't know what to call that; I was thinking of
'update', but maybe someone else has a better suggestion. (Mathieu
Blondel mentioned partial_fit?)

Regards,
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Olivier Grisel
2011-05-02 13:36:16 UTC
Permalink
Post by Lars Buitinck
Hi,
I have the intention of adding perceptrons to scikit-learn and was
urged by Gael to first discuss the addition on this mailing list
because of plans to add online learning. So, here goes.
I found a very clean, numpy-based library for (averaged) perceptrons
at http://code.google.com/p/python-perceptron/ and thought this would
make a good candidate for scikit-learn after some cleanup. I contacted
the author about this and he said he wasn't maintaining it anymore and
that incorporation into scikit-learn would be the best thing. This
library seems to do 1-vs-all classification when facing >2 classes.
I'm currently rewriting parts of it to conform with the scikit-learn
interface.
Interesting. The sparse variant is based on a Python dict. I wonder if
it's possible to use a scipy.sparse data structure instead. It is
probably much faster to compute the dot products with scipy, but it
might not be as efficient as a dict for pruning. You should check out the
Cython code of the sparse SGDClassifier, which must be very similar
(though it does not have the weight history / averaging part).
Post by Lars Buitinck
Now, as for online learning: this is not my prime interest, but
python-perceptron does already have such an interface: labeled
examples must be fed to it one by one.
I'm currently modifying it to do have a fit method that does multiple
iterations over its dataset, as recommended by Freund and Schapire in
*Large margin classification using the perceptron algorithm*. I can of
course stipulate in the interface that classification and prediction
may be interleaved. I can also add a method that learns from a single
instance, but I don't know what to call that; I was thinking of
'update', but maybe someone else has a better suggestion. (Matthieu
Blondel mentioned partial_fit?)
Yes, partial_fit can be used to update the model incrementally on a
mini-batch (a slice of the dataset). Making slices of single examples
for real online learning would be possible in theory but a performance
killer in practice, as the Python function call overhead is far from
negligible.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Lars Buitinck
2011-05-03 11:34:58 UTC
Permalink
Post by Olivier Grisel
Interesting. The sparse variant is based on python dict. I wonder if
it's possible to use a scipy.sparse datastructure instead. It is
probably much faster to compute the dot products with scipy but it
might not be as efficient as dict for pruning. You should checkout the
cython code of the sparse SGDClassifier which must be very similar
(though does not have the weights history / averaging part).
Olivier, which matrix format do you recommend for this? CSR, like SGDClassifier?
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Olivier Grisel
2011-05-03 11:48:42 UTC
Permalink
Post by Lars Buitinck
Post by Olivier Grisel
Interesting. The sparse variant is based on python dict. I wonder if
it's possible to use a scipy.sparse datastructure instead. It is
probably much faster to compute the dot products with scipy but it
might not be as efficient as dict for pruning. You should checkout the
cython code of the sparse SGDClassifier which must be very similar
(though does not have the weights history / averaging part).
Olivier, which matrix format do you recommend for this? CSR, like SGDClassifier?
CSR is probably the easiest and most efficient representation to
expect for the input data. The internal weights of the model
are probably better kept in a dense numpy array representation (as in
SGDClassifier).
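For what it's worth, a tiny sketch of that split (CSR input, dense weight
vector); the numbers below are made up for illustration:

    import numpy as np
    import scipy.sparse as sp

    # Toy data: two samples, three features, stored in CSR format.
    X = sp.csr_matrix(np.array([[0., 1., 0.],
                                [2., 0., 3.]]))
    w = np.array([0.5, -1.0, 0.25])   # dense weight vector, as in SGDClassifier
    b = 0.0

    scores = X.dot(w) + b             # sparse-dense product gives a dense ndarray
    y_pred = np.where(scores >= 0, 1, -1)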
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-05-02 14:02:58 UTC
Permalink
Hi Lars,
Post by Lars Buitinck
I found a very clean, numpy-based library for (averaged) perceptrons
at http://code.google.com/p/python-perceptron/ and thought this would
make a good candidate for scikit-learn after some cleanup. I contacted
For the sparse case, a Cython version would probably be much faster.
Post by Lars Buitinck
the author about this and he said he wasn't maintaining it anymore and
that incorporation into scikit-learn would be the best thing. This
library seems to do 1-vs-all classification when facing >2 classes.
I'm currently rewriting parts of it to conform with the scikit-learn
interface.
I would also add a multiclass=True|False option to the constructor and,
if set to True, use the multiclass perceptron instead of one-vs-all
(when a mistake is made, add x to the correct class's weight vector and
subtract x from the wrongly predicted class's weight vector).
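A rough sketch of that update rule (a plain multiclass perceptron; the
function name and defaults are illustrative, not a proposed scikit-learn API):

    import numpy as np

    def multiclass_perceptron_fit(X, y, n_classes, n_iter=5):
        # X: dense (n_samples, n_features) array; y: integer labels in [0, n_classes)
        n_samples, n_features = X.shape
        W = np.zeros((n_classes, n_features))
        for _ in range(n_iter):
            for x, target in zip(X, y):
                pred = np.argmax(W.dot(x))
                if pred != target:
                    W[target] += x   # pull the correct class towards x
                    W[pred] -= x     # push the wrongly predicted class away
        return W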
Post by Lars Buitinck
Now, as for online learning: this is not my prime interest, but
python-perceptron does already have such an interface: labeled
examples must be fed to it one by one.
In scikit-learn, our priority is to use online learning for large-scale
learning (i.e. when the data doesn't fit in memory), not for the
pure online setting.
Post by Lars Buitinck
I'm currently modifying it to do have a fit method that does multiple
iterations over its dataset, as recommended by Freund and Schapire in
*Large margin classification using the perceptron algorithm*. I can of
Yes, making multiple passes over the dataset is definitely a good idea
(in this case, shuffling the data is important; that should be an
option which defaults to True).

Other schemes are possible like sampling with / without replacement,
balanced or not (sample positive examples with the same frequency as
negative examples).
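As an illustration, a minimal multi-pass training loop with per-pass
shuffling (assuming a dense X and labels in {-1, +1}; the shuffle and
random_state arguments are just a sketch of the option mentioned above):

    import numpy as np

    def fit_perceptron(X, y, n_iter=5, shuffle=True, random_state=0):
        rng = np.random.RandomState(random_state)
        n_samples, n_features = X.shape
        w = np.zeros(n_features)
        b = 0.0
        for _ in range(n_iter):                     # several passes over the data
            order = np.arange(n_samples)
            if shuffle:
                rng.shuffle(order)                  # reshuffle on every pass
            for i in order:
                if y[i] * (X[i].dot(w) + b) <= 0:   # mistake-driven update
                    w += y[i] * X[i]
                    b += y[i]
        return w, b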
Post by Lars Buitinck
course stipulate in the interface that classification and prediction
may be interleaved. I can also add a method that learns from a single
instance, but I don't know what to call that; I was thinking of
'update', but maybe someone else has a better suggestion. (Matthieu
Blondel mentioned partial_fit?)
partial_fit(X, y) has been mentioned in the past as a potential
candidate method for online learning (read: large-scale learning). Like
fit, it takes a matrix X and a vector y, except that partial_fit can
be called multiple times without overwriting the previous model
parameters.

In the multi-class case, there's the problem that y is not guaranteed
to contain all possible labels, so you can't just do np.unique(y) to
retrieve them. So we need to decide on a way to pass them to the object
(argument to partial_fit? argument to the constructor?). Passing
data-dependent arguments to the constructor seems wrong to me, and making
partial_fit algorithm-dependent is not so good either...

Another potential candidate method would be fit_dataset(dataset),
where dataset is an object that knows how to act like an iterator but
which has additional methods like reset() and unique_y(). Such a
dataset object could also know how to sample mini-batches of data. That
would be compatible with partial_fit.
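For the sake of discussion, a bare-bones sketch of such a dataset object
(every name here is hypothetical, not an existing scikit-learn interface):

    import numpy as np

    class InMemoryDataset:
        # Iterable over mini-batches, with reset() and unique_y() as suggested above.
        def __init__(self, X, y, batch_size=100, random_state=0):
            self.X, self.y = X, y
            self.batch_size = batch_size
            self.rng = np.random.RandomState(random_state)
            self.reset()

        def reset(self):
            self._order = self.rng.permutation(len(self.y))
            self._pos = 0

        def unique_y(self):
            return np.unique(self.y)

        def __iter__(self):
            return self

        def __next__(self):
            if self._pos >= len(self._order):
                raise StopIteration
            idx = self._order[self._pos:self._pos + self.batch_size]
            self._pos += self.batch_size
            return self.X[idx], self.y[idx]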

For algorithms which have a learning rate (usually written as eta),
the learning rate needs to be decreased after each iteration and so
eta needs to be stored in the object. We could add a reset() method to
enable objects to reset it.

Online algorithms are usually inferior to batch algorithms, so it seems
wise to me to expose them through an online API (although a fit method can
also be provided for consistency with the rest of the scikit). If your
data fits in memory, why not use a LinearSVC directly?

See http://sourceforge.net/mailarchive/message.php?msg_id=27287618 for
the past discussion I mentioned above.

Mathieu
Mathieu Blondel
2011-05-02 14:13:23 UTC
Permalink
Post by Mathieu Blondel
Other schemes are possible like sampling with / without replacement,
balanced or not (sample positive examples with the same frequency as
negative examples).
Note that shuffling and making one pass is equivalent to sampling
without replacement.

Mathieu
Peter Prettenhofer
2011-05-02 16:22:14 UTC
Permalink
Hi Lars,

you might also want to take a look at the "Average Perceptron"
implementation in Bolt [1]. It is written in Cython and implements the
multi-class case. The implementation in Bolt, however, is not a pure
online learner - so for partial_fit you might store the weight vector
(and the second vector where you accumulate the average) as attributes
of the estimator object instead of local variables.
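Roughly, the state that has to survive between calls could look like this
(an illustrative binary averaged perceptron, not Bolt's actual code; labels
are assumed to be in {-1, +1}):

    import numpy as np

    class AveragedPerceptron:
        def __init__(self, n_features):
            self.w = np.zeros(n_features)      # current weight vector
            self.w_sum = np.zeros(n_features)  # accumulated weights for averaging
            self.b = 0.0
            self.b_sum = 0.0
            self.t = 0                         # number of examples seen so far

        def partial_fit(self, X, y):
            # Keep updating the same attributes, so successive calls continue
            # where the previous one stopped.
            for x_i, y_i in zip(X, y):
                if y_i * (x_i.dot(self.w) + self.b) <= 0:
                    self.w += y_i * x_i
                    self.b += y_i
                self.w_sum += self.w           # accumulate after every example
                self.b_sum += self.b
                self.t += 1
            return self

        def decision_function(self, X):
            t = max(self.t, 1)
            return X.dot(self.w_sum / t) + self.b_sum / t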

best,
Peter

[1] https://github.com/pprett/bolt/blob/master/bolt/trainer/avgperceptron.pyx
Post by Mathieu Blondel
Post by Mathieu Blondel
Other schemes are possible like sampling with / without replacement,
balanced or not (sample positive examples with the same frequency as
negative examples).
Note that shuffling and making one pass is equivalent to sampling
without replacement.
Mathieu
--
Peter Prettenhofer
Lars Buitinck
2011-05-02 16:18:35 UTC
Permalink
For anyone who's interested in this: I pushed a first version to my
Github repo. It should be considered experimental.
Post by Mathieu Blondel
Post by Lars Buitinck
I found a very clean, numpy-based library for (averaged) perceptrons
at http://code.google.com/p/python-perceptron/ and thought this would
make a good candidate for scikit-learn after some cleanup. I contacted
For the sparse case, a Cython version would probably be much faster.
Very probably. Olivier's suggestion of using scipy.sparse may also be
a good idea; I'll try that first.
Post by Mathieu Blondel
Post by Lars Buitinck
the author about this and he said he wasn't maintaining it anymore and
that incorporation into scikit-learn would be the best thing. This
library seems to do 1-vs-all classification when facing >2 classes.
I'm currently rewriting parts of it to conform with the scikit-learn
interface.
I would also add a multiclass=True|False option to the constructor and
if set to True, use the multiclass Perceptron instead of one-vs-all
(when a mistake is made add x to the correct class and subtract x to
wrongly predicted class).
Put this in the TODO with the commit.
Post by Mathieu Blondel
Post by Lars Buitinck
Now, as for online learning: this is not my prime interest, but
python-perceptron does already have such an interface: labeled
examples must be fed to it one by one.
In scikit-learn, our priority is to use online learning for large
scale learning (i.e. when the data don't fit in memory), not for the
pure online setting.
Right. My current dataset does fit in memory, but I'll keep this in mind.
Post by Mathieu Blondel
Post by Lars Buitinck
course stipulate in the interface that classification and prediction
may be interleaved. I can also add a method that learns from a single
instance, but I don't know what to call that; I was thinking of
'update', but maybe someone else has a better suggestion. (Matthieu
Blondel mentioned partial_fit?)
partial_fit(X, y) has been mentioned in the past as a potential
candidate method for online learning (read large-scale learning). Like
for fit, X is a matrix and y is a vector except that partial_fit can
be called multiple times without overwriting the previous model
parameters.
In the multi-class case, there's the problem that y is not guaranteed
to contain all possible labels, so you can't just do np.unique(y) to
retrieve them. So we need to decide a way to pass it to the object
(argument to partial_fit? argument to the constructor?). Passing
data-dependent to the constructor seems wrong to me and making
partial_fit algorithm dependent is not so good either...
I'd say pass it to the constructor as a parameter, but my interest is
currently in NLP labeling tasks where the set of labels is known
beforehand. I'm not sure if this is a good fit for other problem
settings.
Post by Mathieu Blondel
Another potential candidate method would be fit_dataset(dataset),
where dataset is an object that knows how to act like an iterator but
which has additional methods like reset() and unique_y(). Such a
dataset object could also know how to sample mini blocks of data. That
would be compatible with partial_fit.
I don't like this idea very much. It reminds me of the overengineered
ML libraries that I've fled before coming to scikit-learn.
Post by Mathieu Blondel
For algorithms which have a learning rate (usually written as eta),
the learning rate needs to be decreased after each iteration and so
eta needs to be stored in the object. We could add a reset() method to
enable objects to reset it.
Online algorithms are usually inferior to batch algorithms so it seems
wise to me to use them in an online API (although a fit method can
also be provided for consistency with the rest of the scikit). If your
data fits in memory, why not use a LinearSVC directly?
Because I want to port an application that currently uses averaged
perceptrons from Java to Python. The application performs *very* well
with averaged perceptrons, beating other folks' CRF/SVM/MaxEnt
solutions (probably not so much due to the perceptrons, but I'm not
sure of that yet). I want to stay as close as possible to the Java
version's behavior before I start changing things such as the central
learning algorithm.
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Peter Prettenhofer
2011-05-02 16:30:23 UTC
Permalink
Post by Lars Buitinck
Because I want to port an application that currently uses averaged
perceptrons from Java to Python. The application performs *very* well
with averaged perceptrons, beating other folks' CRF/SVM/MaxEnt
solutions (probably not so much due to the perceptrons, but I'm sure
not of that yet). I want to stay as close as possible to the Java
version's behavior before I start changing things such as the central
learning algorithm.
Since you are doing NLP (and comparing with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc.)? If so, I'd definitely be interested,
but we have to decide whether or not that's too specialized for
scikit-learn...

best,
Peter
--
Peter Prettenhofer
Olivier Grisel
2011-05-02 17:04:42 UTC
Permalink
Post by Peter Prettenhofer
Post by Lars Buitinck
Because I want to port an application that currently uses averaged
perceptrons from Java to Python. The application performs *very* well
with averaged perceptrons, beating other folks' CRF/SVM/MaxEnt
solutions (probably not so much due to the perceptrons, but I'm sure
not of that yet). I want to stay as close as possible to the Java
version's behavior before I start changing things such as the central
learning algorithm.
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
I am not opposed to structured output models in the scikit, but we
should make decisions based on concrete use cases and try to gather a
couple of different use cases from several domains, e.g. sentence
segmentation (a.k.a. shallow parsing or chunking) in NLP, audio
segmentation, hierarchical topic modeling... so as to try to find
reusable yet intuitive Python constructs to be exposed in the public
API.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Lars Buitinck
2011-05-02 18:10:10 UTC
Permalink
Post by Peter Prettenhofer
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
I've gotten a long way with just using local perceptron predictions
for doing sequence labeling, and my prime interest is getting my old
algorithm working again in Python so I can get on with what I was
doing, building a specialized NLP pipeline in Python. If you have
references that list pseudocode for structured perceptrons, I'd be
interested, but don't expect magic from me :)
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
xinfan meng
2011-05-03 00:31:17 UTC
Permalink
1. Michael Collins, “Discriminative training methods for hidden Markov
models: Theory and experiments with perceptron algorithms,” in *Proceedings
of the ACL-02 Conference on Empirical Methods in Natural Language Processing
- Volume 10*, EMNLP ’02 (Stroudsburg, PA, USA: Association for
Computational Linguistics, 2002), 1–8,
http://dx.doi.org/10.3115/1118693.1118694.
2. M. Collins, “Parameter estimation for statistical parsing models:
Theory and practice of distribution-free methods,” *New Developments in
Parsing Technology* (2004): 19–55.
3. Ryan McDonald, *Course on Generalized Linear Classifiers*, n.d.,
http://www.ryanmcd.com/courses/gslt2007.html.
Post by Lars Buitinck
Post by Peter Prettenhofer
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
I've gotten a long way with just using local perceptron predictions
for doing sequence labeling, and my prime interest is getting my old
algorithm working again in Python so I can get on with what I was
doing, building a specialized NLP pipeline in Python. If you have
references that list pseudocode for structured perceptrons, I'd be
interested, but don't expect magic from me :)
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Lars Buitinck
2011-05-03 08:24:28 UTC
Permalink
Post by xinfan meng
1. Michael Collins, “Discriminative training methods for hidden Markov
models: Theory and experiments with perceptron algorithms,” in Proceedings of
the ACL-02 Conference on Empirical Methods in Natural Language Processing -
Volume 10, EMNLP ’02 (Stroudsburg, PA, USA: Association for Computational
Linguistics, 2002), 1–8, http://dx.doi.org/10.3115/1118693.1118694.
2. M. Collins, “Parameter estimation for statistical parsing models: Theory
and practice of distribution-free methods,” New Developments in Parsing
Technology (2004): 19–55.
3. Ryan McDonald, Course on Generalized Linear Classifiers, n.d.,
http://www.ryanmcd.com/courses/gslt2007.html.
Great, will look into this!
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
xinfan meng
2011-05-03 01:48:22 UTC
Permalink
This seems related, though in C++.

http://code.google.com/p/oll/
Post by Lars Buitinck
Post by Peter Prettenhofer
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
I've gotten a long way with just using local perceptron predictions
for doing sequence labeling, and my prime interest is getting my old
algorithm working again in Python so I can get on with what I was
doing, building a specialized NLP pipeline in Python. If you have
references that list pseudocode for structured perceptrons, I'd be
interested, but don't expect magic from me :)
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Peter Prettenhofer
2011-05-03 06:27:24 UTC
Permalink
Post by Lars Buitinck
Post by Peter Prettenhofer
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
I've gotten a long way with just using local perceptron predictions
for doing sequence labeling, and my prime interest is getting my old
algorithm working again in Python so I can get on with what I was
doing, building a specialized NLP pipeline in Python. If you have
references that list pseudocode for structured perceptrons, I'd be
interested, but don't expect magic from me :)
Me too but I mostly do Named-Entity Recognition (there's not so much
information in the label sequence compared to POS tagging). I think
structured prediction methods such as CRFs or structured perceptrons
are too specialized for scikit-learn.

best,
Peter
Post by Lars Buitinck
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
--
Peter Prettenhofer
xinfan meng
2011-05-03 07:08:41 UTC
Permalink
Post by Peter Prettenhofer
Post by Lars Buitinck
Post by Peter Prettenhofer
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
I've gotten a long way with just using local perceptron predictions
for doing sequence labeling, and my prime interest is getting my old
algorithm working again in Python so I can get on with what I was
doing, building a specialized NLP pipeline in Python. If you have
references that list pseudocode for structured perceptrons, I'd be
interested, but don't expect magic from me :)
Me too but I mostly do Named-Entity Recognition (there's not so much
information in the label sequence compared to POS tagging). I think
structured prediction methods such as CRFs or structured perceptrons
are too specialized for scikit-learn.
But scikit-learn does have an HMM.
Post by Peter Prettenhofer
best,
Peter
Post by Lars Buitinck
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
--
Peter Prettenhofer
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Peter Prettenhofer
2011-05-03 07:16:03 UTC
Permalink
Post by xinfan meng
Post by Peter Prettenhofer
[..]
Me too but I mostly do Named-Entity Recognition (there's not so much
information in the label sequence compared to POS tagging). I think
structured prediction methods such as CRFs or structured perceptrons
are too specialized for scikit-learn.
But scikit-learn do have a HMM.
Ok - that's a point - but implementing CRFs is a laborious task and
I'm not aware of any Python implementations that we could integrate.
Furthermore, it has to be efficient because for most common chunking
tasks the training set is rather large - wrapping some existing C++
implementation would be an option. But still, I think there are other
things with higher priority (e.g. efficient random forests, boosting,
online learning).

best,
Peter
--
Peter Prettenhofer
Gael Varoquaux
2011-05-03 09:52:05 UTC
Permalink
Post by Peter Prettenhofer
Ok - that's a point - but implementing CRFs is a laborious task and
I'm not aware of any python implementations that we could integrate.
I agree. I just don't think that CRFs are a good candidate for
implementation in the scikit for the reasons you list.
Post by Peter Prettenhofer
Furthermore, it has to be efficient because for most common chunking
tasks the training set is rather large - wrapping some existing C++
implementation would be an option.
It would be an option. Is there a well-maintained CRF library with the
right license?
Post by Peter Prettenhofer
But still, I think there are other things with higher priority (e.g.
efficient random forests, boosting, online-learning).
Things get done when someone steps up with high-quality code to do them.
That said, I agree with you that I will not devote personal time to CRFs
in the scikit in the near future. In terms of structured learning, my
priorities would go to the group lasso first.

G
Olivier Grisel
2011-05-03 09:56:17 UTC
Permalink
Post by Gael Varoquaux
Post by Peter Prettenhofer
Ok - that's a point - but implementing CRFs is a laborious task and
I'm not aware of any python implementations that we could integrate.
I agree. I just don't think that CRFs are a good candidate for
implementation in the scikit for the reasons you list.
Post by Peter Prettenhofer
Furthermore, it has to be efficient because for most common chunking
tasks the training set is rather large - wrapping some existing C++
implementation would be an option.
It would be an option. Is there a well-maintained CRF library with the
right license?
Yes: http://www.chokkan.org/software/crfsuite/ is probably the fastest,
in pure C with the "Right License", and with optional sparse priors.
Post by Gael Varoquaux
Post by Peter Prettenhofer
But still, I think there are other things with higher priority (e.g.
efficient random forests, boosting, online-learning).
Things get done when someone steps up with high-quality code to do them.
That said, I agree with you that I will not devote personal time to CRFs
in the scikit in the near future. In terms of structured learning, my
priorities would go to group lasso before.
I won't either. But if someone would like to add support for CRF
fitting to the scikit, I would +1 wrapping crfsuite in Cython.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-05-03 10:01:37 UTC
Permalink
Post by Olivier Grisel
Post by Gael Varoquaux
Post by Peter Prettenhofer
Furthermore, it has to be efficient because for most common chunking
tasks the training set is rather large - wrapping some existing C++
implementation would be an option.
It would be an option. Is there a well-maintained CRF library with the
right license?
Yes: http://www.chokkan.org/software/crfsuite/ probably the fastest,
un pure C with the "Right License" with optional sparse priors.
Very nice. It's probably a major task to do a correct wrapping. In
addition, I think that it should not be started before the API and
data structures are a bit more clear.

Do you want to add a ticket with this idea (clearly stating that it's a
major task that will require a lengthy investment) so that we don't forget?

Gael
Olivier Grisel
2011-05-03 10:09:11 UTC
Permalink
Post by Gael Varoquaux
Post by Olivier Grisel
Post by Gael Varoquaux
Post by Peter Prettenhofer
Furthermore, it has to be efficient because for most common chunking
tasks the training set is rather large - wrapping some existing C++
implementation would be an option.
It would be an option. Is there a well-maintained CRF library with the
right license?
Yes: http://www.chokkan.org/software/crfsuite/ probably the fastest,
un pure C with the "Right License" with optional sparse priors.
Very nice. It's probably a major task to do a correct wrapping. In
addition, I think that it should not be started before the API and
data structures are a bit more clear.
Ok. Also the author is very responsive: when I first encountered this
lib I contacted him to let him know about Leon Bottou's SGD
experiments, which were quite recent at the time, and he implemented the
SGD variant + benchmarks right away.
Post by Gael Varoquaux
Do you want to add a ticket with this idea (clearly stating that it's a
major that will require a lengthy investment) so that we don't forget.
I'll do that.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Olivier Grisel
2011-05-03 10:14:38 UTC
Permalink
Post by Olivier Grisel
Post by Gael Varoquaux
Do you want to add a ticket with this idea (clearly stating that it's a
major that will require a lengthy investment) so that we don't forget.
I'll do that.
https://github.com/scikit-learn/scikit-learn/issues/145
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-05-03 11:20:33 UTC
Permalink
Post by Olivier Grisel
Post by Olivier Grisel
Post by Gael Varoquaux
Do you want to add a ticket with this idea (clearly stating that it's a
major that will require a lengthy investment) so that we don't forget.
I'll do that.
https://github.com/scikit-learn/scikit-learn/issues/145
Thanks heaps,

G
Gael Varoquaux
2011-05-03 09:48:42 UTC
Permalink
Post by Peter Prettenhofer
Since you are doing NLP (and compare with CRFs): do you eventually
want to contribute a structured perceptron (with arbitrary decoders,
e.g. for sequence labeling, etc)? If so, I'd be definitely interested
but we have to decide whether or not that's too specialized for
scikit-learn...
We have an interest in structured learning too. I guess the decision of
whether this should go in the scikit or not will depend on the code: if
it is complicated and hard to maintain, then it shouldn't go in
(even though I am personally interested). If the compromise is on the
other side, I think that it is perfectly eligible.

G
Mathieu Blondel
2011-05-02 16:46:59 UTC
Permalink
Post by Lars Buitinck
I'd say pass it to the constructor as a parameter, but my interest is
currently NLP labeling tasks where the set of labels is known
beforehand. I'm not sure if this is a good fit for other problem
settings.
It would be nice if objects could have both a fit and partial_fit
method so I'd rather not pass it to the constructor...

Mathieu
Alexandre Passos
2011-05-02 16:50:33 UTC
Permalink
Post by Mathieu Blondel
Post by Lars Buitinck
I'd say pass it to the constructor as a parameter, but my interest is
currently NLP labeling tasks where the set of labels is known
beforehand. I'm not sure if this is a good fit for other problem
settings.
It would be nice if objects could have both a fit and partial_fit
method so I'd rather not pass it to the constructor...
How about a partial_setup() that has to be called prior to
partial_fit()? It could then be responsible for dataset-specific
information (number of classes, for example, maybe number of features
as well).
--
 - Alexandre
Olivier Grisel
2011-05-02 16:58:02 UTC
Permalink
Post by Alexandre Passos
Post by Mathieu Blondel
Post by Lars Buitinck
I'd say pass it to the constructor as a parameter, but my interest is
currently NLP labeling tasks where the set of labels is known
beforehand. I'm not sure if this is a good fit for other problem
settings.
It would be nice if objects could have both a fit and partial_fit
method so I'd rather not pass it to the constructor...
How about a partial_setup() that has to be called prior to
partial_fit()? It could then be responsible for dataset-specific
information (number of classes, for example, maybe number of features
as well).
I think it's ok to pass an optional argument "n_labels" or
"possible_labels" or "label_ids" to partial_fit itself. No need for an
additional method. It will likely be used by the model only during the
first call to partial_fit.
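Something along these lines, for instance (the estimator and argument names
are made up; the point is only the lazy allocation on the first mini-batch):

    import numpy as np

    class OnlinePerceptron:
        def partial_fit(self, X, y, classes=None):
            X = np.asarray(X, dtype=float)
            if not hasattr(self, "coef_"):
                if classes is None:
                    raise ValueError("classes is required on the first call")
                self.classes_ = np.asarray(classes)
                self.coef_ = np.zeros((len(self.classes_), X.shape[1]))
            # ... one-vs-all / multiclass perceptron updates on this mini-batch,
            # e.g. as in the multiclass sketch earlier in the thread ...
            return self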
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-05-02 17:01:50 UTC
Permalink
Post by Alexandre Passos
How about a partial_setup() that has to be called prior to
partial_fit()? It could then be responsible for dataset-specific
information (number of classes, for example, maybe number of features
as well).
That sounds like a good alternative. And indeed, unless you use a
hashing vectorizer, you need to know the number of features to allocate
your weight vectors.

Mathieu
Olivier Grisel
2011-05-02 17:07:24 UTC
Permalink
Post by Mathieu Blondel
Post by Alexandre Passos
How about a partial_setup() that has to be called prior to
partial_fit()? It could then be responsible for dataset-specific
information (number of classes, for example, maybe number of features
as well).
That sounds like a good alternative. And indeed, unless you use a
hashing vector, you need to know the number of features to allocate
your weight vectors.
If we use scipy sparse matrices, we even need the number of features beforehand.

That said we should definitely work on a generic feature hashing tool :)
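A toy sketch of what such a hashing tool could do (the helper name and the
use of Python's built-in hash are purely illustrative):

    import scipy.sparse as sp

    def hash_vectorize(docs, n_features=2 ** 18):
        # Map token counts into a fixed-width sparse matrix, so the number of
        # features is known in advance without building a vocabulary.
        rows, cols, vals = [], [], []
        for i, doc in enumerate(docs):
            for token in doc.split():
                rows.append(i)
                cols.append(hash(token) % n_features)   # hashed feature index
                vals.append(1.0)
        return sp.coo_matrix((vals, (rows, cols)),
                             shape=(len(docs), n_features)).tocsr()

    X = hash_vectorize(["the quick brown fox", "the lazy dog"])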
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-05-02 17:20:51 UTC
Permalink
Post by Olivier Grisel
If we use scipy sparse matrices we need the number of feature even before hand.
Yes, this makes constructing matrices incrementally harder (you need
to make two passes over your dataset).

Mathieu
Olivier Grisel
2011-05-02 17:39:46 UTC
Permalink
Post by Mathieu Blondel
Post by Olivier Grisel
If we use scipy sparse matrices we need the number of feature even before hand.
Yes, this makes constructing matrices incrementally harder (you need
to make two passes over your dataset).
I think this is reasonable in practice: we need a pure Python tool
that scans the input data from disk formatted as svmlight / Vowpal
Wabbit format / plain text files in folders / JSON stream dump /
whatever in a first pass, while incrementally building a feature
vocabulary (a Python dict) + 3 Python lists matching the coo_matrix
representation, and then we use the joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmapping magic.

Then we can use that large-scale scipy sparse input for scikit-learn models.

This scheme does not allow for online learning but should address most
practical large-scale problems that fit on a single machine.
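As a rough sketch of that first pass (the JSON-lines input format and all
file names below are assumptions made up for this example):

    import json
    import scipy.sparse as sp

    def build_csr_from_json_stream(path):
        # Grow a vocabulary dict plus the three COO lists, convert to CSR once.
        # Assumes each line of `path` is a JSON object mapping feature names
        # to values (a made-up format for this sketch).
        vocabulary = {}
        rows, cols, vals = [], [], []
        n_rows = 0
        with open(path) as f:
            for i, line in enumerate(f):
                n_rows = i + 1
                for token, value in json.loads(line).items():
                    j = vocabulary.setdefault(token, len(vocabulary))
                    rows.append(i)
                    cols.append(j)
                    vals.append(float(value))
        X = sp.coo_matrix((vals, (rows, cols)),
                          shape=(n_rows, len(vocabulary))).tocsr()
        return X, vocabulary

    # Cache the CSR matrix with joblib; reloading with mmap_mode reads lazily.
    # from joblib import dump, load
    # X, vocab = build_csr_from_json_stream("train.jsonl")
    # dump(X, "train_csr.joblib")
    # X = load("train_csr.joblib", mmap_mode="r")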
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-05-02 18:14:00 UTC
Permalink
Post by Olivier Grisel
I think this is reasonable in practice: we need a pure python tools
that scans the input data from disk formatted as svmlight / vowpal
wabbit format / pure text files in folders / JSON stream dump /
whatever in the first pass, while incrementally building a feature
vocabulary (python dict) + 3 python lists matching the coo_matrix
representation and then we use joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmaping magic.
Have you ever tried memory-mapped loading on a CSR matrix? I
wonder if it's possible to retrieve random rows from the matrix quickly
(to create a mini-batch).

Mathieu
Olivier Grisel
2011-05-02 20:14:19 UTC
Permalink
Post by Mathieu Blondel
Post by Olivier Grisel
I think this is reasonable in practice: we need a pure python tools
that scans the input data from disk formatted as svmlight / vowpal
wabbit format / pure text files in folders / JSON stream dump /
whatever in the first pass, while incrementally building a feature
vocabulary (python dict) + 3 python lists matching the coo_matrix
representation and then we use joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmaping magic.
Have you ever tried the memory mapped file function on a CSR matrix? I
wonder if it's possible to retrieve random rows from the matrix fast
(to create a mini batch).
I think that should work with CSR as the memory is contiguous and the
offsets predictable: one fetch of a mini-batch of contiguous rows is 3
seeks: one in the indptr array to fetch the offsets, and 2 in the values
and indices arrays. If you fetch batches of rows sequentially in a
predictable way, the OS might even do some prefetching of the
memory automatically.
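To make that concrete, a small sketch of fetching a contiguous block of rows
from the three CSR arrays (which could just as well be numpy memmaps on disk);
everything here is illustrative:

    import scipy.sparse as sp

    def csr_row_block(data, indices, indptr, n_cols, start, stop):
        # Rows [start, stop): one slice of indptr for the offsets, then the
        # matching slices of the data and indices arrays.
        lo, hi = indptr[start], indptr[stop]
        block_indptr = indptr[start:stop + 1] - lo      # re-base offsets at zero
        return sp.csr_matrix((data[lo:hi], indices[lo:hi], block_indptr),
                             shape=(stop - start, n_cols))

    X = sp.random(1000, 50, density=0.05, format="csr", random_state=0)
    batch = csr_row_block(X.data, X.indices, X.indptr, X.shape[1], 200, 300)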
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-05-02 20:49:57 UTC
Permalink
Post by Olivier Grisel
I think that should work with CSR as the memory is contiguous and the
offsets predictable: one fetch of a mini-batch of contiguous rows is 3
seeks one in the indptr array to fetch the offsets and 2 inthe values
and indices arrays. If you fetches batches of rows sequentially in a
predictable way the OS might even do some prefetching of the the
memory automatically.
I had the same reasoning; I hope it holds up in practice.

It may be possible to create the sampled matrix subsets by keeping
the data and indices fields the same and changing only indptr. This
way, data copying should be reduced.

Mathieu
Gael Varoquaux
2011-05-03 09:43:38 UTC
Permalink
Post by Olivier Grisel
I think that should work with CSR as the memory is contiguous and the
offsets predictable: one fetch of a mini-batch of contiguous rows is 3
seeks one in the indptr array to fetch the offsets and 2 inthe values
and indices arrays. If you fetches batches of rows sequentially in a
predictable way the OS might even do some prefetching of the the
memory automatically.
+1

G
xinfan meng
2011-05-03 00:33:04 UTC
Permalink
Post by Olivier Grisel
Post by Mathieu Blondel
Post by Olivier Grisel
If we use scipy sparse matrices we need the number of feature even
before hand.
Post by Mathieu Blondel
Yes, this makes constructing matrices incrementally harder (you need
to make two passes over your dataset).
I think this is reasonable in practice: we need a pure python tools
that scans the input data from disk formatted as svmlight / vowpal
wabbit format / pure text files in folders / JSON stream dump /
whatever in the first pass, while incrementally building a feature
vocabulary (python dict) + 3 python lists matching the coo_matrix
representation and then we use joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmaping magic.
I am interested in this procedure. Is there an example in scikits.learn?
Thanks.
Post by Olivier Grisel
Then we can use that large scale scipy sparse input for scikit-learn models.
This scheme does not allow for online learning but should address most
practical largescale problems that fit on a single machine.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Gael Varoquaux
2011-05-03 09:46:32 UTC
Permalink
Post by Olivier Grisel
I think this is reasonable in practice: we need a pure python tools
that scans the input data from disk formatted as svmlight / vowpal
wabbit format / pure text files in folders / JSON stream dump /
whatever in the first pass, while incrementally building a feature
vocabulary (python dict) + 3 python lists matching the coo_matrix
representation and then we use joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmaping magic.
I am interested in this procedure. Is there an example in scikits.learn?
Thanks.
Which one? Joblib is documented on
http://packages.python.org/joblib/
in particular, Olivier was mentioning the use of Memory.cache, with
mmap_mode='r' in the Memory object:
http://packages.python.org/joblib/memory.html
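For instance, roughly (the cache directory and the loader function are made
up; the point is only the mmap_mode usage):

    import numpy as np
    from joblib import Memory

    memory = Memory("./joblib_cache", mmap_mode="r")

    @memory.cache
    def load_design_matrix(path):
        # stand-in for an expensive parsing / vectorization step
        return np.loadtxt(path)

    # The first call computes and caches the result; later calls memory-map
    # the cached arrays instead of recomputing.
    # X = load_design_matrix("features.txt")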

If you are asking about the more general approach he describes, it is not
implemented yet.

G
Mathieu Blondel
2011-05-03 21:07:20 UTC
Permalink
Post by Olivier Grisel
I think this is reasonable in practice: we need a pure python tools
that scans the input data from disk formatted as svmlight / vowpal
wabbit format / pure text files in folders / JSON stream dump /
whatever in the first pass, while incrementally building a feature
vocabulary (python dict) + 3 python lists matching the coo_matrix
representation and then we use joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmaping magic.
Giving it a second thought, it seems to me that the above assumes that
the data fits into memory. Moreover, COO to CSR conversion is not a
good idea if your dataset is large. If possible, it would be better to
incrementally build "data", "indices" and "indptr" and write to disk
from time to time. (Even though the matrix shape needs to be provided in
advance, in the CSR case, I suspect it's not even used, or only used
for consistency checks.)

Mathieu
Olivier Grisel
2011-05-03 21:20:13 UTC
Permalink
Post by Mathieu Blondel
Post by Olivier Grisel
I think this is reasonable in practice: we need a pure python tools
that scans the input data from disk formatted as svmlight / vowpal
wabbit format / pure text files in folders / JSON stream dump /
whatever in the first pass, while incrementally building a feature
vocabulary (python dict) + 3 python lists matching the coo_matrix
representation and then we use joblib serializer to save it (after
conversion to CSR) and cache it on disk. Using the joblib serializer
should allow fast reloading thanks to the memmaping magic.
Giving it a second thought, it seems to me that the above assumes that
the data fits into memory. Moreover, COO to CSR conversion is not a
good idea if your dataset is large. If possible, it would be better to
incrementally build "data", "indices" and "indptr" and write to disk
from time to time. (Even though the matrix shape needs to provided in
advance, in the CSR case, I suspect it's not even used or just used
for consistency checks)
Actually, this is what I had in mind.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-05-03 09:42:14 UTC
Permalink
Post by Alexandre Passos
Post by Mathieu Blondel
It would be nice if objects could have both a fit and partial_fit
method so I'd rather not pass it to the constructor...
How about a partial_setup() that has to be called prior to
partial_fit()? It could then be responsible for dataset-specific
information (number of classes, for example, maybe number of features
as well).
I'd rather have 'setup', or 'setup_fit'.

One thing that might be useful is to have a look at the API in MDP: they
have support for such operations.

G
Olivier Grisel
2011-05-02 16:54:22 UTC
Permalink
Post by Mathieu Blondel
Post by Lars Buitinck
I'd say pass it to the constructor as a parameter, but my interest is
currently NLP labeling tasks where the set of labels is known
beforehand. I'm not sure if this is a good fit for other problem
settings.
It would be nice if objects could have both a fit and partial_fit
method so I'd rather not pass it to the constructor...
Sure. partial_fit is optional, to be added as a complement only for
models that support incremental learning. This is the case for the
MiniBatchKMeans model in:

https://github.com/scikit-learn/scikit-learn/pull/132

The fit method retains the semantics of "fit until convergence,
assuming I give you the complete dataset", while the
semantics of "partial_fit" is more: please update your internal state
to take that chunk of data into account while expecting more in the
future.
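To illustrate the two semantics with the MiniBatchKMeans example (parameter
names follow the current scikit-learn API, which may differ slightly from the
code in that pull request):

    import numpy as np
    from sklearn.cluster import MiniBatchKMeans

    rng = np.random.RandomState(0)
    X = rng.rand(10000, 20)

    # fit: run on the full dataset until the stopping criterion is met.
    km_batch = MiniBatchKMeans(n_clusters=8, random_state=0).fit(X)

    # partial_fit: update the model chunk by chunk, expecting more data later.
    km_online = MiniBatchKMeans(n_clusters=8, random_state=0)
    for chunk in np.array_split(X, 10):
        km_online.partial_fit(chunk)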
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-05-03 06:12:20 UTC
Permalink
Post by Lars Buitinck
Post by Mathieu Blondel
In the multi-class case, there's the problem that y is not guaranteed
to contain all possible labels, so you can't just do np.unique(y) to
retrieve them. So we need to decide a way to pass it to the object
(argument to partial_fit? argument to the constructor?). Passing
data-dependent to the constructor seems wrong to me and making
partial_fit algorithm dependent is not so good either...
I'd say pass it to the constructor as a parameter, but my interest is
currently NLP labeling tasks where the set of labels is known
beforehand. I'm not sure if this is a good fit for other problem
settings.
I'd rather have it passed to partial_fit. One should be able to
instantiate an estimator without knowing anything about the data at hand.
Post by Lars Buitinck
Post by Mathieu Blondel
Another potential candidate method would be fit_dataset(dataset),
where dataset is an object that knows how to act like an iterator but
which has additional methods like reset() and unique_y(). Such a
dataset object could also know how to sample mini blocks of data. That
would be compatible with partial_fit.
I don't like this idea very much. It reminds me of the overengineered
ML libraries that I've fled before coming to scikit-learn.
+1
Post by Lars Buitinck
Post by Mathieu Blondel
For algorithms which have a learning rate (usually written as eta),
the learning rate needs to be decreased after each iteration and so
eta needs to be stored in the object. We could add a reset() method to
enable objects to reset it.
Probably. I'd like to have a few online learners to play around with the
APIs before we freeze them. I'd say that the floor is open to any
suggestion, but the general idea would be to keep the number of additional
methods to a minimum.

G
Mathieu Blondel
2011-05-03 09:31:25 UTC
Permalink
Post by Gael Varoquaux
Probably. I'd like to have a few online learner to play around with the
APIs before we freeze it. I'd say that the floor is open to any
suggestion, but the general idea would be to keep the number additional
methods to a minimum.
partial_setup could actually be used to reset the initial eta value,
the iteration counter etc... Having a partial_setup method can leave
the door open to pass other kinds of metadata while keeping
partial_fit algorithm-independent.

Another possibility is to take the convention that the only way to
reset the eta value is to create a new object.

As you say, let's play around with the API on concrete examples.

Mathieu
Olivier Grisel
2011-05-03 09:43:00 UTC
Permalink
Post by Mathieu Blondel
Post by Gael Varoquaux
Probably. I'd like to have a few online learner to play around with the
APIs before we freeze it. I'd say that the floor is open to any
suggestion, but the general idea would be to keep the number additional
methods to a minimum.
partial_setup could actually be used to reset the initial eta value,
the iteration counter etc... Having a partial_setup method can leave
the door open to pass other kinds of metadata while keeping
partial_fit algorithm-independent.
Another possibility is to take the convention that the only way to
reset the eta value is to create a new object.
As you say, let's play around with the API on concrete examples.
I am OK with partial_setup then, but let's call it partial_fit_setup to
make it more explicit.

Learning rate initialization could be data-dependent, though, and hence
could be dealt with by the first call to partial_fit (provided that the
mini-batch size is big enough).

We really need to work on examples :)
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Lars Buitinck
2011-05-04 12:54:19 UTC
Permalink
Post by Lars Buitinck
For anyone who's interested in this: I pushed a first version to my
Github repo. It should be considered experimental.
I've made quite some progress since Monday. Although I'm not there
yet, I'd really like someone to do a quick review of my code at
https://github.com/larsmans/scikit-learn/tree/perceptron (please post
comments on GitHub rather than mail them).

Not there yet means:
* No partial_fit yet
* More refactoring to be done
* Sparse avg'd perceptron is still the old defaultdict-based code
* No multiclass perceptron (just 1-vs-all)
* Learning rate doesn't decrease over time

As for the refactoring part: I'm getting ever closer to believing
mblondel was right when he said all three perceptron classes might
better be merged into one with extra parameters. But I'd like to
postpone this until I get the sparse class right with scipy and/or
Cython.
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Mathieu Blondel
2011-05-06 07:27:12 UTC
Permalink
Post by Lars Buitinck
As for the refactoring part: I'm getting ever closer to believing
mblondel was right when he said all three perceptron classes might
better be merged into one with extra parameters. But I'd like to
postpone this until I get the sparse class right with scipy and/or
Cython.
For those interested, the relevant discussion is here:

https://github.com/larsmans/scikit-learn/commit/d390f1df4d438b290d77fab6bf2d49d371318bcc#comments

It would be really nice if you could factor some sparse Cython
utilities which could be included in the same fashion as a C header
file. I'm thinking of:

sparse_dense_dot(X, row_id, w) # dot(X[row_id], w)
sparse_dense_add(X, row_id, w, scale) # w += X[row_id] * scale
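In plain Python, as a sketch of the intended semantics (the Cython versions
would work on the typed CSR arrays directly), these could look like:

    import numpy as np

    def sparse_dense_dot(X, row_id, w):
        # dot(X[row_id], w) for a CSR matrix X and a dense 1-d array w
        lo, hi = X.indptr[row_id], X.indptr[row_id + 1]
        return np.dot(X.data[lo:hi], w[X.indices[lo:hi]])

    def sparse_dense_add(X, row_id, w, scale):
        # w += X[row_id] * scale, updating the dense vector w in place
        lo, hi = X.indptr[row_id], X.indptr[row_id + 1]
        w[X.indices[lo:hi]] += scale * X.data[lo:hi]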

The SGD implementation in scikit-learn uses the computational trick
described in the Pegasos paper, namely storing the weight vector's scale
and norm separately. I wonder if there's a good way to factor the code
and still use this trick.

Mathieu
Olivier Grisel
2011-05-06 09:17:59 UTC
Permalink
Post by Mathieu Blondel
Post by Lars Buitinck
As for the refactoring part: I'm getting ever closer to believing
mblondel was right when he said all three perceptron classes might
better be merged into one with extra parameters. But I'd like to
postpone this until I get the sparse class right with scipy and/or
Cython.
https://github.com/larsmans/scikit-learn/commit/d390f1df4d438b290d77fab6bf2d49d371318bcc#comments
It would be really nice if you could factor some sparse Cython
utilities which could be included in the same fashion as a C header
sparse_dense_dot(X, row_id, w) # dot(X[row_id], w)
sparse_dense_add(X, row_id, w, scale) # w += X[row_id] * scale
Indeed, that would be complementary to the refactoring of the
normalizers / cosine similarity we talked about earlier on the power
iteration clustering pull request.
Post by Mathieu Blondel
The SGD implementation in scikit-learn uses the computational trick
described in the Pegasos paper, namely storing sparse vectors' scale
and norm separately. I wonder if there's a good way to factor the code
and still use this trick.
Worth investigating, but when factoring code we should take care not
to make the existing code less readable.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Peter Prettenhofer
2011-05-06 09:26:13 UTC
Permalink
Post by Olivier Grisel
Post by Mathieu Blondel
[..]
It would be really nice if you could factor some sparse Cython
utilities which could be included in the same fashion as a C header
sparse_dense_dot(X, row_id, w) # dot(X[row_id], w)
sparse_dense_add(X, row_id, w, scale) # w += X[row_id] * scale
Indeed that would be complementary for the refactoring of the
normalizers / cosine similarity we talked about earlier on the power
iteration clustering pull request.
Indeed - that would be nice; however, Cython is a bit balky when it
comes to imports from Cython extension modules in a different package.
I tried this for SGD in order to share code between the sparse and
dense implementations. I ended up placing both Cython extension
modules in the same package (linear_model/sgd_fast.pyx and
linear_model/sgd_fast_sparse.pyx instead of
linear_model/sparse/sgd_fast.pyx).

best,
Peter
--
Peter Prettenhofer
Mathieu Blondel
2011-05-06 17:20:58 UTC
Permalink
Post by Peter Prettenhofer
Indeed - that would be nice, however, cython is a bit balky when it
comes to imports from cython extension modules in a different package.
I tried this for SGD in order to share code between the sparse and
dense implementations. I ended up placing both cython extension
modules in the same package (linear_model/sgd_fast.pyx and
linear_model/sgd_fast_sparse.pyx instead of
linear_model/sparse/sgd_fast.pyx).
It would be nice if a Cython expert could comment on how to do this. If
it proves to be difficult in Cython, we can always implement it in C
and import the corresponding C header file in the Cython module.

Mathieu
Olivier Grisel
2011-05-06 17:51:42 UTC
Permalink
Post by Mathieu Blondel
Post by Peter Prettenhofer
Indeed - that would be nice, however, cython is a bit balky when it
comes to imports from cython extension modules in a different package.
I tried this for SGD in order to share code between the sparse and
dense implementations. I ended up placing both cython extension
modules in the same package (linear_model/sgd_fast.pyx and
linear_model/sgd_fast_sparse.pyx instead of
linear_model/sparse/sgd_fast.pyx).
Would be nice if a Cython expert could comment on how to do this. If
it proves to be difficult in Cython, we can always implement it in C
and and import the corresponding C header file in the Cython program.
I think nobody here knows how to do this. Maybe someone could ask on
the Cython mailing list.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel