Discussion:
Renaming the feature "extraction" module
Gilles Louppe
2011-08-09 07:52:35 UTC
Dear all,

In pull request #299, I made the following comment. Olivier suggested
re-sending it to the mailing list.

"Just a quick comment on the terminology. According to Guyon [1],
"feature extraction" actually covers both "feature selection" and
"feature construction", and in that sense "feature extraction" may be
both supervised and unsupervised. Within scikit-learn, the
feature_extraction module seems to cover feature construction only
though. Hence the following suggestion: shouldn't we take the
opportunity to rename this module? What do you think?

[1] Isabelle Guyon, "Feature Extraction, Foundations and
Applications", http://clopinet.com/fextract-book/."

Gilles
Olivier Grisel
2011-08-09 08:21:26 UTC
Post by Gilles Louppe
Dear all,
In pull request #299, I made the following comment. Olivier suggested
re-sending it to the mailing list.
"Just a quick comment on the terminology. According to Guyon [1],
"feature extraction" actually covers both "feature selection" and
"feature construction", and in that sense "feature extraction" may be
both supervised and unsupervised. Within scikit-learn, the
feature_extraction module seems to cover feature construction only
though. Hence the following suggestion: shouldn't we take the
opportunity to rename this module? What do you think?
[1] Isabelle Guyon, "Feature Extraction, Foundations and
Applications", http://clopinet.com/fextract-book/."
I am OK with renaming the feature_extraction package to
feature_construction. The Wikipedia article mixes construction,
selection and dimensionality reduction under the
"feature extraction" umbrella:

http://en.wikipedia.org/wiki/Feature_extraction
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-08-09 09:27:55 UTC
Post by Olivier Grisel
I am OK with renaming the feature_extraction package to
feature_construction. The Wikipedia article mixes construction,
selection and dimensionality reduction under the
I'm rather -0 on this one: I'm not really opposed to it but everyone I
know uses "feature extraction" to refer to the process of converting
raw objects to a vector representation so I don't really feel the urge
to do it either.

Mathieu
Lars Buitinck
2011-08-09 09:30:37 UTC
Post by Mathieu Blondel
Post by Olivier Grisel
I am OK with renaming the feature_extraction package to
feature_construction. The Wikipedia article mixes construction,
selection and dimensionality reduction under the
I'm rather -0 on this one: I'm not really opposed to it but everyone I
know uses "feature extraction" to refer to the process of converting
raw objects to a vector representation so I don't really feel the urge
to do it either.
+1
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Vincent Michel
2011-08-09 09:42:50 UTC
Post by Lars Buitinck
Post by Mathieu Blondel
Post by Olivier Grisel
I am OK with renaming the feature_extraction package to
feature_construction. The Wikipedia article mixes construction,
selection and dimensionality reduction under the
I'm rather -0 on this one: I'm not really opposed to it but everyone I
know uses "feature extraction" to refer to the process of converting
raw objects to a vector representation so I don't really feel the urge
to do it either.
+1
+1 to Mathieu's comment too.
Post by Lars Buitinck
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
------------------------------------------------------------------------------
uberSVN's rich system and user administration capabilities and model
configuration take the hassle out of deploying and managing Subversion and
the tools developers use with it. Learn more about uberSVN and get a free
download at: http://p.sf.net/sfu/wandisco-dev2dev
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Gael Varoquaux
2011-08-10 11:03:51 UTC
Post by Vincent Michel
Post by Mathieu Blondel
I'm rather -0 on this one: I'm not really opposed to it but everyone I
know uses "feature extraction" to refer to the process of converting
raw objects to a vector representation so I don't really feel the urge
to do it either.
+1
+1 to Mathieu's comment too.
I am not terribly happy with the name 'feature_extraction' (it's very
long); however, I don't find 'feature_construction' any more descriptive
or shorter. I actually prefer 'extraction'.

Gael
Alexandre Gramfort
2011-08-10 13:15:19 UTC
@Gael

do you suggest:

s/feature_selection/selection
s/feature_extraction/extraction

?

Alex

On Wed, Aug 10, 2011 at 7:03 AM, Gael Varoquaux
Post by Gael Varoquaux
Post by Vincent Michel
Post by Mathieu Blondel
I'm rather -0 on this one: I'm not really opposed to it but everyone I
know uses "feature extraction" to refer to the process of converting
raw objects to a vector representation so I don't really feel the urge
to do it either.
+1
+1 to Mathieu's comment too.
I am not terribly happy with the name 'feature_extraction' (it's very
long); however, I don't find 'feature_construction' any more descriptive
or shorter. I actually prefer 'extraction'.
Gael
Gael Varoquaux
2011-08-10 13:45:33 UTC
Post by Alexandre Gramfort
s/feature_selection/selection
s/feature_extraction/extraction
To be consistent with 'cluster', we would need 'extract', but that seems
a bit non-explicit to me. Same thing for your suggestions, but I can be
convinced otherwise.

Actually, I guess my favorite choice would be 'feature_extract', and
'feature_select', but I wonder if there is an actual gain in changing
this, or if I am just nitpicking.

Opinions?

G
Alexandre Gramfort
2011-08-10 13:51:12 UTC
+0.5 for nitpicking :)

Alex

On Wed, Aug 10, 2011 at 9:45 AM, Gael Varoquaux
Post by Gael Varoquaux
Post by Alexandre Gramfort
s/feature_selection/selection
s/feature_extraction/extraction
To be consistent with 'cluster', we would need 'extract', but that seems
a bit non-explicit to me. Same thing for your suggestions, but I can be
convinced otherwise.
Actually, I guess my favorite choice would be 'feature_extract', and
'feature_select', but I wonder if there is an actual gain in changing
this, or if I am just nitpicking.
Opinions?
G
Olivier Grisel
2011-08-11 09:36:21 UTC
Post by Gael Varoquaux
Post by Alexandre Gramfort
s/feature_selection/selection
s/feature_extraction/extraction
To be consistent with 'cluster', we would need 'extract', but that seems
a bit non-explicit to me. Same thing for your suggestions, but I can be
convinced otherwise.
Actually, I guess my favorite choice would be 'feature_extract', and
'feature_select', but I wonder if there is an actual gain in changing
this, or if I am just nitpicking.
Opinions?
Why not feature/extraction and feature/selection (using a common
placeholder 'feature' package without any code in it)?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-08-11 09:53:00 UTC
Post by Olivier Grisel
Why not feature/extraction and feature/selection (using a common
placeholder 'feature' package without any code in it)?
I actually would have liked to be able to tell people that anything
they needed to do their work could be found at most one level deeper than
the base scikit import. It's already not working :).

On the one hand, I think that you are right that the fact that we have two
sub-packages starting with 'feature' is a code smell. On the other hand,
I think that they do something fairly different, and I personally would
be surprised to find them under the same namespace.

G
Olivier Grisel
2011-08-11 10:02:37 UTC
Post by Gael Varoquaux
Post by Olivier Grisel
Why not feature/extraction and feature/selection (using a common
placeholder 'feature' package without any code in it)?
I actually would have liked to be able to tell people that anything
they needed to do their work could be found at most one level deeper than
the base scikit import. It's already not working :).
On the one hand, I think that you are right that the fact that we have two
sub-packages starting with 'feature' is a code smell. On the other hand,
I think that they do something fairly different, and I personally would
be surprised to find them under the same namespace.
I agree. But when you do your own feature extraction, it's often useful
to combine it with a feature selection transformer to get rid of the
pure-noise features. So it's not completely unreasonable to nest
them in a common top-level package.

At least this is the case for the text features, maybe less so for
image features.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Lars Buitinck
2011-08-11 09:43:40 UTC
Post by Olivier Grisel
Why not feature/extraction and feature/selection (using a common
placeholder 'feature' package without any code in it)?
Because "flat is better than nested". I don't like very deeply nested
directory hierarchies, they make it harder to find the files I want to
hack on and require boilerplate __init__.py files that may have to be
maintained to export the right stuff.
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Olivier Grisel
2011-08-11 10:38:27 UTC
Post by Lars Buitinck
Post by Olivier Grisel
Why not feature/extraction and feature/selection (using a common
placeholder 'feature' package without any code in it)?
Because "flat is better than nested". I don't like very deeply nested
directory hierarchies, they make it harder to find the files I want to
hack on and require boilerplate __init__.py files that may have to be
maintained to export the right stuff.
I agree too. Maybe we can just keep the current names then :)

I am also +0 on shortening them to their verb form, as suggested by Gael.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-08-11 10:46:28 UTC
On Thu, Aug 11, 2011 at 7:38 PM, Olivier Grisel
Post by Olivier Grisel
I am also +0 on shortening them to their verb form, as suggested by Gael.
I'm not for using a verb. "cluster" can be seen as a noun, and all the
other modules are nouns.

Mathieu
