Discussion:
CIFAR-10 and test images
Vlad Niculae
2011-07-10 19:43:34 UTC
Permalink
Hello

There are two things I would like to address in the dictionary
learning/image work. The first is to make sure, and show with examples,
that the code works on colour images; the second is image
classification.

For the first, I'd like to have a colour version of lena (sp.lena() is
grayscale), and it would be nice to have the other standard test
images (barbara, boat) easily available for users.
BTW, I have the impression that PIL has these somewhere, so using it
could be an option (but it would make some examples PIL-dependent only
for this).

For the second, the CIFAR-10 dataset should be fun, and it could be
treated in the same vein as the LFW dataset.

What are your thoughts on this?

Best,
Vlad
Alexandre Gramfort
2011-07-10 19:59:29 UTC
Permalink
Hi Vlad,

What are the sizes of the datasets?

Examples with large datasets and long computation times
should go in examples/applications.

Regarding the dependency on PIL, I am not a huge fan; I think
we can find a better way.

Alex
Vlad Niculae
2011-07-10 20:12:45 UTC
Permalink
Hi Alex

CIFAR-10 has a Pickled version weighing in at 163 MB

Some test BMPs:
lena: 512x512 24bpp: 768 KB
peppers: 512x512 24bpp: 768 KB
barbara: 720x576 24bpp: 1.18 MB
boats: 720x576 8bpp (grayscale): 406 KB

There are a couple of other nice test images at [1]. We could include
none, one, all, or download on demand.

Personally I'd like Barbara for denoising [2]

[1] http://www.hlevkin.com/TestImages/classic.htm
[2] http://www.hlevkin.com/TestImages/barbara.bmp
Alexandre Gramfort
2011-07-10 20:20:18 UTC
Permalink
Hi Vlad,
Post by Vlad Niculae
CIFAR-10 has a Pickled version weighing in at 163 MB
+1 for putting it in applications.
Post by Vlad Niculae
lena: 512x512 24bpp: 768 KB
peppers: 512x512 24bpp: 768 KB
barbara: 720x576 24bpp: 1.18 MB
boats: 720x576 8bpp (grayscale): 406 KB
There are a couple of other nice test images at [1]. We could include
none, one, all, or download on demand.
Personally I'd like Barbara for denoising [2]
[1] http://www.hlevkin.com/TestImages/classic.htm
[2] http://www.hlevkin.com/TestImages/barbara.bmp
Would it be possible to reuse the mldata download work for this?
How big is a compressed version stored with numpy?
The other datasets in the scikit are less than 80 KB.

Alex
Vlad Niculae
2011-07-10 20:28:52 UTC
Permalink
On Sun, Jul 10, 2011 at 11:20 PM, Alexandre Gramfort
Post by Alexandre Gramfort
Would it be possible to reuse the mldata download work for this?
How big is a compressed version stored with numpy?
The other datasets in the scikit are less than 80 KB.
Scipy contains a file lena.dat that takes 527.628 kb [1]
I'm +1 for downloading test images, but I don't know if they're on
mldata. I will check.

[1] https://github.com/scipy/scipy/blob/master/scipy/misc/lena.dat
Alexandre Gramfort
2011-07-12 12:44:37 UTC
Permalink
Post by Vlad Niculae
Scipy contains a file lena.dat that takes 527.628 kb [1]
I'm +1 for downloading test images, but I don't know if they're on
mldata. I will check.
I feel it's too big and that files should be downloaded.
Maybe you can create an entry on mldata with the images?
Or use a download link from github.

Alex
Gael Varoquaux
2011-07-15 00:00:23 UTC
Permalink
-1 on a PIL dependency (PIL is losing momentum in some circles). +1 on
some downloading.

Just to check: mldata requires a login/passwd, doesn't it? If so, I am -1
on mldata.

G
Olivier Grisel
2011-07-15 07:45:58 UTC
Permalink
Post by Gael Varoquaux
-1 on a PIL dependency (PIL is losing momentum in some circles). +1 on
some downloading.
Just to check: mldata requires a login/passwd, doesn't it? If so, I am -1
on mldata.
No it does not.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Olivier Grisel
2011-07-15 09:51:13 UTC
Permalink
Post by Gael Varoquaux
-1 on a PIL dependency (PIL is losing momentum in some circles). +1 on
some downloading.
About the PIL issue, we can use the same approach as in the LFW loader,
which uses PIL indirectly through the scipy public API: that way we only
have an optional, transitive dependency on PIL that can change the day
the scipy maintainers want to use something other than PIL that
provides efficient JPEG decoding.

https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/datasets/lfw.py#L32

Also, we might want to move the backward-compat code for the optional
scipy.misc / scipy.misc.pilutil import into the `utils` of scikit-learn
rather than duplicate it.
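
For reference, a sketch of the guarded-import pattern being described (a
hypothetical load_image helper, not the actual LFW loader code, which is
at the URL above):

    # Go through scipy's public API so that PIL is only an optional,
    # transitive dependency of the image-loading code.
    try:
        try:
            from scipy.misc import imread  # available when PIL is installed
        except ImportError:
            from scipy.misc.pilutil import imread
    except ImportError:
        imread = None

    def load_image(path):
        """Decode an image file to a numpy array, or fail with a clear message."""
        if imread is None:
            raise ImportError("Reading this image format requires PIL "
                              "(through scipy.misc.imread); please install it.")
        return imread(path)
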
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Vlad Niculae
2011-07-15 18:57:22 UTC
Permalink
CIFAR-10 is already in Python pickle format. The PIL issue was raised
because I think that PIL embeds some cooler test images that we could
use (colour Lena, for example).

I think we all decided that we should put in code to download a set of
test images such as colour Lena and Barbara. What I'm not sure about is
whether mldata is an appropriate place to upload them.
Alexandre Gramfort
2011-07-19 10:26:00 UTC
Permalink
I see 2 options:

option 1:
- write a fetch function that reads the bmp files from
http://www.hlevkin.com/TestImages/
and write a few lines of numpy dtype kung fu to read the bmp files as
numpy arrays (no PIL); a sketch follows below.

option 2:
- convert the bmp files to *.npy or pickle formats and put them in a place
where we can download them (mldata or any other place).
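
For illustration, a minimal sketch of the "numpy dtype kung fu" from
option 1 (a hypothetical read_bmp_24 helper, assuming the standard
uncompressed, bottom-up 24bpp BMP layout used by these test files; not
an existing scikit-learn function):

    import struct
    import numpy as np

    def read_bmp_24(path):
        """Read an uncompressed 24-bit BMP into an (H, W, 3) RGB uint8 array."""
        with open(path, 'rb') as f:
            buf = f.read()
        # BMP file header: pixel data offset at byte 10 (little-endian uint32).
        offset, = struct.unpack_from('<I', buf, 10)
        # BITMAPINFOHEADER: width at byte 18, height at 22 (int32), bits/pixel at 28.
        width, height = struct.unpack_from('<ii', buf, 18)
        bpp, = struct.unpack_from('<H', buf, 28)
        if bpp != 24:
            raise ValueError("this sketch only handles uncompressed 24bpp BMPs")
        # Rows are padded to a multiple of 4 bytes, stored bottom-up, in BGR order.
        row_size = (width * 3 + 3) // 4 * 4
        data = np.frombuffer(buf, dtype=np.uint8, count=row_size * height, offset=offset)
        img = data.reshape(height, row_size)[:, :width * 3].reshape(height, width, 3)
        return img[::-1, :, ::-1]  # flip vertically, convert BGR -> RGB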

Alex
Olivier Grisel
2011-07-19 10:31:30 UTC
Permalink
Post by Alexandre Gramfort
- write a fetch function that reads the bmp files from
http://www.hlevkin.com/TestImages/
and write a few lines of numpy dtype kung fu to read the bmp files as
numpy arrays (no PIL).
- convert the bmp files to *.npy or pickle formats and put them in a place
where we can download them (mldata or any other place).
That involves distribution / licensing issues.

I am +1 for building a *.npy archive of cool test pictures that have
licenses explicitly allowing redistribution of derivative works (e.g.
CC BY 3.0), plus a README.md file with the URL of the source images and
the author and license info.

Potential sources:

http://commons.wikimedia.org/wiki/Main_Page

Or flickr / picasa queries with explicit licensing criteria.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2011-07-19 11:12:53 UTC
Permalink
Post by Olivier Grisel
I am +1 for building a *.npy archive of cool test pictures
where would you host them?

Alex
Olivier Grisel
2011-07-19 12:54:29 UTC
Permalink
Post by Alexandre Gramfort
Post by Olivier Grisel
I am +1 for building a *.npy archive of cool test pictures
where would you host them?
mldata.org sounds appropriate if there are no licensing / distribution issues.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Mathieu Blondel
2011-08-12 11:34:38 UTC
Permalink
In the end, what did you decide / do? A color image of Lena will be
useful for making a color quantization example, to illustrate the new
vector quantization API in KMeans.

Mathieu
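
For illustration, a minimal sketch of such a color quantization example
(a hypothetical quantize_colors helper, written against the modern
sklearn import path; not the example that was eventually merged):

    import numpy as np
    from sklearn.cluster import KMeans

    def quantize_colors(image, n_colors=16, sample_size=1000, random_state=0):
        """Reduce an (H, W, 3) float RGB image to n_colors colors via KMeans."""
        h, w, d = image.shape
        pixels = image.reshape(-1, d)
        # Fit the codebook on a random subsample of pixels to keep it fast.
        rng = np.random.RandomState(random_state)
        sample = pixels[rng.choice(len(pixels), sample_size, replace=False)]
        kmeans = KMeans(n_clusters=n_colors, random_state=random_state).fit(sample)
        # Assign every pixel to its nearest codebook color (vector quantization).
        labels = kmeans.predict(pixels)
        return kmeans.cluster_centers_[labels].reshape(h, w, d)
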
Vlad Niculae
2011-08-12 11:40:10 UTC
Permalink
Well, I think a loader for CIFAR-10 should be included in
applications the moment dictionary learning becomes usable for
supervised tasks.
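
For illustration, a sketch of what such a loader could start from (a
hypothetical load_cifar10_batch helper; it assumes the pickled "python
version" batches documented on the CIFAR-10 page, each holding 10000
rows of 3072 channel-major uint8 values plus a list of labels):

    import pickle
    import numpy as np

    def load_cifar10_batch(path):
        """Load one CIFAR-10 batch file as (images, labels) numpy arrays."""
        with open(path, 'rb') as f:
            batch = pickle.load(f, encoding='bytes')  # batches were pickled with Python 2
        data = np.asarray(batch[b'data'], dtype=np.uint8)
        labels = np.asarray(batch[b'labels'], dtype=np.int64)
        # Each row stores 1024 red, then 1024 green, then 1024 blue values (32x32).
        images = data.reshape(-1, 3, 32, 32).transpose(0, 2, 3, 1)
        return images, labels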

Regarding test images, I'm not sure whether Lena, Barbara et al. are
in the public domain, and also I'm not sure how mldata works: do you
need an account for downloading? Do we have a precedent of an example
that uses data from mldata?
Mathieu Blondel
2011-08-12 12:09:08 UTC
Permalink
Post by Vlad Niculae
Well, I think a loader for CIFAR-10 should be included in
applications the moment dictionary learning becomes usable for
supervised tasks.
Sounds like a plan.
Post by Vlad Niculae
Regarding test images, I'm not sure whether Lena, Barbara et al. are
in the public domain, and also I'm not sure how mldata works: do you
I would guess so, as it is included in scipy. It would be nice to have
a color picture of Lena soon, as I would like to add a quick color
quantization example.
Post by Vlad Niculae
need an account for downloading? Do we have a precedent of an example
that uses data from mldata?
Last time I tried, I didn't need an account.

Mathieu
Gael Varoquaux
2011-08-12 12:12:52 UTC
Permalink
Post by Mathieu Blondel
I would guess so, as it is included in scipy. It would be nice to have
a color picture of Lena soon as I would like to add a quick color
quantization example.
Post by Vlad Niculae
need an account for downloading? Do we have a precedent of an example
that uses data from mldata?
Last time I tried, I didn't need an account.
I should mention that, to use datasets in examples run by the docs, they
should be fairly light on disk (and we should avoid having too many).
In this spirit, one color image of Lena is an option to be used in the
docs, but the whole CIFAR-10 dataset is probably too big.

G
Mathieu Blondel
2011-08-12 12:20:38 UTC
Permalink
On Fri, Aug 12, 2011 at 9:12 PM, Gael Varoquaux
Post by Gael Varoquaux
I should mention that, to use datasets in examples run by the docs, they
should be fairly light on disk (and we should avoid having too many).
In this spirit, one color image of Lena is an option to be used in the
docs, but the whole CIFAR-10 dataset is probably too big.
Why did we need two face recognition datasets by the way? (No offense,
just asking out of curiosity)

Mathieu
Gael Varoquaux
2011-08-12 12:22:05 UTC
Permalink
Post by Mathieu Blondel
Why did we need two face recognition datasets by the way? (No offense,
just asking out of curiosity)
Labeled Faces in the Wild is just too big for demo purposes, so we needed
another one (now we can use faces in demos!). However, Olivetti is really
a toy problem.

Gael
Mathieu Blondel
2011-08-12 12:26:25 UTC
Permalink
On Fri, Aug 12, 2011 at 9:22 PM, Gael Varoquaux
Post by Gael Varoquaux
Labeled Faces in the Wild is just too big for demo purposes, so we needed
another one (now we can use faces in demos!). However, Olivetti is really
a toy problem.
Thanks, that sounds like a valid reason.

Mathieu
Vlad Niculae
2011-08-12 12:17:39 UTC
Permalink
Post by Mathieu Blondel
Post by Vlad Niculae
Regarding test images, I'm not sure whether Lena, Barbara et al. are
in the public domain, and also I'm not sure how mldata works: do you
I would guess so, as it is included in scipy. It would be nice to have
a color picture of Lena soon as I would like to add a quick color
quantization example.
Ideally what I'd like is a .npy file with all the color images from
this page http://www.hlevkin.com/TestImages/classic.htm, or maybe we
could upload Lena separately.

They are all common images that I've seen many times in papers and
online, and this is good for qualitatively comparing results, rather
than using some random public domain images from flickr.

Vlad
Gael Varoquaux
2011-08-12 12:25:23 UTC
Permalink
Post by Vlad Niculae
Ideally what I'd like is a .npy file with all the color images from
this page http://www.hlevkin.com/TestImages/classic.htm, or maybe we
could upload Lena separately.
These look good. Maybe we can write a downloader that downloads and saves
them as an npz. No need to download both the color and the black-and-white
versions, though.

People should express their opinions here: we can have only a small
number of downloaders, so it is important that we reach a consensus.

G
Mathieu Blondel
2011-08-12 13:01:47 UTC
Permalink
On Fri, Aug 12, 2011 at 9:25 PM, Gael Varoquaux
Post by Gael Varoquaux
These look good. Maybe we can write a downloader that download and saves
them as an npz. No need to download color and black and white versions,
though.
I wanted to check the size of the .npz files because I suspect that
they would be small enough to ship with the scikit, but I've just
realized that, quite surprisingly, np.savez does *not* compress files.
I wonder if joblib has compression support?
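
For reference, numpy also ships a compressing variant, np.savez_compressed
(same .npz container, deflate-compressed entries); a minimal sketch with a
placeholder array, just to show the calls:

    import numpy as np

    images = {'lena': np.zeros((512, 512, 3), dtype=np.uint8)}  # placeholder data

    np.savez('images_plain.npz', **images)             # stored, not compressed
    np.savez_compressed('images_small.npz', **images)  # compressed entries

    archive = np.load('images_small.npz')
    lena = archive['lena']

Natural photos typically do not shrink much under lossless deflate, so the
gain over plain .npy may be modest.
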
Post by Gael Varoquaux
People should express their opinions here: we can have only a small
number of downloaders, so it is important that we reach a consensus.
The problem with downloaders is that examples can break if the remote
server is unavailable.

+1 for storing only the color versions.

Mathieu
Gael Varoquaux
2011-08-12 13:10:17 UTC
Permalink
Post by Mathieu Blondel
I wanted to check the size of the .npz files because I suspect that
they would be small enough to ship with the scikit
If we are going to ship with the scikit, we should choose a very small
number of images, maybe three. That should however be enough to explore
different situations.
Post by Mathieu Blondel
I wonder if joblib has compression support?
No. It has been focused on speed of loading so far, although with
seekable compression and well-written code, speed of loading can
actually be improved by compression. I would love that feature to go in,
but I suspect that getting it right requires a bit of work.

That said, the simple option that I used in the 20newsgroups loader
works really well for our purposes:

file('foo.pkz', 'wb').write(pickle.dumps(obj).encode('zip'))

obj = pickle.loads(file('foo.pkz', 'rb').read().decode('zip'))
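
The same idea spelled out with zlib explicitly, since the 'zip' string
codec is a Python 2 feature (a minimal sketch, not the exact
20newsgroups loader code):

    import pickle
    import zlib

    def dump_compressed(obj, path):
        # Pickle the object and store it zlib-compressed on disk.
        with open(path, 'wb') as f:
            f.write(zlib.compress(pickle.dumps(obj)))

    def load_compressed(path):
        # Inverse of dump_compressed: decompress, then unpickle.
        with open(path, 'rb') as f:
            return pickle.loads(zlib.decompress(f.read()))
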
Post by Mathieu Blondel
Post by Gael Varoquaux
People should express their opinions here: we can have only a small
number of downloaders, so it is important that we reach a consensus.
The problem with downloaders is that examples can break if the remote
server is unavailable.
Yes, but if we want the scikit to be a library used in production, we
shouldn't make it heavier because of examples.

G
Vlad Niculae
2011-08-12 13:16:08 UTC
Permalink
On Fri, Aug 12, 2011 at 4:10 PM, Gael Varoquaux
Post by Gael Varoquaux
Post by Mathieu Blondel
I wanted to check the size of the .npz files because I suspect that
they would be small enough to ship with the scikit
If we are going to ship with the scikit, we should choose a very small
number of images, maybe three. That should however be enough to explore
different situations.
+1. Lena should be in; I vote for Barbara for denoising.
Post by Gael Varoquaux
Post by Mathieu Blondel
The problem with downloaders is that examples can break if the remote
server is unavailable.
Yes, but if we want the scikit to be a library used in production, we
shouldn't make it heavier because of examples.
+1, it's not that big a deal that some datasets need an internet
connection. Datasets are for research, for trying out algorithms and new
ideas. If you want to work with one, you can download it while
online and keep it in scikits-learn-data.

And examples should fail gracefully when the data can't be obtained.
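
For illustration, a sketch of that download-on-demand pattern (a
hypothetical fetch_test_images helper with a placeholder URL; the data
home path is an assumption, not an existing scikit-learn loader):

    import os
    from urllib.request import urlretrieve
    from urllib.error import URLError

    DATA_HOME = os.path.expanduser('~/scikit_learn_data')
    TEST_IMAGES_URL = 'http://example.com/test_images.npz'  # placeholder URL

    def fetch_test_images(data_home=DATA_HOME, url=TEST_IMAGES_URL):
        """Return the local path of the test image archive, downloading it once."""
        path = os.path.join(data_home, os.path.basename(url))
        if not os.path.exists(path):
            os.makedirs(data_home, exist_ok=True)
            try:
                urlretrieve(url, path)
            except (URLError, IOError) as exc:
                # Fail gracefully: a clear message instead of a crash mid-example.
                raise IOError("Test images are not cached locally and could not "
                              "be downloaded; check your network connection.") from exc
        return path
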
Gilles Louppe
2011-08-12 13:18:58 UTC
Permalink
Post by Gael Varoquaux
Yes, but if we want the scikit to be a library used in production, we
shouldn't make it heavier because of examples.
I am still new to the scikit-learn community, and maybe this was
already discussed before, but why not consider the following idea:
split scikit-learn into two packages, a regular "scikit-learn"
package with the library itself, and a second "scikit-learn-doc" (or
whatever the name) with the documentation, the examples and all the
required resource files? This is common practice in many other
software packages.

Just my 2 cents :)

Gilles
Gael Varoquaux
2011-08-12 13:26:04 UTC
Permalink
This is a distribution problem. Debian already does that. PyPI, which is
the most common distribution mechanism in Python, used by scores of
people (in particular the web developers, whom we are definitely
targeting), does not offer an easy way to do this.

That said, you are right that we should make it easy for distributions
to include the datasets in a separate package (overriding paths, for
instance), and always fail gracefully when the datasets are not present.
This is an area where the scikit could be improved.

Finally, I must say that on a personal basis, I hate downloading a
package and finding that I must download extra stuff to run most of the
examples. I have some heavyweight packages in mind when I say this
(VTK, for instance). This is why it is still good to have a policy of
focused, light datasets for most examples, and downloaders for a few
'exciting' examples.

Gael
Mathieu Blondel
2011-08-12 14:26:44 UTC
Permalink
Actually, most of the documentation is generated on the fly (the
"auto-examples" in Sphinx), and the scikit-learn tarball doesn't ship
the generated HTML.

Mathieu
Vlad Niculae
2011-08-17 15:16:31 UTC
Permalink
Regarding licences:
Like Mathieu said, it seems that Playboy is willing to overlook this
kind of usage of Lena.
Barbara is a bit hard to track; it seems she is from here: [1]
Boat, Peppers, and most of the others have, according to [2], unknown
sources and unknown copyright status.

[1] http://www.stat.ucl.ac.be/ISdidactique/Rhelp/library/waveslim/html/barbara.html
[2] http://sipi.usc.edu/database/copyright.php

With greetings from the seaside,
Vlad
Olivier Grisel
2011-08-17 15:33:14 UTC
Permalink
Post by Vlad Niculae
Like Mathieu said, it seems that Playboy is willing to overlook this
kind of usage of Lena
You cannot know what the users of scikit-learn will want to do with
the source code and the included data. We should be consistent with the
BSD license, which is very liberal.
Post by Vlad Niculae
Barbara is a bit hard to track; it seems she is from here: [1]
Boat, Peppers, and most of the others have, according to [2], unknown
sources and unknown copyright status.
So just don't use them. There are plenty of pictures on flickr or
commons.wikimedia.org with explicit license info that allows
redistribution:

http://www.flickr.com/search/?l=commderiv&ss=0&ct=0&mt=all&adv=1&s=int

The research community really needs to educate itself and explicitly
use Open Data as the raw material for developing an Open Science.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Vlad Niculae
2011-08-17 19:17:21 UTC
Permalink
Post by Olivier Grisel
So just don't use them. There are plenty of pictures on flickr or
commons.wikimedia.org with explicit license info that allows
redistribution:
http://www.flickr.com/search/?l=commderiv&ss=0&ct=0&mt=all&adv=1&s=int
The research community really needs to educate itself and explicitly
use Open Data as the raw material for developing an Open Science.
+1 for educating the scientific community :)
I don't really like that we would lose the easy reproduction and
comparison with published results that the standard images give us, but
I agree it's the right thing to do.
David Warde-Farley
2011-08-17 19:40:11 UTC
Permalink
Post by Vlad Niculae
Like Mathieu said, it seems that Playboy is willing to overlook this
kind of usage of Lena
Pssssst.

In [1]: from scipy.misc import lena

In [2]: lena()
Out[2]:
array([[162, 162, 162, ..., 170, 155, 128],
       [162, 162, 162, ..., 170, 155, 128],
       [162, 162, 162, ..., 170, 155, 128],
       ...,
       [ 43,  43,  50, ..., 104, 100,  98],
       [ 44,  44,  55, ..., 104, 105, 108],
       [ 44,  44,  55, ..., 104, 105, 108]])

*whistles*

David

P.S. Forgive me if this has already been mentioned in the thread, but
the great debate about whether to ship Lena with scikits-learn may be a moot
point if scipy is a dependency. :)
Vlad Niculae
2011-08-18 05:42:08 UTC
Permalink
Post by David Warde-Farley
P.S. Forgive me if this has already been mentioned in the thread, but
the great debate about whether to ship Lena with scikits-learn may be a moot
point if scipy is a dependency. :)
That Lena is black and white; we already use it for the image
denoising example, but I would have liked color examples, especially
for k-means color quantization, like Mathieu said.

And just because Scipy does it doesn't mean it's right :P

Best,
Vlad
Mathieu Blondel
2011-08-12 14:24:15 UTC
Permalink
On Fri, Aug 12, 2011 at 10:10 PM, Gael Varoquaux
   file('foo.pkz', 'wb').write(pickle.dumps(obj).encode('zip'))
   obj = pickle.loads(file('foo.pkz', 'rb').read().decode('zip'))
Another option is to store the files as JPEG, which I guess imread can
load, instead of npz. That would probably save even more space, but the
compression would be lossy. Three pictures seems like a fair
number.

Mathieu
Nelle Varoquaux
2011-08-12 13:14:31 UTC
Permalink
Post by Mathieu Blondel
Post by Vlad Niculae
Regarding test images, I'm not sure whether Lena, Barbara et al. are
in the public domain, and also I'm not sure how mldata works: do you
I would guess so, as it is included in scipy. It would be nice to have
a color picture of Lena soon as I would like to add a quick color
quantization example.
Actually, it isn't, but Playboy "has decided to overlook the widespread
distribution of this particular centerfold."
Post by Mathieu Blondel
Post by Vlad Niculae
need an account for downloading? Do we have a precedent of an example
that uses data from mldata?
Last time I tried, I didn't need an account.
Mathieu
Gael Varoquaux
2011-08-12 13:15:21 UTC
Permalink
Post by Nelle Varoquaux
Actually, it isn't, but Playboy "has decided to overlook the widespread
distribution of this particular centerfold."
Let's ship the full picture then :)

G
Vlad Niculae
2011-08-12 13:17:16 UTC
Permalink
Post by Gael Varoquaux
Let's ship the full picture then :)
scikits.learn: the world's first NSFW machine learning library!
Andreas Müller
2011-08-29 10:45:02 UTC
Permalink
Post by Vlad Niculae
Post by Gael Varoquaux
Let's ship the full picture then :)
scikits.learn: the world's first NSFW machine learning library!
+1

About PIL: My (Ubuntu) version of PIL has no color lena.

also +1 to storing the images in JPEG. I don't think the lossy
compression should be a problem.
Is anyone still working on this?


andy
Vlad Niculae
2011-08-29 10:56:35 UTC
Permalink
Check out the newly merged vector quantization example using KMeans. We
decided on CC BY images, to set a good example of using licensed data
in science.
Best,
Vlad

Sent from my iPod
Post by Andreas Müller
Post by Vlad Niculae
Post by Gael Varoquaux
Let's ship the full picture then :)
scikits.learn: the world's first NSFW machine learning library!
+1
About PIL: My (Ubuntu) version of PIL has no color lena.
also +1 to storing the images in JPEG. I don't think the lossy
compression should be a problem.
Is anyone still working on this?
andy