Discussion:
[Scikit-learn-general] Using train_test_split with images from my local directory
Abder-Rahman Ali
2016-01-24 13:59:13 UTC
Hi,


I have read the images from my local directory as follows:

from PIL import Image
import os

root = '/Users/xyz/Desktop/data'
for path, subdirs, files in os.walk(root):
    for name in files:
        img_path = os.path.join(path, name)

I have two subdirectories, category-1 and category-2, each of which
contains the image files (.jpg) that belong to that category.

How can I use those images and the two categories with the train_test_split()
function
<http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.train_test_split.html>
in Scikit-Learn? In other words, how do I arrange the training and testing data?

Thanks.
Sebastian Raschka
2016-01-24 18:32:43 UTC
I just remembered that I have an example notebook for reading MNIST into NumPy arrays; maybe that helps:

http://nbviewer.jupyter.org/github/rasbt/pattern_classification/blob/master/data_collecting/reading_mnist.ipynb
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Stéfan van der Walt
2016-01-25 05:09:41 UTC
How about something like:

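```
import os.path

from skimage.io import ImageCollection
# sklearn.cross_validation was removed in scikit-learn 0.20;
# train_test_split now lives in sklearn.model_selection.
from sklearn.model_selection import train_test_split

# Glob patterns are separated by os.pathsep (':' on Linux and macOS).
ic = ImageCollection('/path/to/category-1/*.jpg:/path/to/category-2/*.jpg')
# The directory that holds each file serves as its label.
labels = [os.path.dirname(f) for f in ic.files]

train_images, test_images, train_labels, test_labels = train_test_split(
    ic, labels)
```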

Stéfan
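
An alternative worth considering is to split the file paths rather than the loaded images, so that pixel data is read only when it is needed. A minimal sketch, assuming the two-folder layout from the question; `collect_image_paths` and `split_image_paths` are hypothetical helper names, and the extension filter, `test_size`, and `seed` values are only illustrative:

```python
import os


def collect_image_paths(root, extensions=(".jpg", ".jpeg")):
    """Return (paths, labels); each label is the name of the file's folder."""
    paths, labels = [], []
    for dirpath, _subdirs, files in os.walk(root):
        for name in sorted(files):
            # Skip anything that is not an image (notes, hidden files, ...).
            if name.lower().endswith(extensions):
                paths.append(os.path.join(dirpath, name))
                labels.append(os.path.basename(dirpath))
    return paths, labels


def split_image_paths(root, test_size=0.25, seed=0):
    """Split file paths (not pixel data) into train and test sets."""
    # Imported here so collect_image_paths works without scikit-learn.
    from sklearn.model_selection import train_test_split

    paths, labels = collect_image_paths(root)
    # stratify=labels keeps the class proportions similar in both sets.
    return train_test_split(
        paths, labels, test_size=test_size, stratify=labels, random_state=seed
    )
```

Calling `split_image_paths('/Users/xyz/Desktop/data')` would return the train paths, test paths, train labels, and test labels, in that order.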
Zeyad Abdelmottaleb
2016-02-10 05:00:22 UTC
Stefan,

I’ve tried this method, and I’m getting the following error when using RandomizedPCA:

setting an array element with a sequence.

Any help?

Regards,
Zeyad
Andreas Mueller
2016-02-10 16:27:58 UTC
Your images have different sizes. For RandomizedPCA to work, they all
need to have the same size.
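
One way to give every image the same size is to resample each one onto a fixed grid and flatten it into a row, which yields the 2-D (n_samples, n_features) array that scikit-learn estimators expect. The sketch below is only an illustration: `to_feature_matrix` is a hypothetical helper, the 64x64 target shape is arbitrary, and nearest-neighbour sampling is used just to keep the example NumPy-only (`skimage.transform.resize` would usually give better results):

```python
import numpy as np


def to_feature_matrix(images, shape=(64, 64)):
    """Resample images to a common shape and stack them row by row.

    All images must have the same number of channels (all greyscale or
    all colour); otherwise the rows differ in length and stacking fails.
    """
    rows_out, cols_out = shape
    flat = []
    for img in images:
        img = np.asarray(img, dtype=float)
        # Pick evenly spaced source rows and columns for the target grid.
        r = np.arange(rows_out) * img.shape[0] // rows_out
        c = np.arange(cols_out) * img.shape[1] // cols_out
        # Index the first two axes; a channel axis, if any, is preserved.
        flat.append(img[r[:, None], c[None, :]].ravel())
    return np.vstack(flat)
```

The returned array can then be passed to a PCA estimator's `fit` method in place of the raw list of images.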
Zeyad Abdelmottaleb
2016-02-10 17:52:52 UTC
I defined a custom load_func that resizes each image after imread and passed it to the ImageCollection class. Do I need to convert the images to greyscale?
Andreas Mueller
2016-02-10 18:09:24 UTC
Post by Zeyad Abdelmottaleb
I defined a custom load_func that resize after imread and used it in
ImageCollection class, do I need to change to greyscale?
No. You need to provide the traceback and your code (ideally with a way
to reproduce) to get help.
Also, try Stack Overflow; you might get more answers for this kind of
debugging question.
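
A reproduction of the kind requested can be very small. The following is only a sketch of what such a report might contain: it uses synthetic arrays in place of the real images (the shapes and the `stack_images` name are made up for illustration) and triggers the same kind of error without RandomizedPCA or any image files:

```python
import numpy as np


def stack_images(images):
    """Try to pack a list of images into one float array."""
    return np.array(images, dtype=float)


# Two fake greyscale "images" with different sizes.
images = [np.zeros((2, 3)), np.zeros((4, 5))]

try:
    stack_images(images)
except ValueError as exc:
    # NumPy cannot build a rectangular array from differently shaped items.
    error = exc
    print("ValueError:", exc)
else:
    error = None
    print("stacked without error")
```

Posting a script like this together with the full traceback lets others run it and see the failure for themselves.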
