Oh wow, very cool. Thank you very much for the assistance and info Alexander!
From: afabisch [mailto:***@mailhost.informatik.uni-bremen.de]
Sent: Saturday, April 18, 2015 9:15 AM
Subject: Re: [Scikit-learn-general] TSNE Memory Error
Memory is a problem in our implementation of t-SNE. I sent a detailed list of the required memory to this mailing list some months ago. You can find it here:
The number of features is irrelevant; only the number of samples matters. You have too many samples because the algorithm requires O(n^2) space (in your case probably about 30 GB for 61,878 samples). I would not use the original t-SNE algorithm for this dataset anyway because the time complexity is O(n^2) as well, which means that you would have to wait days or weeks for the result.
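For reference, the 30 GB figure can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the helper name `dense_matrix_gb` is made up for illustration, and it assumes a single dense n x n matrix of 8-byte floats. The actual implementation may hold more than one such matrix at a time, so treat the result as a lower bound.

```python
def dense_matrix_gb(n_samples, bytes_per_value=8):
    """Size in decimal gigabytes of ONE dense n_samples x n_samples matrix.

    Assumption: 8 bytes per entry (float64). The number of features does
    not appear anywhere, because the pairwise matrix is n x n regardless
    of how many columns the input data has.
    """
    return n_samples ** 2 * bytes_per_value / 1e9


n_samples = 61878  # size of the dataset discussed in this thread

full = dense_matrix_gb(n_samples)
half = dense_matrix_gb(n_samples // 2)

print(f"full dataset: {full:.1f} GB per matrix")
print(f"half dataset: {half:.1f} GB per matrix")
```

Because the cost grows quadratically, halving the number of samples cuts the size of each matrix to roughly a quarter.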
There is a new pull request that implements Barnes-Hut t-SNE here:
The advantage of Barnes-Hut t-SNE in comparison to t-SNE is that you would have a time complexity of O(n log n). However, at the moment the full distance matrix is still computed, so that would not fix your original problem, but I think the memory problem should be solved soon.
In your case you could take half of the dataset. The number of features is not critical at all. You can take all 93 features without any dimensionality reduction.
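A minimal sketch of taking a random half of the dataset could look like the following. The helper names `subsample_indices` and `embed_subsample` are hypothetical, `X` is assumed to be a NumPy array of shape (n_samples, n_features), and the perplexity of 40 is simply the value from the original post. The scikit-learn import is kept inside the function so that the index selection can be tried without it installed.

```python
import random


def subsample_indices(n_samples, n_keep, seed=0):
    """Pick n_keep distinct row indices uniformly at random, in sorted order."""
    rng = random.Random(seed)  # fixed seed so the subsample is reproducible
    return sorted(rng.sample(range(n_samples), n_keep))


def embed_subsample(X, n_keep, seed=0):
    """Run t-SNE on a random subset of the rows of X, keeping all features.

    Assumption: X is a NumPy array, so X[idx] selects rows by index list.
    """
    from sklearn.manifold import TSNE  # imported lazily; needs scikit-learn

    idx = subsample_indices(X.shape[0], n_keep, seed)
    embedding = TSNE(n_components=2, perplexity=40).fit_transform(X[idx])
    return idx, embedding


# Index selection for half of the 61,878 observations.
idx = subsample_indices(61878, 61878 // 2)
print(len(idx))
```

Returning the indices along with the embedding makes it possible to match the embedded points back to their labels afterwards.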
Post by Jason Wolosonovich
My dataset has 93 features and just under 62,000 observations (61,878
to be exact). I'm running out of memory right after the mean sigma
value is computed/displayed. I've tried using dimensionality reduction
via TruncatedSVD with n_components set at different levels (78, 50 and
2 respectively) prior to sending the data to TSNE but I still run out
of memory. For TSNE, n_components=2 and perplexity=40 (I've also tried
20). I've got 24 GB of RAM on my 64-bit Windows 7 machine. Should I try
a subsample of the dataset and if so, does anyone have a
recommendation on the size? Thanks!
Scikit-learn-general mailing list