[Scikit-learn-general] performance/scalability of NMF

Tom DLT

2016-03-14 12:29:27 UTC

Hi Roberto,

In 0.17, we added a coordinate descent solver for NMF, which is more
efficient than the previous projected gradient solver.
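
As a minimal sketch of selecting that solver (not taken from the benchmark; the matrix size, density, and number of components below are arbitrary assumptions chosen only for illustration), one could fit NMF on a random sparse non-negative matrix like this:

```python
import scipy.sparse as sp
from sklearn.decomposition import NMF

# Hypothetical toy data: a 200 x 100 sparse matrix with non-negative entries.
X = sp.random(200, 100, density=0.05, format="csr", random_state=0)

# solver="cd" selects the coordinate descent solver mentioned above.
model = NMF(n_components=5, solver="cd", init="nndsvd",
            max_iter=200, random_state=0)

W = model.fit_transform(X)   # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)

print("reconstruction error:", model.reconstruction_err_)
```

The input stays in sparse format throughout; only the factors W and H are dense, so their sizes scale with n_components rather than with the number of non-zero entries.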

For performance on both dense and sparse data, see this
pull request for a better NMF benchmark
<https://github.com/scikit-learn/scikit-learn/pull/5779>.

Regarding multithreading, the new solver releases the GIL (through Cython code)
for a large part of its running time.
The other main computational cost is the NumPy dot product, whose speed
depends on your BLAS configuration.
Here is also a quick example for benchmarking multithreading
<https://gist.github.com/TomDLT/c1d560a510a41dd80ab6>.
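
Separately from the gist linked above, here is a rough sketch of how one might inspect the BLAS configuration and time a single dot product. The 500 x 500 matrix size is an arbitrary assumption, and the measured time will vary from machine to machine:

```python
import time
import numpy as np

# Print the BLAS/LAPACK libraries this NumPy build is linked against.
np.show_config()

# Hypothetical dense matrices, sized only for a quick timing.
rng = np.random.RandomState(0)
A = rng.rand(500, 500)
B = rng.rand(500, 500)

start = time.perf_counter()
C = A.dot(B)                 # dispatched to the underlying BLAS library
elapsed = time.perf_counter() - start

print("dot product took %.4f s" % elapsed)
```

Depending on which BLAS library is in use, rerunning the script with an environment variable such as OMP_NUM_THREADS, OPENBLAS_NUM_THREADS, or MKL_NUM_THREADS set to different values is one way to compare single-threaded and multithreaded timings.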

Best,

Tom

Post by Roberto Pagliari
Are there results about the performance and scalability of the scikit-learn implementation of NMF?
According to this thread on SO
http://stackoverflow.com/questions/18575846/non-negative-matrix-factorization-of-sparse-input
there are scalability issues. I would be interested to know the biggest
dataset NMF can handle and what the memory footprint is.
Thank you,