Discussion:
[Scikit-learn-general] performance/scalability of NMF
Roberto Pagliari
2016-03-11 11:26:41 UTC
Are there results on the performance and scalability of the scikit-learn implementation of NMF?

According to this thread on Stack Overflow:

http://stackoverflow.com/questions/18575846/non-negative-matrix-factorization-of-sparse-input

there are scalability issues. I would be interested to know the largest dataset NMF can handle and what its memory footprint is.

Thank you,
Tom DLT
2016-03-14 12:29:27 UTC
Hi Roberto,

In 0.17, we added a coordinate descent solver for NMF, which is more
efficient than the previous projected gradient solver.
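
As a minimal sketch of using that solver on sparse input (the matrix sizes, density, number of components, and initialization below are arbitrary assumptions chosen for illustration, not benchmark settings from this thread):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.decomposition import NMF

# Synthetic non-negative sparse matrix (assumed sizes, for illustration only).
rng = np.random.RandomState(0)
X = sp.random(200, 100, density=0.05, format="csr", random_state=rng)

# solver="cd" selects the coordinate descent solver.
model = NMF(n_components=5, solver="cd", init="nndsvd",
            random_state=0, max_iter=200)
W = model.fit_transform(X)   # shape (n_samples, n_components)
H = model.components_        # shape (n_components, n_features)

# The factors W and H are dense arrays, so their size gives a lower bound
# on the memory needed in addition to the input matrix itself.
factor_bytes = W.nbytes + H.nbytes
print(W.shape, H.shape, factor_bytes)
```

The input stays in CSR format throughout; only the two factor matrices are dense, so their size grows with n_components times (n_samples + n_features).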

For performance on both dense and sparse data, see this pull request,
which adds a better NMF benchmark
<https://github.com/scikit-learn/scikit-learn/pull/5779>.

Regarding multithreading, the new solver releases the GIL (in its Cython
code) for a large part of its running time.
The other main computational cost is the NumPy dot product, whose speed
depends on your BLAS configuration.
Here is a quick example for benchmarking multithreading
<https://gist.github.com/TomDLT/c1d560a510a41dd80ab6>.
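
For a rough feel of how fit time scales on your own machine, a timing loop along the following lines may help. This is a sketch under assumptions, not the content of the gist above: the helper name `time_nmf`, the data sizes, and the iteration cap are all hypothetical. To compare thread counts, one common approach is to set an environment variable such as OMP_NUM_THREADS before starting Python, though whether it takes effect depends on how your BLAS was built.

```python
import time
import numpy as np
from sklearn.decomposition import NMF

def time_nmf(n_samples, n_features, n_components=10, seed=0):
    """Return wall-clock seconds for one NMF fit on random dense data.

    Hypothetical helper for illustration; sizes and max_iter are assumptions.
    """
    rng = np.random.RandomState(seed)
    X = np.abs(rng.randn(n_samples, n_features))  # non-negative input
    model = NMF(n_components=n_components, solver="cd", init="random",
                random_state=seed, max_iter=50)
    start = time.perf_counter()
    model.fit(X)
    return time.perf_counter() - start

# Time a few increasing sample counts with a fixed number of features.
timings = {n: time_nmf(n, 50) for n in (100, 200, 400)}
for n, seconds in timings.items():
    print("n_samples=%d: %.4f s" % (n, seconds))
```

The low iteration cap may trigger a convergence warning; it only keeps the run short, and the absolute numbers will vary with hardware and BLAS.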

Best,

Tom
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general