Discussion:
[Scikit-learn-general] How you free up memory or handle it while fitting/cross-validating model in Scikitlearn?
muhammad waseem
2016-02-12 16:35:05 UTC
Permalink
Hi,

I am trying to fit my model using regression trees, but the problem is that it
consumes a lot of RAM, which makes my code unresponsive. From looking at
different forums and platforms, I think this is a common problem. I was
wondering how you free up memory, or what the best ways are to run the
fitting process/cross-validation without running out of memory? This
problem occurs mostly with regression trees (and, I think, with other ML
algorithms as well). Shall I try running without n_jobs=-1 and use some other
value (e.g. n_jobs=10) in cross_validation?

Thanks
Kindest Regards
Waseem
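For concreteness, the kind of call being asked about might look like the sketch below. The original post does not show its code, so the estimator (RandomForestRegressor), the synthetic data, and the fold count are all assumptions made for illustration. It imports cross_val_score from sklearn.model_selection, its current location; at the time of this thread it lived in sklearn.cross_validation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score


def run_cv(n_jobs):
    """Cross-validate a small tree ensemble with the given number of parallel jobs."""
    rng = np.random.RandomState(0)
    X = rng.rand(200, 5)                 # hypothetical data, not from the thread
    y = X[:, 0] + 0.1 * rng.rand(200)
    model = RandomForestRegressor(n_estimators=10, random_state=0)
    # n_jobs here controls how many folds are evaluated in parallel.
    return cross_val_score(model, X, y, cv=3, n_jobs=n_jobs)


if __name__ == "__main__":
    print(run_cv(n_jobs=2))
```

Changing only the n_jobs argument is enough to compare memory behaviour between a parallel and a sequential run.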
Sebastian Raschka
2016-02-12 16:42:49 UTC
Permalink
Hi, Waseem,
I think lowering the value of n_jobs would help; as far as I know, each process gets a copy of the data? I just stumbled upon spark-sklearn a few days ago; maybe that could help as well:

https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html

If I understand correctly, the data is still copied, but there each node gets a copy, instead of one machine holding many copies.
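As a rough illustration of why lowering n_jobs could matter if each worker really did hold its own copy of the data, here is a back-of-the-envelope sketch. The function name and the data shape are hypothetical, and the one-copy-per-worker model is only the assumption stated above, not a measured property of scikit-learn.

```python
def worst_case_data_bytes(n_samples, n_features, n_jobs, itemsize=8):
    """Upper bound on bytes held if every worker keeps its own copy of X."""
    one_copy = n_samples * n_features * itemsize
    # The parent process keeps the original; each of the n_jobs workers adds a copy.
    return one_copy * (1 + n_jobs)


if __name__ == "__main__":
    # Hypothetical dataset: one million rows of 100 float64 features.
    for n_jobs in (1, 10, 12):
        gib = worst_case_data_bytes(1_000_000, 100, n_jobs) / 2**30
        print(f"n_jobs={n_jobs:>2}: about {gib:.1f} GiB")
```

The estimate covers the input array only; whatever the model itself allocates during fitting comes on top of it.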
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Manoj Kumar
2016-02-12 17:19:21 UTC
Permalink
Hi Sebastian,

This is true, but only if the data is smaller than 1M. Above that size, it is
memmapped to a temp folder and shared by all processes (
https://pythonhosted.org/joblib/parallel.html#working-with-numerical-data-in-shared-memory-memmaping
)

You can try varying the "max_nbytes" parameter wherever Parallel is called in
the regression trees, to trigger the memmap conversion even for smaller data
and prevent duplication of the data across all processes.
--
Manoj,
http://github.com/MechCoder
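The effect of max_nbytes can be observed with joblib directly, as in the sketch below. It calls joblib.Parallel itself rather than any scikit-learn estimator (whether a given estimator exposes this parameter is not established in this thread), and the helper name and array sizes are made up for illustration. Each worker simply reports whether the array it received is a numpy.memmap.

```python
import numpy as np
from joblib import Parallel, delayed


def memmapped_flags(arrays, max_nbytes):
    """Return, per input array, whether the worker received it as a np.memmap."""
    return Parallel(n_jobs=2, max_nbytes=max_nbytes)(
        # isinstance runs inside the worker process, on the argument as received.
        delayed(isinstance)(a, np.memmap) for a in arrays
    )


if __name__ == "__main__":
    small = np.ones(100)        # 800 bytes
    large = np.ones(1_000_000)  # 8,000,000 bytes
    print(memmapped_flags([small, large], max_nbytes="1M"))
    print(memmapped_flags([small, large], max_nbytes=100))
```

Lowering max_nbytes moves the threshold so that smaller arrays are also passed as memory maps instead of being pickled to each worker.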
muhammad waseem
2016-02-12 17:29:10 UTC
Permalink
Hi Sebastian and Manoj,
@Manoj: What should the value of the max_nbytes parameter be, and will it
affect the results and the time it takes to run cross_validation, grid_search,
etc.?

Thanks
Kindest Regards
Waseem
muhammad waseem
2016-02-12 17:30:22 UTC
Permalink
Hi Sebastian and Manoj,
@Manoj: What should the value of the max_nbytes parameter be, and will it
affect the results and the time it takes to run cross_validation, grid_search,
etc.?
@Sebastian: Will the Spark implementation also improve the memory use, or
just the CPU use?


Thanks
Kindest Regards
--
Dr Muhammad Waseem Ahmad
Research Associate,
BRE Center for Sustainable Construction,

School of Engineering,

Cardiff University,

Cardiff, UK.
Sebastian Raschka
2016-02-12 18:40:38 UTC
Permalink
Thanks for the note, Manoj, didn't know that!

@muhammad So if there's no duplication of data across the processes, I guess that you would also run into trouble with n_jobs=1. But just to make sure that data duplication is not the issue, could you try running it with n_jobs=1? If it still runs out of memory in that case, probably only a smaller dataset or a machine with more memory would help. Here, I'd probably think about using Spark's MLlib to deal with this particular dataset.
muhammad waseem
2016-02-12 19:57:30 UTC
Permalink
@Sebastian: I tried with n_jobs=10 (out of a total of 12) and it still
caused the same problem. I could try running it with n_jobs=1, but it
would be so slow that it would take ages to complete. The machine has 32GB of
RAM, and it started using swap after consuming all of the RAM.

Is there a way to tackle this, or do you really think that all of this k-fold
cross-validation and training should be done using Spark's MLlib?

Thanks
Regards
Waseem
Sebastian Raschka
2016-02-12 20:32:50 UTC
Permalink
I'd suggest trying n_jobs=1 and checking whether swap memory is used (you don't have to run it until completion). If this runs fine without swap, we can work further from there.

Sent from my iPhone
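A short n_jobs=1 trial like the one suggested above could be instrumented along the following lines. This is only a sketch using the standard-library resource module (Unix only); the helper name is hypothetical, and the allocation in the demo is a stand-in for a real fit. It reports the peak resident memory of the current process, which is not the same as system-wide swap usage and does not include worker processes.

```python
import resource
import sys


def peak_rss_mib():
    """Peak resident set size of this process so far, in MiB (Unix only)."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # ru_maxrss is reported in kilobytes on Linux but in bytes on macOS.
    divisor = 2**20 if sys.platform == "darwin" else 2**10
    return peak / divisor


if __name__ == "__main__":
    before = peak_rss_mib()
    buffer = bytes(range(256)) * 2**18  # 64 MiB stand-in for a short n_jobs=1 fit
    after = peak_rss_mib()
    print(f"peak RSS: {before:.0f} MiB -> {after:.0f} MiB")
```

Comparing the reported peak against the machine's physical RAM gives a first indication of whether a single sequential fit already comes close to the limit.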
Jacob Schreiber
2016-02-12 21:58:06 UTC
Permalink
I don't think that the data is copied for tree-based classifiers. They use
the threading backend, so the threads should all be sharing memory.
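The general point about threads sharing memory can be seen in a small standard-library sketch. It does not touch scikit-learn's internals, and the helper name is made up; it only shows that every thread in a pool is handed the very same object rather than a copy.

```python
from concurrent.futures import ThreadPoolExecutor


def ids_seen_by_threads(data, n_threads=4):
    """Have each thread report the identity of the object it was handed."""
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        # Every task receives a reference to the same object, never a copy.
        return list(pool.map(id, [data] * n_threads))


if __name__ == "__main__":
    data = bytearray(10 * 2**20)  # 10 MiB buffer standing in for training data
    seen = ids_seen_by_threads(data)
    print(all(i == id(data) for i in seen))
```

With a process-based backend, by contrast, arguments have to be transferred to each worker, which is where copying or memmapping comes in.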
muhammad waseem
2016-02-15 14:37:09 UTC
Permalink
@Sebastian: I tried running cross_validation with n_jobs=1 and it did not
use swap memory; even the RAM usage was quite low (12% at most).
However, it will take much longer to finish this way. Any idea what to try now?

Thanks
Kindest Regards
Waseem
Sebastian Raschka
2016-02-15 21:25:32 UTC
Permalink
Hm, unfortunately, that's what I thought -- sounds like a bug involving joblib? Does anyone have any ideas how to track this down?

@Waseem Can you also try n_jobs=2? Here, I'd expect one of the following:
1) It uses maybe 2 times the 12% plus a little bit extra, if everything is working correctly with the multi-threading.
2) If you see something like ~30%, I'd say that an unnecessary copy is being made.
3) If you see something noticeably above 30%, there would be a memory leak somewhere.

I mention scenario 3 because I once observed very similar behavior
(see https://github.com/scikit-learn/scikit-learn/issues/3973):

"I made some weird observations that my GridSearches keep failing after a couple of hours and I initially couldn't figure out why. I then monitored the memory usage over time and saw that it started with a few gigabytes (~6 GB) and kept increasing until it crashed the node when it reached the max. 128 GB the hardware can take. I was experimenting with random forests for classification of a large number of text documents. For simplicity -- to figure out what's going on -- I went back to naive Bayes.
...
After some experimentation, I finally found out that

gc.collect()
len(gc.get_objects()) # particularly this part!

in the for loop solves the problem, and the memory usage stays constant at 6.5 GB over the run time of ~10 hours."
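
For readers who want to see where those two calls would sit, here is a minimal sketch of a manual search loop that forces a garbage collection after every fit. It is not the code from the linked issue: `fit_and_score` and `param_settings` are hypothetical stand-ins, and the function deliberately creates a reference cycle so that the collector has something to reclaim.

```python
import gc


def fit_and_score(params):
    # Hypothetical stand-in for fitting and scoring one parameter setting.
    # It deliberately creates a reference cycle, which only the cyclic
    # garbage collector can reclaim.
    cycle = {"params": params}
    cycle["self"] = cycle
    return sum(params.values())


# Hypothetical parameter grid, written out by hand.
param_settings = [{"max_depth": d, "n_estimators": n}
                  for d in (2, 4) for n in (10, 50)]

scores = []
for params in param_settings:
    scores.append(fit_and_score(params))
    unreachable = gc.collect()        # force a full collection after each fit
    tracked = len(gc.get_objects())   # the second call quoted in the message
    print(params, "collected:", unreachable, "tracked objects:", tracked)
```

Whether this helps in a given setup is not something the sketch can show; it only illustrates the placement of the calls inside the loop.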
muhammad waseem
2016-02-17 19:25:36 UTC
Permalink
@Sebastian: I tried running it with n_jobs=2 and you were right: it
uses around 27% of the RAM.
Does this mean I can only use at most n_jobs=8 in my case (obviously this will
also depend on the number of estimators; more estimators will require more RAM,
won't they?), or is there a bug?

Also, could you share the code showing how you tackled it? I have seen part
of it, but is it possible to see the full code?

Thanks for your time.

Regards
Waseem
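
The two measurements reported so far (about 12% of RAM with n_jobs=1 and about 27% with n_jobs=2) allow a rough back-of-the-envelope estimate of how many jobs would fit. The sketch below simply draws a straight line through those two points; that memory grows linearly with n_jobs is an assumption that nothing in this thread establishes, and the result will also shift with the number of estimators and the data size.

```python
import math

# RAM usage (percent of total) reported in this thread.
usage = {1: 12.0, 2: 27.0}

# Straight line through the two measurements: usage(n) = intercept + slope * n.
slope = (usage[2] - usage[1]) / (2 - 1)
intercept = usage[1] - slope * 1


def max_jobs(limit_percent):
    """Largest n_jobs whose extrapolated usage stays at or below the limit."""
    return math.floor((limit_percent - intercept) / slope)


print("per-job increase:", slope, "percent")
print("max n_jobs below 100%:", max_jobs(100.0))
print("max n_jobs below 90%:", max_jobs(90.0))
```

Under that linear assumption each extra job costs about 15 percentage points, so the estimate comes out at 6 jobs, somewhat below the 8 obtained by dividing 100% by the single-job figure of 12%.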
Sebastian Raschka
2016-02-17 19:36:37 UTC
Permalink
Hm, if the others are right and the data set shouldn't be copied for each process, I guess that's a bug. Maybe you could create a reproducible example and post it on the issue tracker?
As for the full code showing how I tackled it, I have it here:
https://github.com/rasbt/bugreport/tree/master/scikit-learn/gridsearch_memory

(but be aware that it's more than a year old and I haven't tested it again since then)
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 <http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140>
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 <http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140>
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 <http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140>
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________ <http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________>
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140 <http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140>
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general <https://lists.sourceforge.net/lists/listinfo/scikit-learn-general>
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=272487151&iu=/4140_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Andreas Mueller
2016-02-22 17:18:25 UTC
Permalink
Post by muhammad waseem
@Sebastian: I have tried running it by using n_jobs=2 and you were
right it uses around 27% of the RAM.
Does this mean I can only use at most n_jobs=8 in my case (obviously this
will also depend on the number of estimators; more estimators will require
more RAM, won't they?), or is there a bug?
What is the dtype of your data?
Trees work on 32-bit floats, right? So if your data is 64-bit float or
anything else, it will be copied once for each parallel loop, right?
Jacob Schreiber
2016-02-22 18:38:30 UTC
Permalink
I think trees use 32-bit floats for X and 64-bit floats for y.
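As a hedged illustration of the dtype point: if the tree code really expects 32-bit floats for X and 64-bit floats for y, converting the arrays once up front should avoid a fresh conversion copy inside every fit. A minimal NumPy sketch; as_tree_input is a made-up helper name, not part of scikit-learn, and the C-contiguous layout is an assumption about what avoids a further copy:

```python
import numpy as np


def as_tree_input(X, y):
    """Convert X to float32 and y to float64 once, before any fitting.

    Hypothetical helper (not a scikit-learn function). The input is
    not copied again when it already has the requested dtype and a
    C-contiguous layout.
    """
    X32 = np.ascontiguousarray(X, dtype=np.float32)
    y64 = np.ascontiguousarray(y, dtype=np.float64)
    return X32, y64


rng = np.random.RandomState(0)
X = rng.rand(1000, 20)   # NumPy creates float64 arrays by default
y = rng.rand(1000)

X32, y64 = as_tree_input(X, y)
print(X.dtype, X.nbytes)      # 8 bytes per value
print(X32.dtype, X32.nbytes)  # 4 bytes per value
```

Passing X32 and y64 to every fit call would then, under the assumption above, leave nothing to convert per parallel loop. The float32 cast does lose precision, so it is worth checking that this is acceptable for the data at hand.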
Sebastian Raschka
2016-02-17 19:40:58 UTC
Permalink
@Waseem Oh, wait, I just saw that we already have an open issue for that; please see: https://github.com/scikit-learn/scikit-learn/issues/3973 It would be great if you could add to the discussion there. Meanwhile, I will try to run my code again in the next few days to check whether this bug still persists.
Post by Sebastian Raschka
Hm, unfortunately, that's what I thought -- sounds like a bug involving joblib? Does someone have any ideas how to track this down?
@Waseem Can you also try n_jobs=2? Here, I'd expect that
1) it would use maybe 2 times the 12% plus a little bit extra if everything is working correctly with the multi-threading.
2) If you see something like ~30%, I'd say that there's an unnecessary copy being made.
3) If you see something like > 30%, there would be a memory leak somewhere
(see https://github.com/scikit-learn/scikit-learn/issues/3973):
"I made some weird observations that my GridSearches keep failing after a couple of hours and I initially couldn't figure out why. I then monitored the memory usage over time and saw that it started with a few gigabytes (~6 GB) and kept increasing until it crashed the node when it reached the max. 128 GB the hardware can take. I was experimenting with random forests for classification of a large number of text documents. For simplicity -- to figure out what's going on -- I went back to naive Bayes.
...
After some experimentation, I finally found out that
gc.collect()
len(gc.get_objects()) # particularly this part!
in the for loop solves the problem and the memory usage stays constant at 6.5 GB over the run time of ~10 hours."
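To put numbers on the n_jobs=1 versus n_jobs=2 comparison suggested above, one option is to log the process's peak memory after each run. A rough sketch using only the standard library; it assumes a Unix-like system (the resource module is not available on Windows), and peak_rss_mb is a made-up helper name. Note that ru_maxrss is reported in kilobytes on Linux but in bytes on macOS:

```python
import resource
import sys


def peak_rss_mb():
    """Return the peak resident set size of this process in megabytes."""
    peak = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    # Linux reports kilobytes, macOS reports bytes.
    divisor = 1024 * 1024 if sys.platform == "darwin" else 1024
    return peak / divisor


before = peak_rss_mb()
data = [0.0] * 5000000   # stand-in for a fit; allocates roughly 40 MB
after = peak_rss_mb()
print("peak RSS before: %.1f MB, after: %.1f MB" % (before, after))
```

RUSAGE_SELF covers all threads of the current process, which matches the threading backend mentioned in this thread. If the work runs in separate worker processes instead, resource.RUSAGE_CHILDREN would be the counter to look at, and it only includes children that have already terminated and been waited for.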
Post by muhammad waseem
@Sebastian: I have tried to run cross_validation using n_jobs=1 and it did not use swap memory; even the RAM usage was quite low (maximum 12%). However, this will take a longer time to finish. Any idea what to try now?
Thanks
Kindest Regards
Waseem
Post by Jacob Schreiber
I don't think that the data is copied for tree-based classifiers. It uses the threading backend, so each thread should be sharing memory.
Post by Sebastian Raschka
I'd suggest trying n_jobs=1 and checking whether swap memory is used (you don't have to run it until completion). If this runs fine without swap, we can work further from there.
Post by muhammad waseem
@Sebastian: I tried with n_jobs=10 (total is equal to 12) and it still created the same problem. I could try running it with n_jobs=1, but it would be so slow that it would take ages to complete. The machine has 32 GB of RAM and it started using swap memory after consuming the full RAM.
Is there a way to tackle this, or do you really think that all this k-fold cross-validation and training should be done using Spark's MLlib?
Thanks
Regards
Waseem
Post by Sebastian Raschka
Thanks for the note, Manoj, didn't know that!
@muhammad So if there's no duplication of data across all processes, I guess that you would also run into trouble with n_jobs=1. But just to make sure that data duplication is not an issue, could you try running it with n_jobs=1? In this case, probably only a smaller data set or a machine with more memory would help. Here, I'd probably think about using Spark's MLlib to deal with this particular dataset.
Post by muhammad waseem
Hi Sebastian and Manoj,
@Manoj: What should the value of the max_nbytes parameter be, and will it affect the results and the time it takes to run cross_validation, grid_search, etc.?
@Sebastian: Will the Spark implementation also improve the memory use, or just the CPU?
Thanks
Kindest Regards
muhammad waseem
2016-02-17 19:53:10 UTC
Permalink
@Sebastian: I will add to the discussion; it looks like it is not very
active :(

@Your code: Is this the full code, or is some part of it missing? I can see
...
after

for p2 in parameterset2:

which means there is something missing there, no?

Thanks
Sebastian Raschka
2016-02-18 18:41:00 UTC
Permalink
Post by muhammad waseem
@Your code: Is this the full code or some part of it is missing? I can see
...
after
Yes, part of it is missing -- I removed it for clarity. It's essentially just a whole bunch of nested for-loops (bad style anyway, but that was just a quick workaround). It's basically just iterating over different parameter sets to do the grid search "manually."
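A manual grid search of the kind described here, with the gc.collect() workaround from the issue quoted earlier in the thread, might look roughly like the following. This is only a sketch of the pattern, not the original code: evaluate, parameterset1, and all parameter values are placeholders (only the name parameterset2 appears in the thread), and evaluate stands in for fitting and scoring a real estimator.

```python
import gc

# Placeholder parameter sets; the real ones were not shown in the thread.
parameterset1 = [10, 50, 100]
parameterset2 = [2, 4, 8]


def evaluate(p1, p2):
    """Stand-in for fitting a model with (p1, p2) and returning its score."""
    return -abs(p1 - 50) - abs(p2 - 4)


best_score, best_params = float("-inf"), None
for p1 in parameterset1:
    for p2 in parameterset2:
        score = evaluate(p1, p2)
        if score > best_score:
            best_score, best_params = score, (p1, p2)
        # Force a garbage-collection pass after every fit, as in the
        # workaround; this only frees objects that are no longer referenced.
        gc.collect()

print(best_params, best_score)
```

Keeping only the best score and parameters, rather than every fitted model, also means each model becomes unreachable as soon as the next iteration starts, which gives the collector something to free.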

Btw, I just saw that scikit-learn 0.17.1 came out today, including an updated version of joblib. Maybe it's worth a try to see if it solves the problem?
Post by muhammad waseem
Post by muhammad waseem
@Your code: Is this the full code or some part of it is missing? I can see
...
after
Yes, there is part of it missing -- I removed it for clarity. It's essentially just a whole bunch of nested for-loops (bad-style anyway, but that was just a quick work-around). It's basically just iterating over different parameters sets to do the grid-search "manually."
Btw I just saw that scikit-learn 0.17.1 came out today including an updated version of joblib. Maybe it's worth a try to see if it may solve the problem?
Post by muhammad waseem
@Sebastian: I will add in the discussion, it looks like it is not very active :(
@Your code: Is this the full code or some part of it is missing? I can see
...
after
for p2 in
which means there is some thing missing there, no?
Thanks
@Waseem Oh, wait, I just see that we already have an open issue for that, please see: https://github.com/scikit-learn/scikit-learn/issues/3973 Would be great if you could add to the discussion there. Meanwhile, I will try to run my code again in the next few days to check if this bug still persists.
Post by Sebastian Raschka
Hm, unfortunately, that's what I thought -- sounds like a bug in joblib? Does anyone have any ideas on how to track this down?
@Waseem Can you also try n_jobs=2? Here, I'd expect one of the following:
1) It uses maybe 2 times the 12% plus a little bit extra, if everything is working correctly with the multi-threading.
2) If you see something like ~30%, I'd say that there's an unnecessary copy being made.
3) If you see something like > 30%, there would be a memory leak somewhere.
I mentioned scenario 3 because I observed very similar behavior (see https://github.com/scikit-learn/scikit-learn/issues/3973 ):
"I made some weird observations that my GridSearches keep failing after a couple of hours, and I initially couldn't figure out why. I then monitored the memory usage over time and saw that it started with a few gigabytes (~6 GB) and kept increasing until it crashed the node when it reached the max. 128 GB the hardware can take. I was experimenting with random forests for classification of a large number of text documents. For simplicity -- to figure out what's going on -- I went back to naive Bayes.
...
After some experimentation, I finally found out that
    gc.collect()
    len(gc.get_objects()) # particularly this part!
in the for loop solves the problem, and the memory usage stays constant at 6.5 GB over the run time of ~10 hours."
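For readers who want to try the workaround quoted above, here is a minimal sketch of what such a "manual" grid-search loop might look like. It is not the original code (that was elided with "..."): the evaluate function, the parameter name p1, and the parameter values are hypothetical stand-ins, and itertools.product replaces the nested for-loops. Only the two gc calls come from the quoted report.

```python
import gc
from itertools import product


def evaluate(p1, p2):
    """Hypothetical stand-in for fitting and scoring one parameter combination."""
    scratch = [p1 * p2] * 1000  # temporary data that should be freed each round
    return sum(scratch) / len(scratch)


def manual_grid_search(grid1, grid2):
    """Score every (p1, p2) combination, collecting garbage after each one."""
    results = {}
    for p1, p2 in product(grid1, grid2):
        results[(p1, p2)] = evaluate(p1, p2)
        # Workaround from the quoted report: force a collection pass per iteration.
        gc.collect()
        # The report says this call mattered in particular; why is not explained.
        len(gc.get_objects())
    return results


scores = manual_grid_search([1, 2], [10, 20])
best = max(scores, key=scores.get)  # parameter pair with the highest score
```

Whether this helps in a given setup likely depends on where the memory is actually being held, so it is worth watching memory usage with and without the two gc lines.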
@Sebastian: I have tried running cross_validation with n_jobs=1, and it did not use swap memory; even the RAM usage was quite low (maximum 12%). However, this will take longer to finish. Any idea what to try now?
Thanks
Kindest Regards
Waseem
Post by Jacob Schreiber
I don't think that the data is copied for tree-based classifiers. It uses the threading backend, so each thread should be sharing memory.
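The general point about threads sharing memory can be illustrated with a small sketch. This uses the standard library's ThreadPoolExecutor rather than joblib's threading backend, so it only shows the principle, not what scikit-learn does internally; the data list and worker function are made up for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

data = list(range(1_000_000))  # stand-in for a large training set


def worker(_):
    # Each thread reads the same module-level `data` object; no copy is made.
    return id(data), sum(data[:10])


with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(worker, range(4)))

# If all threads report the same object id, they are sharing one object.
object_ids = {obj_id for obj_id, _ in results}
shared = len(object_ids) == 1
```

With a process-based backend the picture can differ, because each worker process has its own address space and may receive its own copy of the input.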
Post by Sebastian Raschka
I'd suggest trying n_jobs=1 and checking whether swap memory is used (you don't have to run it until completion). If this runs fine without swap, we can work further from there.
@Sebastian: I tried n_jobs=10 (the total is 12) and it still caused the same problem. I could try running it with n_jobs=1, but it would be so slow that it would take ages to complete. The machine has 32 GB of RAM, and it started using swap memory after consuming all of the RAM.
Is there a way to tackle this, or do you really think that all of this k-fold cross-validation and training should be done using Spark's MLlib?
Thanks
Regards
Waseem
Post by Sebastian Raschka
Thanks for the note, Manoj, didn't know that!
@muhammad So if there's no duplication of data across all processes, I guess that you would also run into trouble with n_jobs=1. But just to make sure that data duplication is not the issue, could you try running it with n_jobs=1? In that case, probably only a smaller data set or a machine with more memory would help. Here, I'd probably think about using Spark's MLlib to deal with this particular dataset.
Post by muhammad waseem
Hi Sebastian and Manoj,
@Manoj: What should the value of the max_nbytes parameter be, and will it affect the results and the time it takes to run cross_validation, grid_search, etc.?
@Sebastian: Will the Spark implementation also improve memory use, or just CPU use?
Thanks
Kindest Regards
Waseem
muhammad waseem
2016-02-18 18:45:44 UTC
Permalink
@Sebastian:
Thanks for your reply.
Yes, I just saw the email saying that the new version is out. I will give it a try
tomorrow.

Thanks
Regards
Waseem
Manoj Kumar
2016-02-12 19:47:45 UTC
Permalink
Hi,

That would depend on the size of the original dataset.

But I think you should try Sebastian's suggestion first to check whether the
real issue is data duplication.
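To make "depends on the size of the original dataset" a bit more concrete, here is a rough sketch for estimating how much memory a dense array takes, so it can be compared against a candidate threshold. This assumes max_nbytes refers to joblib's Parallel parameter, which sets the array size above which inputs passed to worker processes are memory-mapped. The dataset shape, the float64 assumption, and the 1 MiB threshold below are hypothetical and chosen only for illustration.

```python
def estimate_dense_nbytes(n_rows, n_cols, itemsize=8):
    """Bytes needed for a dense array; itemsize=8 assumes float64 values."""
    return n_rows * n_cols * itemsize


def exceeds_threshold(n_rows, n_cols, threshold_bytes):
    """True if the estimated array size is larger than the given threshold."""
    return estimate_dense_nbytes(n_rows, n_cols) > threshold_bytes


# Hypothetical dataset shape, for illustration only.
n_rows, n_cols = 500_000, 40
size = estimate_dense_nbytes(n_rows, n_cols)  # 500000 * 40 * 8 bytes

candidate_threshold = 1024 ** 2  # 1 MiB, an illustrative value
larger_than_threshold = exceeds_threshold(n_rows, n_cols, candidate_threshold)
```

If the data is already a NumPy array, its nbytes attribute gives the same figure directly.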
Post by muhammad waseem
Hi Sebastian and Manoj,
@Manoj: What should the value of the max_nbytes parameter be, and will it
affect the results and the time it takes to run cross_validation, grid_search,
etc.?
Thanks
Kindest Regards
Waseem
--
Manoj,
http://github.com/MechCoder