Tim Hunter
2016-02-10 18:14:15 UTC
Hello community,
I would like to introduce a new package that should be of interest to
scikit-learn users who work with the Spark framework, or with a
distributed system.
It provides the following, among other tools:
- train and evaluate multiple scikit-learn models in parallel
- convert Spark's DataFrames seamlessly into NumPy arrays
- (experimental) distribute SciPy's sparse matrices as a dataset of
sparse vectors
Spark-sklearn focuses on problems that have a small amount of data but
that can be trained in parallel. Note that this package distributes
simple tasks such as grid-search cross-validation; it does not
distribute individual learning algorithms (unlike Spark MLlib).
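As a quick illustration of the kind of task it distributes, here is a
standard scikit-learn grid search over a small dataset. Per the blog
post, spark-sklearn is meant as a drop-in replacement for this pattern;
the spark-sklearn variant is sketched in comments only, since it needs a
live SparkContext (and note this sketch uses the modern
sklearn.model_selection import path, not the sklearn.grid_search path
from 2016):

```python
from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# With spark-sklearn, the intended change is roughly:
#   from spark_sklearn import GridSearchCV
#   gs = GridSearchCV(sc, RandomForestClassifier(), param_grid)
# where sc is a live SparkContext; the fit/predict calls stay the same.

digits = load_digits()
param_grid = {"max_depth": [3, None], "n_estimators": [10, 20]}

# Plain scikit-learn: every parameter combination is evaluated locally.
gs = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
gs.fit(digits.data, digits.target)
print(gs.best_params_)
```

The point is that each (parameters, CV fold) pair is an independent
small task, which is why a cluster can fan them out without changing
the learning algorithm itself.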
If you want to use it, see instructions on the package page:
https://github.com/databricks/spark-sklearn
This blog post contains more details:
https://databricks.com/blog/2016/02/08/auto-scaling-scikit-learn-with-spark.html
Let us know if you have any questions. Also, documentation or code
contributions are most welcome (Apache 2.0 license).
Cheers
Tim and Joseph