Gael Varoquaux
2011-02-27 23:24:45 UTC
Hi,
I was looking at huge parallel for loops run with joblib.Parallel (to be
precise, in scikits.learn's GridSearchCV) and I realized that because joblib
was dispatching immediately to sub-processes, it could create huge
temporaries. Thus I refactored the Parallel engine to enable late
dispatching of the jobs:
>>> from math import sqrt
>>> from joblib import Parallel, delayed
>>> def producer():
...     for i in range(6):
...         print('Produced %s' % i)
...         yield i
>>> out = Parallel(n_jobs=2, verbose=1, pre_dispatch='1.5*n_jobs')(
...     delayed(sqrt)(i) for i in producer())
Produced 0
Produced 1
Produced 2
[Parallel(n_jobs=2)]: Done 1 out of 3+ |elapsed: 0.0s remaining: 0.0s
Produced 3
[Parallel(n_jobs=2)]: Done 2 out of 4+ |elapsed: 0.0s remaining: 0.0s
Produced 4
[Parallel(n_jobs=2)]: Done 3 out of 5+ |elapsed: 0.0s remaining: 0.0s
...
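In the run above, n_jobs=2 and pre_dispatch='1.5*n_jobs', so 3 tasks are pulled from the generator up front; that matches the three "Produced" lines printed before the first job completes. After that, one new item is produced each time a job finishes. For readers who want to see the idea in isolation, here is a minimal sketch using only the standard library. It is not joblib's implementation: `lazy_map`, its parameters, and the use of threads instead of sub-processes are all assumptions made for illustration.

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from math import sqrt


def lazy_map(func, iterable, n_workers=2, pre_dispatch=3):
    """Apply func to each item, keeping at most pre_dispatch tasks in flight.

    Hypothetical helper, not part of joblib. It only illustrates consuming
    the input lazily instead of submitting everything at once.
    """
    iterator = enumerate(iterable)
    results = {}   # index -> result
    pending = {}   # future -> index of the item it is computing
    exhausted = False
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        while True:
            # Top up the in-flight tasks, advancing the generator only now.
            while not exhausted and len(pending) < pre_dispatch:
                try:
                    index, item = next(iterator)
                except StopIteration:
                    exhausted = True
                    break
                pending[pool.submit(func, item)] = index
            if not pending:
                break
            # Block until at least one task finishes, then collect it.
            done, _ = wait(list(pending), return_when=FIRST_COMPLETED)
            for future in done:
                results[pending.pop(future)] = future.result()
    # Return results in input order.
    return [results[i] for i in range(len(results))]


consumed = []  # records which items the generator has produced


def producer():
    for i in range(6):
        consumed.append(i)
        yield i


out = lazy_map(sqrt, producer(), n_workers=2, pre_dispatch=3)
```

Because at most `pre_dispatch` inputs exist at any time, the temporaries created by the producer stay bounded no matter how long the input is.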
I am planning to release joblib 0.5.0 with this feature in a few days.
The release will also contain small improvements that make joblib's
caching engine more robust when used with many processes.
The soon-to-be-released code can be found in the 0.5.X branch.
I am planning to use this in the near future to improve parallelism in
scikits.learn's GridSearchCV.
Any feedback is more than welcome.
Gael