Discussion:
[Scikit-learn-general] Speed up Random Forest/ Extra Trees tuning
Lam Dang
2016-03-21 20:24:29 UTC
Permalink
Hello scikit-learners,

Here is an idea to accelerate parameter tuning for Random Forest and Extra
Trees. I am very interested to know whether the idea is already exploited
somewhere and whether it makes sense.

Let's say we have a data set split into a train set and a validation set
(cross-validation also works).

The usual process for tuning a Random Forest today is to try different sets
of parameters, check validation performance, iterate, and in the end keep the
model with the best validation score.

The idea to improve this process is:
- Fit the model once, growing all the trees to their maximum size, and save
this model as a baseline.
- For any new set of parameters, produce the corresponding model by reducing
the baseline trees according to those parameters. For example, for
max_depth=5, one can simply remove all the nodes with depth greater than 5
(a rough sketch of this follows below). This should be much faster than
regrowing the trees since it doesn't need to refit the model.
- Use validation (or cross-validation) performance to choose the best model
as usual.

This works (theoretically) because:
- For any such parameters, the fitted trees are simply a subset of the
baseline trees grown to maximum size (except for criterion, but that probably
matters less).
- Trees are grown independently of each other (so this idea will not work for
GBM).
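
To make this concrete for max_depth, here is a rough, untested sketch of
mine (the helper names are made up): instead of actually deleting nodes, it
emulates the cut at prediction time by walking each fitted tree's arrays.

import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import ExtraTreesRegressor

def predict_tree_with_max_depth(tree, X, max_depth):
    # Emulate max_depth at prediction time: stop descending once the depth
    # limit is reached and return the value stored at that node.
    t = tree.tree_
    out = np.empty(X.shape[0])
    for i, x in enumerate(X):
        node, depth = 0, 0
        while t.children_left[node] != -1 and depth < max_depth:
            if x[t.feature[node]] <= t.threshold[node]:
                node = t.children_left[node]
            else:
                node = t.children_right[node]
            depth += 1
        out[i] = t.value[node][0][0]
    return out

def predict_forest_with_max_depth(forest, X, max_depth):
    # Average the depth-limited predictions of every tree in the forest.
    return np.mean([predict_tree_with_max_depth(est, X, max_depth)
                    for est in forest.estimators_], axis=0)

X, y = make_regression(n_samples=200, n_features=10, random_state=0)
baseline = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
for d in (3, 5, 10):  # try several max_depth values without refitting
    y_hat = predict_forest_with_max_depth(baseline, X, d)

The same traversal should be able to check n_node_samples or
weighted_n_node_samples to emulate min_samples_split / min_samples_leaf cuts.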

That's it. I am very interested in any feedback: whether it makes sense,
whether it has already been done somewhere else, and whether it would work.

Best regards,
Lam Dang
Jacob Schreiber
2016-03-21 20:42:46 UTC
Permalink
Hi Lam

The idea of exploiting redundancies to speed up algorithms is a good
intuition. However, I don't think most parameters could be handled this way.
For example, evaluating different values of max_features would be hard to do
without storing every possible split at each node and then reducing the set
of considered ones. And since every split depends on the split before it, it
may be difficult to modify splits in the middle of a tree without simply
regrowing it (for example, after changing the feature the tree was split on).

Jacob
Lam Dang
2016-03-21 21:19:25 UTC
Permalink
Hi Jacob,

Thanks for your answer. Indeed you are right: some parameters cannot be
adjusted off-data. Let's go through the parameter list to see which ones
can be adjusted:
n_estimators : this is simple - the more the better
criterion : No
max_features : No
max_depth : Yes
min_samples_split : Yes
min_samples_leaf : Yes
min_weight_fraction_leaf : Yes
max_leaf_nodes : Yes
bootstrap : No

So basically the parameters controlling the randomization (max_features,
bootstrap) and the split criterion cannot be adjusted, while the
tree-structure parameters can. It should still speed up the search, right?
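For n_estimators at least, this reuse is already easy to do by hand. A rough
sketch of mine (assuming binary labels 0/1, so the argmax index is directly
the class label):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, random_state=0)
X_tr, y_tr, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

# Fit once with the largest n_estimators we want to consider.
baseline = RandomForestClassifier(n_estimators=200,
                                  random_state=0).fit(X_tr, y_tr)

# Score every smaller forest by averaging only the first k trees.
proba = np.array([t.predict_proba(X_val) for t in baseline.estimators_])
for k in (25, 50, 100, 200):
    y_pred = proba[:k].mean(axis=0).argmax(axis=1)
    print(k, accuracy_score(y_val, y_pred))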
Best,
Lam
Jacob Schreiber
2016-03-21 21:32:22 UTC
Permalink
It should if you're using those parameters. It's basically similar to
computing the regularization path for the Lasso, since these are also
regularization terms. I think this would probably be a good addition if
there were a clean implementation for it.
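(For comparison, lasso_path in scikit-learn computes the coefficients for a
whole grid of alphas from a single call instead of refitting Lasso once per
alpha, e.g.:)

from sklearn.datasets import make_regression
from sklearn.linear_model import lasso_path

X, y = make_regression(n_samples=100, n_features=20, noise=5.0,
                       random_state=0)
# One pass over the data yields coefficients along the whole alpha grid.
alphas, coefs, _ = lasso_path(X, y, n_alphas=20)
print(alphas.shape, coefs.shape)  # (n_alphas,) and (n_features, n_alphas)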
Mathieu Blondel
2016-03-22 00:44:35 UTC
Permalink
Related issue:
https://github.com/scikit-learn/scikit-learn/issues/3652
Gilles Louppe
2016-03-22 07:27:11 UTC
Permalink
Unfortunately, the most important parameters to adjust to maximize
accuracy are often those controlling the randomness in the algorithm,
i.e. max_features, for which this strategy is not possible.

That being said, in the case of boosting, I think this strategy would
be worth automating, e.g. to adjust the number of trees.
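(For the number of boosting iterations this can already be done by hand with
staged_predict; a minimal sketch:)

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, random_state=0)
X_tr, y_tr, X_val, y_val = X[:400], y[:400], X[400:], y[400:]

gbm = GradientBoostingClassifier(n_estimators=300,
                                 random_state=0).fit(X_tr, y_tr)

# staged_predict yields predictions after each boosting stage, so a single
# fit scores every n_estimators value from 1 to 300.
scores = [accuracy_score(y_val, y_pred)
          for y_pred in gbm.staged_predict(X_val)]
best_n_estimators = 1 + int(np.argmax(scores))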

Gilles
Lam Dang
2016-03-22 11:41:10 UTC
Permalink
Interesting,

Yes, max_features is probably the most important parameter. However, the
other parameters may also contribute a lot to reducing overfitting.
I would run some tests myself, but I am not experienced with the low-level
API of scikit-learn.

Would any experienced scikit-learn contributors like to collaborate?
Andreas Mueller
2016-03-25 16:00:07 UTC
Permalink
Post by Gilles Louppe
Unfortunately, the most important parameters to adjust to maximize
accuracy are often those controlling the randomness in the algorithm,
i.e. max_features, for which this strategy is not possible.
That being said, in the case of boosting, I think this strategy would
be worth automating, e.g. to adjust the number of trees.
https://github.com/scikit-learn/scikit-learn/pull/5689
