Discussion:
[Scikit-learn-general] Hyperparameter tuning for Random Forest and Gradient boosting trees
muhammad waseem
2016-01-29 16:45:56 UTC
Permalink
Hello All,
I am new to scikit-learn and ML, and I am trying to train my model using random
forest and gradient boosting tree regressors. I was wondering what the
best way to do hyperparameter tuning is: should I use GridSearchCV or
RandomizedSearchCV? I have read that the performance of RandomizedSearchCV
is almost the same as that of GridSearchCV (most of the time). If I go with
RandomizedSearchCV, what should the range of values be for the different
parameters? How will I know that the range I am selecting is the correct
one?

Also, what about the number of estimators? In GridSearchCV or
RandomizedSearchCV, should I start with a low value and then, after selecting
the other parameters, choose a large number of estimators for the final
fit? Am I right?

Should I always use early stopping, regardless of whether I use grid search or
randomized search?

P.S. Training data: number of inputs = 6
Number of outputs = 1
Number of samples (rows) = 8526
Testing data: number of samples (rows) = 1416

Thanks
Kindest Regards
Waseem
Sebastian Raschka
2016-01-29 18:57:27 UTC
Permalink
Hi, Waseem,
With a fine enough grid, GridSearchCV would be more "thorough" than the randomized search. However, the problem is essentially some sort of combinatorial explosion. Typically, I start with a "rougher" grid (the parameter values are more "spaced out" relative to each other). After that, I use a "finer" grid around the parameter values that came up in the previous search.
However, it all comes down to computational time vs. being thorough. In other words, grid search is an exhaustive search, whereas randomized search is a computationally "more efficient" approach.
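A minimal sketch of the coarse-then-fine grid search described above, assuming a recent scikit-learn in which GridSearchCV is imported from sklearn.model_selection (releases from early 2016 exposed it under sklearn.grid_search). The synthetic data from make_regression, the run_grid helper, the 50-tree forest, and the offsets used to build the finer grid are illustrative assumptions, not values recommended in this thread.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (6 inputs, 1 output); fewer rows to keep it quick.
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)


def run_grid(param_grid):
    """Hypothetical helper: run an exhaustive grid search and return the fitted search."""
    forest = RandomForestRegressor(n_estimators=50, random_state=0)
    search = GridSearchCV(forest, param_grid, cv=3, scoring="neg_mean_squared_error")
    return search.fit(X, y)


# Pass 1: "rough" grid, values spaced far apart.
coarse = run_grid({"max_depth": [1, 4, 10, 15], "min_samples_leaf": [1, 10, 20, 30]})
best_depth = coarse.best_params_["max_depth"]
best_leaf = coarse.best_params_["min_samples_leaf"]

# Pass 2: "finer" grid centred on the coarse winner (clipped so values stay >= 1).
fine_grid = {
    "max_depth": sorted({max(1, best_depth + d) for d in (-2, -1, 0, 1, 2)}),
    "min_samples_leaf": sorted({max(1, best_leaf + d) for d in (-4, -2, 0, 2, 4)}),
}
fine = run_grid(fine_grid)

print("coarse best:", coarse.best_params_, coarse.best_score_)
print("fine best:  ", fine.best_params_, fine.best_score_)
```

Because the finer grid contains the coarse winner and the folds and random seed are fixed, the second pass can only match or improve the cross-validated score; whether the improvement is worth the extra fits depends on the data.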
------------------------------------------------------------------------------
Site24x7 APM Insight: Get Deep Visibility into Application Performance
APM + Mobile APM + RUM: Monitor 3 App instances at just $35/Month
Monitor end-to-end web transactions and take corrective actions now
Troubleshoot faster and improve end-user experience. Signup Now!
http://pubads.g.doubleclick.net/gampad/clk?id=267308311&iu=/4140_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
muhammad waseem
2016-01-29 21:18:32 UTC
Permalink
Hi Sebastian,
Thanks for your reply. So this means I should start with e.g. "max_depth":
[1,4,10,15], "min_samples_leaf": [1,10,20,30], and if the best values are
max_depth=10 and min_samples_leaf=10, then I should explore values close to
these. Am I right?

Should I use a small number of estimators while conducting this
parametric study, and then use a higher value when fitting my
model? Will this change the other parameters, meaning does n_estimators
depend on the other parameters?

Also, should I use early stopping while doing GridSearchCV?

Thanks again.
Regards
Waseem
Sebastian Raschka
2016-01-29 21:33:14 UTC
Permalink
Post by muhammad waseem
Thanks for your reply. So this means I should start with e.g. "max_depth": [1,4,10,15], "min_samples_leaf": [1,10,20,30], and if the best values are max_depth=10 and min_samples_leaf=10, then I should explore values close to these. Am I right?
Yes, this would work. However, keep in mind that you may be missing a "good" combination this way. And if you have a large n_estimators, tuning a random forest can be "relatively" expensive. Plus, you typically don't want or need to prune the trees here; that's basically the whole idea behind RF.
Post by muhammad waseem
Should I use a small number of estimators while conducting this parametric study, and then use a higher value when fitting my model?
Also here, the parameters that you tuned may only be good for the model based on that specific number of estimators. In general, I would maybe advise against tuning the hyperparameters at all and instead use the computational time to increase n_estimators.
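One way to look at the effect of n_estimators on its own is to grow a single forest step by step and watch the out-of-bag score. The sketch below assumes RandomForestRegressor with warm_start=True and oob_score=True; the synthetic data and the list of forest sizes are illustrative assumptions, not values from this thread.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data (6 inputs, 1 output).
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)

# warm_start=True keeps the trees already grown and only adds the missing ones
# on each call to fit; oob_score=True scores each sample using only the trees
# that did not see it during bootstrapping.
forest = RandomForestRegressor(warm_start=True, oob_score=True, random_state=0)

oob_r2 = {}
for n_trees in (25, 50, 100, 200, 400):
    forest.set_params(n_estimators=n_trees)
    forest.fit(X, y)
    oob_r2[n_trees] = forest.oob_score_
    print(n_trees, "trees -> OOB R^2 =", round(forest.oob_score_, 4))
```

If the out-of-bag score has flattened out, adding more trees mostly costs time; if it is still climbing, spending the budget on more trees may pay off more than a parameter search.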
muhammad waseem
2016-01-29 21:38:57 UTC
Permalink
Post by Sebastian Raschka
Yes, this would work. However, keep in mind that you may be missing a
"good" combination this way. And if you have a large n_estimators,
tuning a random forest can be "relatively" expensive. Plus, you
typically don't want or need to prune the trees here; that's
basically the whole idea behind RF.
So I make sure that I don't miss the "Good" combination?
Post by Sebastian Raschka
Also here, the parameters that you tuned may only be good for the model
based on that specific number of estimators. In general, I would maybe
advise against tuning the hyperparameters at all and instead use the
computational time to increase n_estimators.
Maybe considering computational time and then making sure that I have
enough number of estimators in the parametric study?
--
Dr Muhammad Waseem Ahmad
Research Associate,
BRE Center for Sustainable Construction,

School of Engineering,

Cardiff University,

Cardiff, UK.
muhammad waseem
2016-01-29 21:45:09 UTC
Permalink
I meant, how do I make sure that I don't miss the "good" combination that you
mentioned?

Also, for the second point: maybe by considering computational time and then
making sure that I have enough estimators in the parametric
study?
Sebastian Raschka
2016-01-29 21:51:51 UTC
Permalink
Post by muhammad waseem
I meant, how do I make sure that I don't miss the "good" combination that you mentioned?
Here, we are back to an exhaustive search on an infinitely fine grid :). It's really about finding the "sweet spot" that is "practical" given your problem and available resources.
Post by muhammad waseem
Also, for the second point: maybe by considering computational time and then making sure that I have enough estimators in the parametric study?
What do you mean by parametric study, exactly? Do you mean that you are doing the hyperparameter search for an empirical comparison study, or do you just want to get a good model?
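When the limit is "available resources", a randomized search makes the budget explicit: n_iter fixes the number of parameter settings tried, no matter how wide the ranges are. The sketch below assumes a recent scikit-learn (RandomizedSearchCV in sklearn.model_selection) and SciPy; the synthetic data, the ranges, and n_iter=20 are illustrative assumptions, not recommendations from this thread.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real data (6 inputs, 1 output).
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)

# randint(low, high) samples integers from low up to high - 1.
param_distributions = {
    "max_depth": randint(1, 16),         # 1..15
    "min_samples_leaf": randint(1, 31),  # 1..30
}

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=50, random_state=0),
    param_distributions,
    n_iter=20,  # the budget: exactly 20 sampled settings are evaluated
    cv=3,
    scoring="neg_mean_squared_error",
    random_state=0,
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV score (negative MSE):", search.best_score_)
```

An exhaustive grid over the same two ranges would need 15 * 30 = 450 settings, so the 20 sampled here trade thoroughness for time, which is the "sweet spot" trade-off discussed above.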
muhammad waseem
2016-01-29 21:58:20 UTC
Permalink
I meant, how I make sure that I don't miss the "Good" combination that you mentioned?
Here, we are back to an exhaustive search on an infinitely small grid :).
It's really about finding the "sweet spot" that is "practical" given your
problem and available resources.
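One way to make that "sweet spot" concrete is to fix the number of candidates up front with a randomized search. This is only a minimal sketch: it assumes a recent scikit-learn where the search classes live in sklearn.model_selection, and the synthetic data, the parameter ranges and n_iter=10 are illustrative assumptions, not recommendations for the real data set.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

# Synthetic stand-in for the real data (6 inputs, 1 output); an assumption.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)

search = RandomizedSearchCV(
    RandomForestRegressor(n_estimators=30, random_state=0),
    param_distributions={
        "max_depth": randint(1, 16),         # integers 1..15
        "min_samples_leaf": randint(1, 31),  # integers 1..30
    },
    n_iter=10,       # the budget: only 10 sampled candidates are evaluated
    cv=3,
    random_state=0,  # makes the sampled candidates reproducible
)
search.fit(X, y)

print("best params:", search.best_params_)
print("best CV score (R^2):", search.best_score_)
```

The cost is n_iter times the number of CV folds, regardless of how wide the ranges are, so the budget can be set from the available compute time rather than from the size of the grid.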
Also, on the second point: maybe I should consider the computational time
and then make sure that I have a large enough number of estimators in the
parametric study?
What do you mean by parametric study, exactly? Do you mean that you are
doing the hyperparam search for an empirical comparison study or do you
just want to get a good model?
Well, both could be addressed, no? But the first focus is to get the model
right, with the right parameters selected.
Post by muhammad waseem
So this means I should start with e.g. "max_depth": [1,4,10,15],
"min_samples_leaf": [1,10,20,30], and if the best result is max_depth=10 and
min_samples_leaf=10, then I should explore values close to these. Am
I right?
Yes, this would work. However, keep in mind that you may be missing a
"good" combination this way. And if you have a large n_estimators,
tuning a random forest can be "relatively" expensive. Plus,
you typically don't want or need to prune the trees here; that's
basically the whole idea behind RF.
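The coarse-then-fine procedure discussed here could be sketched roughly as follows. The coarse grid uses the values quoted in this thread; everything else (the synthetic data, the helper name run_grid, n_estimators=30, 3-fold CV and the offsets used to build the finer grid) is an assumption for illustration, and the import path assumes a recent scikit-learn with sklearn.model_selection.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for the real data (6 inputs, 1 output); an assumption.
X, y = make_regression(n_samples=300, n_features=6, noise=10.0, random_state=0)


def run_grid(param_grid):
    """Hypothetical helper: fit a 3-fold grid search and return it."""
    search = GridSearchCV(
        RandomForestRegressor(n_estimators=30, random_state=0),
        param_grid=param_grid,
        cv=3,
        scoring="neg_mean_squared_error",
    )
    return search.fit(X, y)


# Pass 1: coarse grid with widely spaced values (the ones from this thread).
coarse = run_grid({
    "max_depth": [1, 4, 10, 15],
    "min_samples_leaf": [1, 10, 20, 30],
})
best_depth = coarse.best_params_["max_depth"]
best_leaf = coarse.best_params_["min_samples_leaf"]

# Pass 2: finer grid centred on the coarse winner (values clipped to >= 1).
fine = run_grid({
    "max_depth": sorted({max(1, best_depth + d) for d in (-2, -1, 0, 1, 2)}),
    "min_samples_leaf": sorted({max(1, best_leaf + d) for d in (-4, -2, 0, 2, 4)}),
})

print("coarse best:", coarse.best_params_, coarse.best_score_)
print("fine best:  ", fine.best_params_, fine.best_score_)
```

Because the second pass only looks near the first winner, a good combination far from it can still be missed, which is the caveat raised above.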
So I make sure that I don't miss the "Good" combination?
Post by muhammad waseem
Shall I use a small number of estimators while conducting this
parametric study? After that, can I use a higher value while fitting my model?
Also here, the parameters that you tuned may only be good for the model
with that specific number of estimators. In general, I would maybe
advise against tuning the hyperparameters at all and instead use the
computational time to increase n_estimators.
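A rough way to see how much extra trees help, without any grid search, is to grow one forest step by step and watch its out-of-bag score. This is a sketch under assumptions: the synthetic data and the tree counts are made up for illustration, and it relies on the warm_start and oob_score options of RandomForestRegressor.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in for the real data (6 inputs, 1 output); an assumption.
X, y = make_regression(n_samples=500, n_features=6, noise=10.0, random_state=0)

forest = RandomForestRegressor(
    warm_start=True,  # keep already-fitted trees when n_estimators grows
    oob_score=True,   # score each sample with trees that did not train on it
    bootstrap=True,   # required for out-of-bag estimates
    random_state=0,
)

oob_r2 = {}
for n in (25, 50, 100, 200):  # illustrative tree counts
    forest.set_params(n_estimators=n)
    forest.fit(X, y)          # only the additional trees are fitted
    oob_r2[n] = forest.oob_score_
    print(n, "trees -> OOB R^2 =", round(forest.oob_score_, 4))
```

Once the out-of-bag score stops improving noticeably, adding more trees mostly costs time; that point gives a reasonable n_estimators for the final fit.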
Maybe I should consider the computational time and then make sure that I
have a large enough number of estimators in the parametric study?
Post by muhammad waseem
Hi Sebastian,
Thanks for your reply. So this means I should start with e.g.
"max_depth": [1,4,10,15], "min_samples_leaf": [1,10,20,30], and if the
best result is max_depth=10 and min_samples_leaf=10, then I should explore
values close to these. Am I right?
Shall I use a small number of estimators while conducting this
parametric study? After that, can I use a higher value while fitting my
model? Will this change the other parameters, meaning, does n_estimators
depend on the other parameters?
Also, should I use early stopping while doing GridSearchCV?
Thanks again.
Regards
Waseem