Discussion:
nested cross validation to get unbiased results
Amita Misra
2016-05-12 16:05:53 UTC
Hi,

I have a limited dataset, so I want to use the same data both to learn the
parameters and to evaluate the final model.
From the documentation, it looks like nested cross-validation is the way to
do this. I have the code, but I still want to be sure that I am not
overfitting in any way.

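# scikit-learn 0.17 module layout
import numpy
from sklearn import cross_validation, preprocessing, svm
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.grid_search import GridSearchCV
from sklearn.pipeline import Pipeline

pipeline = Pipeline([('scale', preprocessing.StandardScaler()),
                     ('filter', SelectKBest(f_regression)),
                     ('svr', svm.SVR())])
C_range = [0.1, 1, 10, 100]
gamma_range = numpy.logspace(-2, 2, 5)
param_grid = [{'svr__kernel': ['rbf'], 'svr__gamma': gamma_range,
               'svr__C': C_range}]
# inner loop: 5-fold grid search over the hyperparameters
grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)
# outer loop: 10-fold CV around the whole grid search
Y_pred = cross_validation.cross_val_predict(grid_search, X_train,
                                            Y_train, cv=10)

correlation = numpy.ma.corrcoef(Y_train, Y_pred)[0, 1]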


Please let me know if my understanding is correct.

This is 10*5 nested cross-validation. The inner folds run a grid search over
the hyperparameters on the training portion of each outer split, and the
outer folds evaluate the performance.
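
For readers on a current scikit-learn release, the same 10*5 setup can be sketched as follows. This is only a minimal illustration under stated assumptions: it uses `sklearn.model_selection` (the `cross_validation` module used above was removed in later releases), and the data come from `make_regression` as a synthetic stand-in for `X_train` / `Y_train`, which are not shown in the thread.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for X_train / Y_train (assumption, not the thread's data).
X_train, Y_train = make_regression(n_samples=100, n_features=20,
                                   n_informative=5, noise=10.0,
                                   random_state=0)

pipeline = Pipeline([('scale', StandardScaler()),
                     ('filter', SelectKBest(f_regression)),
                     ('svr', SVR())])
param_grid = [{'svr__kernel': ['rbf'],
               'svr__gamma': np.logspace(-2, 2, 5),
               'svr__C': [0.1, 1, 10, 100]}]

# Inner loop: 5-fold grid search, rerun from scratch on each outer training split.
grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5)

# Outer loop: 10 folds. Each sample is predicted by a model that saw it neither
# while fitting nor while selecting hyperparameters.
Y_pred = cross_val_predict(grid_search, X_train, Y_train, cv=10)

correlation = np.corrcoef(Y_train, Y_pred)[0, 1]
print('nested-CV correlation: %.3f' % correlation)
```

Because scaling and feature selection sit inside the pipeline, they are also refit on each training split, so the held-out folds do not leak into preprocessing.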


Thanks,
Amita
--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz
Алексей Драль
2016-05-12 20:16:16 UTC
Hi Amita,

As far as I understand your question, you only need one CV loop to optimize
your objective with the scoring function provided:

===
pipeline = Pipeline([('scale', preprocessing.StandardScaler()),
                     ('filter', SelectKBest(f_regression)),
                     ('svr', svm.SVR())])
C_range = [0.1, 1, 10, 100]
gamma_range = numpy.logspace(-2, 2, 5)
param_grid = [{'svr__kernel': ['rbf'], 'svr__gamma': gamma_range,
               'svr__C': C_range}]
# scoring_function: the scorer you want the grid search to optimize
grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5,
                           scoring=scoring_function)
grid_search.fit(X_train, Y_train)
===

You should be able to find more details in the documentation:

- http://scikit-learn.org/stable/modules/grid_search.html#grid-search
- http://scikit-learn.org/stable/modules/grid_search.html#gridsearch-scoring
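
`scoring_function` in the snippet above is a placeholder. One possible way to build it, sketched here under assumptions, is `sklearn.metrics.make_scorer`. Choosing Pearson correlation as the objective is an assumption (it mirrors the metric used for evaluation in the question); `pearson_r` is a hypothetical helper, the data are synthetic, and the feature-selection step is omitted for brevity.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.metrics import make_scorer
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR


def pearson_r(y_true, y_pred):
    """Pearson correlation between targets and predictions (hypothetical helper)."""
    if np.ptp(y_pred) == 0:
        # Constant predictions: the correlation is undefined, so score it as 0.
        return 0.0
    return np.corrcoef(y_true, y_pred)[0, 1]


# Higher correlation is better, which is make_scorer's default convention.
scoring_function = make_scorer(pearson_r)

# Synthetic stand-in data (assumption).
X_train, Y_train = make_regression(n_samples=80, n_features=10, noise=5.0,
                                   random_state=0)
pipeline = Pipeline([('scale', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [0.1, 1, 10, 100],
              'svr__gamma': np.logspace(-2, 2, 5)}

grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5,
                           scoring=scoring_function)
grid_search.fit(X_train, Y_train)
print(grid_search.best_params_, grid_search.best_score_)
```

Note that `best_score_` from a single loop like this is the score that was used to pick the hyperparameters, so it is not an independent performance estimate.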
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Yours sincerely,
Alexey A. Dral
Sebastian Raschka
2016-05-12 20:24:28 UTC
I would say there are two different applications of nested CV. You could use it for algorithm selection (with hyperparameter tuning in the inner loop). Or you could use it to estimate generalization performance (hyperparameter tuning only), which has been reported to be less biased than the k-fold CV estimate (Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7, 91. http://doi.org/10.1186/1471-2105-7-91).

By "you could use it to estimate generalization performance (hyperparameter tuning only)" I mean as a replacement for k-fold on the training set followed by evaluation on an independent test set.
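
To make the second application concrete, here is a small sketch that computes both estimates side by side: the non-nested one (the grid search's own `best_score_`) and the nested one (the grid search wrapped in an outer CV loop). The data are synthetic, the fold objects and grid are illustrative assumptions, and the scores are R^2 because that is the default scorer for regressors; no particular gap between the two numbers is claimed here.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=100, n_features=10, noise=20.0, random_state=0)

pipeline = Pipeline([('scale', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [0.1, 1, 10, 100],
              'svr__gamma': np.logspace(-2, 2, 5)}

inner_cv = KFold(n_splits=5, shuffle=True, random_state=0)
outer_cv = KFold(n_splits=10, shuffle=True, random_state=1)
search = GridSearchCV(pipeline, param_grid=param_grid, cv=inner_cv)

# Non-nested: the winning score is reported on the same folds that selected it.
search.fit(X, y)
non_nested_r2 = search.best_score_

# Nested: the search is rerun inside every outer training split, so the outer
# test folds never influence the hyperparameters that are scored on them.
nested_scores = cross_val_score(search, X, y, cv=outer_cv)
nested_r2 = nested_scores.mean()

print('non-nested R^2: %.3f | nested R^2: %.3f' % (non_nested_r2, nested_r2))
```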
Amita Misra
2016-05-12 20:50:34 UTC
Actually, I do not have an independent test set, so I want to use nested CV
to estimate generalization performance. My model is fixed (an SVM); I want
to learn its parameters and also get an unbiased performance estimate using
only one set of data.

I wanted to make sure that my code correctly does a nested 10*5 CV, so that
the parameters are learnt on one set and the final evaluation that produces
the predicted scores is done on a different set.

Amita
Sebastian Raschka
2016-05-12 21:43:25 UTC
I see; that's what I thought. At first glance, the approach (code) looks correct to me, but I haven't done it this way yet. Typically, I use a more "manual" approach and iterate over the outer folds myself (since I typically use nested CV for algorithm selection):


from sklearn.cross_validation import StratifiedKFold  # scikit-learn 0.17 API
from sklearn.metrics import accuracy_score

gs_est = ...  # your grid search: pipeline/estimator with param grid and cv=5
skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=123)

for outer_train_idx, outer_valid_idx in skfold:
    # inner loop: tune hyperparameters on the outer training fold only
    gs_est.fit(X_train[outer_train_idx], y_train[outer_train_idx])
    # outer loop: score the refit best model on the held-out fold
    y_pred = gs_est.predict(X_train[outer_valid_idx])
    acc = accuracy_score(y_true=y_train[outer_valid_idx], y_pred=y_pred)
    print(' | inner ACC %.2f%% | outer ACC %.2f%%' % (gs_est.best_score_ * 100, acc * 100))
    cv_scores[name].append(acc)  # cv_scores and name are defined elsewhere

However, if I see it correctly, it should essentially do the same thing as your code.
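
A hedged sketch of that equivalence against the current `sklearn.model_selection` API: the manual outer loop and `cross_val_predict` are run on identical outer splits and their predictions are compared. The loop is adapted to regression (plain `KFold` instead of `StratifiedKFold`, no accuracy), which is an assumption about what fits the SVR use case in this thread; the data and the small grid are synthetic and illustrative.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_predict
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=90, n_features=8, noise=10.0, random_state=0)

pipeline = Pipeline([('scale', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [1, 10, 100], 'svr__gamma': [0.01, 0.1, 1.0]}
gs_est = GridSearchCV(pipeline, param_grid=param_grid, cv=5)

# No shuffling, so both variants iterate over exactly the same outer splits.
outer_cv = KFold(n_splits=10)

# Variant 1: manual outer loop.
manual_pred = np.empty_like(y)
for outer_train_idx, outer_valid_idx in outer_cv.split(X):
    gs_est.fit(X[outer_train_idx], y[outer_train_idx])          # inner grid search
    manual_pred[outer_valid_idx] = gs_est.predict(X[outer_valid_idx])

# Variant 2: the same procedure delegated to cross_val_predict.
auto_pred = cross_val_predict(gs_est, X, y, cv=outer_cv)

print('predictions agree:', np.allclose(manual_pred, auto_pred))
```

The two variants only line up when the outer splits are the same; with shuffled folds, both would need the same `random_state`.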
Amita Misra
2016-05-12 23:17:59 UTC
Thanks.
Actually, two of us were running the same experiments, and the other
person was doing it the way you have shown above.
We were getting the same results, but since the methods were different I
wanted to make sure that I was doing it the right way.

Thanks,
Amita
Sebastian Raschka
2016-05-13 01:58:08 UTC
You are welcome, and I am glad to hear that it works :). And "your" approach is definitely the cleaner way to do it. I think you just need to be a bit careful about the n_jobs parameter in practice: I would only set n_jobs=-1 in the inner loop.
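
A minimal sketch of that setup, assuming the current `sklearn.model_selection` API and synthetic data: the grid search (inner loop) gets `n_jobs=-1`, while the outer loop stays sequential. The motivation given in the comments, avoiding two levels of parallelism competing for the same cores, is an assumption; the post itself does not spell out the reason.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import GridSearchCV, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in data (assumption).
X, y = make_regression(n_samples=60, n_features=5, noise=5.0, random_state=0)

pipeline = Pipeline([('scale', StandardScaler()), ('svr', SVR(kernel='rbf'))])
param_grid = {'svr__C': [1, 10], 'svr__gamma': [0.01, 0.1]}

# Inner loop: spread the grid search over all available cores.
inner_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5, n_jobs=-1)

# Outer loop: one fold at a time (n_jobs=1), presumably so that the inner and
# outer levels do not both try to parallelize at once.
outer_scores = cross_val_score(inner_search, X, y, cv=10, n_jobs=1)
print('mean outer R^2: %.3f' % outer_scores.mean())
```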

Best,
Sebastian
Post by Amita Misra
Thanks.
Actually there were 2 people running the same experiments and the other person was doing as you have shown above.
We were getting the same results but since methods were different I wanted to ensure that I am doing it the right way.
Thanks,
Amita
gs_est = … your gridsearch, pipeline, estimator with param grid and cv=5
skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=123)
gs_est.fit(X_train[outer_train_idx], y_train[outer_train_idx])
y_pred = gs_est.predict(X_train[outer_valid_idx])
acc = accuracy_score(y_true=y_train[outer_valid_idx], y_pred=y_pred)
print(' | inner ACC %.2f%% | outer ACC %.2f%%' % (gs_est.best_score_ * 100, acc * 100))
cv_scores[name].append(acc)
However, it should essentially do the same thing as your code if I see it correctly.
Actually I do not have an independent test set and hence I want to use it as an estimate for generalization performance. Hence my classifier is fixed SVM and I want to learn the parameters and also estimate an unbiased performance using only one set of data.
I wanted to ensure that my code correctly does a nested 10*5 CV and the parameters are learnt on a different set and final evaluation to get the predicted score is on a different set.
Amita
I would say there are 2 different applications of nested CV. You could use it for algorithm selection (with hyperparam tuning in the inner loop). Or, you could use it as an estimate of the generalization performance (only hyperparam tuning), which has been reported to be less biased than the a k-fold CV estimate (Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7, 91. http://doi.org/10.1186/1471-2105-7-91)
By "you could use it as an estimate of the generalization performance (only hyperparam tuning)” I mean as a replacement for k-fold on the training set and evaluation on an independent test set.
Post by Алексей Драль
Hi Amita,
===
pipeline=Pipeline([('scale', preprocessing.StandardScaler()),('filter', SelectKBest(f_regression)),('svr', svm.SVR())]
C_range = [0.1, 1, 10, 100]
gamma_range=numpy.logspace(-2, 2, 5)
param_grid=[{'svr__kernel': ['rbf'], 'svr__gamma': gamma_range,'svr__C': C_range}]
grid_search = GridSearchCV(pipeline, param_grid=param_grid, cv=5, scoring=scoring_function)
grid_search.fit(X_train, Y_train)
===
• http://scikit-learn.org/stable/modules/grid_search.html#grid-search
• http://scikit-learn.org/stable/modules/grid_search.html#gridsearch-scoring
Hi,
I have a limited dataset and hence want to learn the parameters and also evaluate the final model.
From the documents it looks that nested cross validation is the way to do it. I have the code but still I want to be sure that I am not overfitting any way.
pipeline=Pipeline([('scale', preprocessing.StandardScaler()),('filter', SelectKBest(f_regression)),('svr', svm.SVR())]
C_range = [0.1, 1, 10, 100]
gamma_range=numpy.logspace(-2, 2, 5)
param_grid=[{'svr__kernel': ['rbf'], 'svr__gamma': gamma_range,'svr__C': C_range}]
grid_search = GridSearchCV(pipeline, param_grid=param_grid,cv=5) Y_pred=cross_validation.cross_val_predict(grid_search, X_train, Y_train,cv=10)
correlation= numpy.ma.corrcoef(Y_train,Y_pred)[0, 1]
please let me know if my understanding is correct.
This is 10*5 nested cross validation. Inner folds CV over training data involves a grid search over hyperparameters and outer folds evaluate the performance.
Thanks,
Amita--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz
------------------------------------------------------------------------------
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Amita Misra
2016-05-13 02:26:18 UTC
Permalink
I had not thought about the n_jobs parameter, mainly because it does not
run on my Mac; the system just hangs if I use it.
The same code runs on a Linux server, though.

I have one more clarification to seek.
I was running it on the server with the code below. Would this be fine, or
should I move n_jobs=3 to GridSearchCV?

grid_search = GridSearchCV(pipeline, param_grid=param_grid,
                           scoring=scoringcriteria, cv=5)
scores = cross_val_score = cross_validation.cross_val_score(
    grid_search, X_train, Y_train, cv=cvfolds, n_jobs=3)

Thanks,
Amita
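The two placements being asked about can be sketched with one helper that takes the inner and outer n_jobs separately. This is only an illustration under several assumptions: it uses the newer sklearn.model_selection module rather than the sklearn.cross_validation module from the thread, synthetic data from make_regression stands in for X_train and Y_train, "r2" stands in for the unspecified scoringcriteria, and nested_cv_scores is a made-up name.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.model_selection import GridSearchCV, KFold, cross_val_score
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# Synthetic stand-in for the poster's data (assumption, not the real dataset).
X_train, Y_train = make_regression(n_samples=120, n_features=15,
                                   noise=5.0, random_state=0)

pipeline = Pipeline([
    ("scale", StandardScaler()),
    ("filter", SelectKBest(f_regression)),
    ("svr", SVR()),
])
param_grid = {
    "svr__kernel": ["rbf"],
    "svr__gamma": np.logspace(-2, 2, 5),
    "svr__C": [0.1, 1, 10, 100],
}


def nested_cv_scores(inner_n_jobs, outer_n_jobs):
    """Run 10x5 nested CV; n_jobs is set separately for each loop."""
    inner_cv = KFold(n_splits=5)   # hyperparameter search
    outer_cv = KFold(n_splits=10)  # performance estimate
    grid_search = GridSearchCV(pipeline, param_grid=param_grid, scoring="r2",
                               cv=inner_cv, n_jobs=inner_n_jobs)
    return cross_val_score(grid_search, X_train, Y_train,
                           cv=outer_cv, n_jobs=outer_n_jobs)


# Parallelism only in the inner grid search; the outer loop runs serially.
scores = nested_cv_scores(inner_n_jobs=2, outer_n_jobs=1)
print("outer-fold R^2: mean %.3f, std %.3f" % (scores.mean(), scores.std()))
```

Calling nested_cv_scores(inner_n_jobs=1, outer_n_jobs=3) would correspond to the code in the post; the folds and the grid are the same either way, so only the scheduling of the work differs.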
Sebastian Raschka
2016-05-13 02:35:02 UTC
Permalink
I am not that much into the multi-processing implementation in scikit-learn / joblib, but I think this could be one reason why your Mac hangs. I'd say that it's probably the safest approach to only set the n_jobs parameter for the innermost object.

E.g., if you have 4 processors and you set the GridSearch to n_jobs=2 and a k-fold loop to, e.g., n_jobs=5, I can imagine that it would blow up because you are suddenly trying to run 10 processes on 4 processors, if that makes sense!?
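The arithmetic above can be written down as a small helper. This is a back-of-the-envelope sketch only: the function names are made up for illustration, it assumes every outer worker starts its own full inner pool, and negative values are resolved with joblib's documented convention (n_jobs=-1 means all CPUs). Newer joblib releases may limit nested parallelism on their own, so the result is best read as an upper bound.

```python
def effective_n_jobs(n_jobs, n_cpus):
    """Resolve an n_jobs setting to a worker count.

    Negative values follow the joblib convention: n_cpus + 1 + n_jobs,
    so -1 means "all CPUs" and -2 means "all but one".
    """
    if n_jobs is None:
        return 1
    if n_jobs < 0:
        return max(n_cpus + 1 + n_jobs, 1)
    return n_jobs


def worst_case_workers(outer_n_jobs, inner_n_jobs, n_cpus):
    """Upper bound on simultaneous workers if both loops run in parallel."""
    return (effective_n_jobs(outer_n_jobs, n_cpus)
            * effective_n_jobs(inner_n_jobs, n_cpus))


n_cpus = 4
# The case from the message: k-fold loop with 5 jobs, grid search with 2.
print(worst_case_workers(outer_n_jobs=5, inner_n_jobs=2, n_cpus=n_cpus))   # 10
# Parallelism only in the innermost object, using all CPUs.
print(worst_case_workers(outer_n_jobs=1, inner_n_jobs=-1, n_cpus=n_cpus))  # 4
```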
Post by Sebastian Raschka
You are welcome, and I am glad to hear that it works :). And "your" approach is definitely the cleaner way to do it. I think you just need to be a bit careful about the n_jobs parameter in practice; I would only set it to n_jobs=-1 in the inner loop.
Best,
Sebastian
Post by Amita Misra
Thanks.
Actually there were 2 people running the same experiments and the other person was doing as you have shown above.
We were getting the same results but since methods were different I wanted to ensure that I am doing it the right way.
Thanks,
Amita
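Post by Sebastian Raschka
I see; that's what I thought. At first glance, the approach (code) looks correct to me, but I haven't done it this way yet. Typically, I use a more "manual" approach, iterating over the outer folds myself (since I ...):
gs_est = … your gridsearch, pipeline, estimator with param grid and cv=5
skfold = StratifiedKFold(y=y_train, n_folds=5, shuffle=True, random_state=123)
# The outer loop header is missing from the archive; it is reconstructed
# here from the variable names used below.
for outer_train_idx, outer_valid_idx in skfold:
    gs_est.fit(X_train[outer_train_idx], y_train[outer_train_idx])
    y_pred = gs_est.predict(X_train[outer_valid_idx])
    acc = accuracy_score(y_true=y_train[outer_valid_idx], y_pred=y_pred)
    print(' | inner ACC %.2f%% | outer ACC %.2f%%' % (gs_est.best_score_ * 100, acc * 100))
    cv_scores[name].append(acc)
However, it should essentially do the same thing as your code if I see it correctly.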
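A self-contained version of this manual outer loop might look like the sketch below. It is not Sebastian's original code: it assumes the newer sklearn.model_selection API (StratifiedKFold takes n_splits and is iterated through .split), uses synthetic classification data in place of X_train and y_train, and picks an arbitrary SVC grid for gs_est.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import GridSearchCV, StratifiedKFold
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X_train, y_train = make_classification(n_samples=150, n_features=10,
                                       random_state=0)

# Inner loop: grid search with 5-fold CV (hypothetical grid).
gs_est = GridSearchCV(
    Pipeline([("scale", StandardScaler()), ("svc", SVC(kernel="rbf"))]),
    param_grid={"svc__C": [0.1, 1, 10], "svc__gamma": [0.01, 0.1, 1]},
    scoring="accuracy",
    cv=5,
)

# Outer loop: each fold is held out once and never seen by the grid search.
skfold = StratifiedKFold(n_splits=5, shuffle=True, random_state=123)
outer_scores = []
for outer_train_idx, outer_valid_idx in skfold.split(X_train, y_train):
    gs_est.fit(X_train[outer_train_idx], y_train[outer_train_idx])
    y_pred = gs_est.predict(X_train[outer_valid_idx])
    acc = accuracy_score(y_true=y_train[outer_valid_idx], y_pred=y_pred)
    print(" | inner ACC %.2f%% | outer ACC %.2f%%"
          % (gs_est.best_score_ * 100, acc * 100))
    outer_scores.append(acc)

print("nested CV accuracy: %.3f +/- %.3f"
      % (np.mean(outer_scores), np.std(outer_scores)))
```

The manual loop makes it easy to inspect gs_est.best_params_ per outer fold, which cross_val_score does not expose.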
Post by Amita Misra
Actually, I do not have an independent test set, so I want to use nested CV as an estimate of generalization performance. My classifier is fixed (an SVM); I want to learn the parameters and also get an unbiased performance estimate using only one set of data.
I wanted to make sure that my code correctly does a nested 10*5 CV, with the parameters learnt on one set and the final evaluation, which produces the predicted scores, done on a different set.
Amita
Post by Sebastian Raschka
I would say there are 2 different applications of nested CV. You could use it for algorithm selection (with hyperparam tuning in the inner loop). Or, you could use it as an estimate of the generalization performance (only hyperparam tuning), which has been reported to be less biased than a k-fold CV estimate (Varma, S., & Simon, R. (2006). Bias in error estimation when using cross-validation for model selection. BMC Bioinformatics, 7, 91. http://doi.org/10.1186/1471-2105-7-91).
By "you could use it as an estimate of the generalization performance (only hyperparam tuning)" I mean as a replacement for k-fold on the training set and evaluation on an independent test set.
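The first application, algorithm selection, could be sketched as follows: each candidate gets its own inner grid search, and the outer folds are shared so the comparison is like for like. Everything concrete here is an assumption made for illustration: the synthetic data, the choice of an SVM and a k-nearest-neighbours classifier as candidates, their grids, and the candidates / nested_scores names.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data (assumption).
X, y = make_classification(n_samples=150, n_features=10, random_state=0)

inner_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=1)
outer_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=2)

# Hypothetical candidate algorithms, each with its own hyperparameter grid.
candidates = {
    "svm": (Pipeline([("scale", StandardScaler()), ("clf", SVC())]),
            {"clf__C": [0.1, 1, 10], "clf__gamma": [0.01, 0.1, 1]}),
    "knn": (Pipeline([("scale", StandardScaler()),
                      ("clf", KNeighborsClassifier())]),
            {"clf__n_neighbors": [3, 5, 9]}),
}

nested_scores = {}
for name, (estimator, grid) in candidates.items():
    # Inner loop tunes hyperparameters; outer loop scores the tuned model.
    search = GridSearchCV(estimator, param_grid=grid, cv=inner_cv)
    nested_scores[name] = cross_val_score(search, X, y, cv=outer_cv)
    print("%s: %.3f +/- %.3f"
          % (name, nested_scores[name].mean(), nested_scores[name].std()))
```

For the second application, a single candidate is enough and the mean of the outer-fold scores serves as the performance estimate.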
Amita Misra
2016-05-13 02:47:06 UTC
Permalink
Oh yes, I get that now. All this while I was thinking there was an issue
with the Mac, because of a similar issue discussed here:
https://github.com/scikit-learn/scikit-learn/issues/5115

Thanks a lot for clearing this up. I am going to change the loop and see
if I can run the parallel implementation on the Mac.
It was probably running on the server because it has many more processors.

Thanks,
Amita
the
Post by Amita Misra
Post by Amita Misra
Post by Amita Misra
apps on BYO-devices by containerizing them, leaving personal data
untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j_______________________________________________
Post by Amita Misra
Post by Amita Misra
Post by Amita Misra
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
------------------------------------------------------------------------------
Post by Amita Misra
Post by Amita Misra
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of
MDM
Post by Amita Misra
Post by Amita Misra
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data
untouched!
Post by Amita Misra
Post by Amita Misra
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz
------------------------------------------------------------------------------
Post by Amita Misra
Post by Amita Misra
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of
MDM
Post by Amita Misra
Post by Amita Misra
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data
untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j_______________________________________________
Post by Amita Misra
Post by Amita Misra
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
------------------------------------------------------------------------------
Post by Amita Misra
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data
untouched!
Post by Amita Misra
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz
------------------------------------------------------------------------------
Post by Amita Misra
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data
untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j_______________________________________________
Post by Amita Misra
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
------------------------------------------------------------------------------
Mobile security can be enabling, not merely restricting. Employees who
bring their own devices (BYOD) to work are irked by the imposition of MDM
restrictions. Mobile Device Manager Plus allows you to control only the
apps on BYO-devices by containerizing them, leaving personal data untouched!
https://ad.doubleclick.net/ddm/clk/304595813;131938128;j
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz
Loading...