Vincent Dubourg

2010-12-13 07:24:13 UTC

Hi list,

Alex (Gramfort) and I noticed an inconsistency in the cross_val_score function when the regressors' default score function (r2_score) is coupled with the LeaveOneOut iterator. Indeed, r2_score(y_true, y_pred) returns an array full of -Inf, because the variance is computed on a single sample for each fold: y_true - y_true.mean() = 0, so R2 = -Inf due to the division by zero...
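To make the failure mode concrete, here is a minimal numpy sketch of the plain R2 formula (not the scikits.learn implementation): with a single-sample fold the total sum of squares is exactly zero, and R2 degenerates to -Inf whenever the prediction is not exact.

```python
import numpy as np

def r2(y_true, y_pred):
    """Plain R2 = 1 - SS_res / SS_tot; degenerates when SS_tot == 0."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    if ss_tot == 0.0:
        # Single-sample (or constant) fold: division by zero.
        return -np.inf if ss_res > 0 else 1.0
    return 1.0 - ss_res / ss_tot

print(r2([3.0], [2.5]))  # -> -inf: a leave-one-out fold has one sample
```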

And we also came up with a solution. Isn't that nice?!

Why not implement a cross_val_predict function that would return an array with shape (n_folds, fold_size) containing the cross-validated predictions of the estimator on the folds? Using this function in conjunction with e.g. the LeaveOneOut iterator would allow one to:

- perform and retrieve an exhaustive list of leave-one-out predictions to build an adequacy plot: y_pred_on_folds vs. y_true;

- make a cross-validated estimate of any score function in the metrics module.
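For the leave-one-out case, the proposed function could be sketched as follows. This is a hedged illustration against a generic fit/predict estimator interface, not the scikits.learn API; the MeanPredictor estimator and the function name cross_val_predict_loo are made up for the example. Each sample is predicted by a model trained on all the other samples.

```python
import numpy as np

class MeanPredictor:
    """Toy estimator: always predicts the training-set mean of y."""
    def fit(self, X, y):
        self.mean_ = float(np.mean(y))
        return self

    def predict(self, X):
        return np.full(len(X), self.mean_)

def cross_val_predict_loo(estimator, X, y):
    """Return one held-out prediction per sample (LeaveOneOut)."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i  # boolean mask: every sample but i
        estimator.fit(X[train], y[train])
        preds[i] = estimator.predict(X[i:i + 1])[0]
    return preds

X = np.arange(4.0).reshape(-1, 1)
y = np.array([1.0, 2.0, 3.0, 4.0])
preds = cross_val_predict_loo(MeanPredictor(), X, y)
```

Since the returned array is aligned with y, any score function from the metrics module can then be evaluated on it in one call.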


from scikits.learn.gaussian_process import GaussianProcess
from scikits.learn.cross_val import cross_val_predict, LeaveOneOut
from scikits.learn.metrics import r2_score

gp = GaussianProcess()
gp.fit(X, y)

y_pred_on_folds = cross_val_predict(gp, X, y,
                                    cv=LeaveOneOut(y.size), n_jobs=-1)
R2 = r2_score(y, y_pred_on_folds)

Cheers,
Vincent