*Post by Yaroslav Halchenko* *Post by j***@gmail.com* *Post by Yaroslav Halchenko* *Post by j***@gmail.com* *Post by Gael Varoquaux*

I believe that this is a goal we had set ourselves. There have been a few

challenges to satisfying this goal, but I'd like to keep to it as much as

possible.

Mostly out of curiosity, because I haven't seen a strong case for this

in econometrics yet.

When you refit a model to a new dataset, what are you actually reusing?

shouldn't the answer be "nothing"? ;)

If you don't reuse anything, then why don't you just create a new

instance for the new data sets?

What about cross-validation and bootstrapping? E.g. the user provides an

existing instance (possibly not yet trained, of course) and I need to

evaluate some measure on that model based on some sampling/splitting of

the dataset at hand. It would be somewhat overkill to ask the user to provide

an instance for each split, or to deepcopy the original instance for

each split. Naturally, imho, the same instance should be reused in

a loop.
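To make the "same instance in a loop" idea concrete, here is a minimal sketch. The names `MeanModel`, `kfold_indices` and `cross_val_mse` are hypothetical and come from neither scikit-learn nor PyMVPA; the only assumption is that `fit()` overwrites all learned state, so nothing leaks from one split to the next.

```python
class MeanModel:
    """Hypothetical toy estimator: predicts the mean of the training data."""

    def __init__(self):
        self.mean_ = None

    def fit(self, y):
        # Refitting overwrites all learned state from any earlier fit.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, n):
        return [self.mean_] * n


def kfold_indices(n, k):
    """Yield (train, test) index lists for k contiguous folds."""
    fold = n // k
    for j in range(k):
        start = j * fold
        stop = n if j == k - 1 else start + fold
        test = list(range(start, stop))
        train = [i for i in range(n) if i < start or i >= stop]
        yield train, test


def cross_val_mse(model, y, k=3):
    """Evaluate one user-supplied instance by refitting it on every split."""
    errors = []
    for train, test in kfold_indices(len(y), k):
        model.fit([y[i] for i in train])
        pred = model.predict(len(test))
        errors.append(sum((y[i] - p) ** 2 for i, p in zip(test, pred)) / len(test))
    return errors
```

The caller passes a single (possibly untrained) instance, e.g. `cross_val_mse(MeanModel(), data)`, and no copies are made. This only works if refitting really does reset everything the previous fit learned.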

My draft version of cross-validation, which I wrote for example for

principal component regression, creates a new instance, and in this case

there is nothing to reuse for e.g. leave-P-out.

For OLS cross-validation or bootstrap, which would be easier:

    def ols_bootstrap(endog, exog):
        n = len(endog)
        res = []
        for i in bootstrap_iterator(n):  # or i, j in crossval_iterator
            res.append(sm.OLS(endog[i], exog[i, :]).fit().params)
        return res

sm.OLS(..) creates a new instance instead of wiping and reusing an

existing instance.
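A runnable sketch of the "new fit per resample" pattern follows. To keep it self-contained it uses `numpy.linalg.lstsq` as a stand-in for `sm.OLS(...).fit().params`. The helper names `ols_params` and `bootstrap_indices` and the `n_rep`/`seed` parameters are assumptions made for this illustration, not part of statsmodels.

```python
import numpy as np


def ols_params(endog, exog):
    """Least-squares coefficients; stand-in for sm.OLS(endog, exog).fit().params."""
    params, *_ = np.linalg.lstsq(exog, endog, rcond=None)
    return params


def bootstrap_indices(n, n_rep, rng):
    """Hypothetical iterator: yields n row indices drawn with replacement."""
    for _ in range(n_rep):
        yield rng.integers(0, n, size=n)


def ols_bootstrap(endog, exog, n_rep=200, seed=0):
    """Fit from scratch on each resample and stack the parameter vectors."""
    rng = np.random.default_rng(seed)
    draws = [ols_params(endog[idx], exog[idx, :])
             for idx in bootstrap_indices(len(endog), n_rep, rng)]
    return np.asarray(draws)
```

Each pass builds its estimate from the resampled rows alone, so there is no state to wipe between iterations. The spread of the rows of the returned array gives a rough bootstrap estimate of parameter uncertainty.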

In statsmodels, most results except for the minimal ones are by now lazy,

and are not calculated until requested. I haven't tried yet to figure

out what "state" in pymvpa really does and I don't know what your

overhead for instance creation is. That's one of the reasons for my

initial question.
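As an illustration of what "lazy" results mean here, the following sketch computes derived quantities only on first access and then caches them. It uses the standard library's `functools.cached_property` purely for demonstration, which is not necessarily how statsmodels implements it; the class `LazyResults` and its counter attribute are hypothetical.

```python
from functools import cached_property


class LazyResults:
    """Hypothetical results holder: only the inputs are stored at creation."""

    def __init__(self, y, fitted):
        self.y = y
        self.fitted = fitted
        self.n_computed = 0  # counts how often the residuals are actually computed

    @cached_property
    def resid(self):
        self.n_computed += 1
        return [yi - fi for yi, fi in zip(self.y, self.fitted)]

    @cached_property
    def ssr(self):
        # Sum of squared residuals, built on the cached residuals.
        return sum(r * r for r in self.resid)
```

With this design, creating a results object costs almost nothing, which is why creating a fresh instance per fit adds little overhead as long as only a few quantities are requested.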

And our internal code for generalized linear models, which works by

iteratively updating the weights matrix, roughly has a pattern like

(from memory)

    while ... notfinished:
        ...
        newres = sm.WLS(endog, exog, weights=newweights)

which creates a new WLS (weighted least squares) instance in each iteration.
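For readers who want to see the whole loop, here is a self-contained sketch of iteratively reweighted least squares for a logistic model. It is not the statsmodels GLM code: `wls_params` is a hypothetical numpy stand-in for `sm.WLS(...).fit().params`, and the logit link, tolerance and iteration cap are assumptions chosen for the example.

```python
import numpy as np


def wls_params(endog, exog, weights):
    """Weighted least squares via rescaling; stand-in for sm.WLS(...).fit().params."""
    sw = np.sqrt(weights)
    params, *_ = np.linalg.lstsq(exog * sw[:, None], endog * sw, rcond=None)
    return params


def logit_irls(y, exog, tol=1e-10, maxiter=50):
    """Logistic regression by iteratively reweighted least squares."""
    params = np.zeros(exog.shape[1])
    for _ in range(maxiter):
        eta = exog @ params
        mu = 1.0 / (1.0 + np.exp(-eta))
        weights = mu * (1.0 - mu)                  # IRLS weights for the logit link
        z = eta + (y - mu) / weights               # working response
        new_params = wls_params(z, exog, weights)  # fresh WLS problem each pass
        if np.max(np.abs(new_params - params)) < tol:
            return new_params
        params = new_params
    return params
```

Every iteration sets up a brand-new weighted regression from the current weights. Nothing from the previous weighted fit is reused except the parameter vector that determines the next weights.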

I was initially reluctant last year when Skipper and Alan proposed the

design change, doubting whether a new instance is useful, but finally agreed,

since there is little additional cost besides instance creation

and it makes for a cleaner design.

*Post by Yaroslav Halchenko* *Post by j***@gmail.com*

With cross-validation and bootstrapping we are not far enough along and

haven't formalized any structure yet, but I assume if creating a new

instance is too costly, then we will write model-specific

bootstrapping and cross-validation code.

hm... if I got it right, sounds like overkill.

Somewhat related: a while ago I wrote a standard CUSUM test for

structural breaks, which is still standalone. This case requires

recursive residuals.

This would roughly be

    res = []
    for i in range(start, nobs):
        # fit on the observations before i, then predict observation i
        res.append(y[i] - sm.OLS(y[:i], x[:i, :]).fit().predict(x[i:i+1, :])[0])

which is very inefficient in this case, so I wrote an online

estimator for the params that updates the inverse of X'X, which is

the standard approach for this case.
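The online update can be sketched as follows. This is not the original standalone code, only a minimal illustration of the standard rank-one (Sherman-Morrison) update of the inverse of X'X. The function name and signature are assumptions, `start` must be at least the number of columns with the first `start` rows of full rank, and the residuals are left unscaled to match the loop above.

```python
import numpy as np


def recursive_residuals(y, x, start):
    """One-step-ahead errors y[i] - x[i] @ b, with b fitted on the rows before i."""
    xtx_inv = np.linalg.inv(x[:start].T @ x[:start])
    params = xtx_inv @ x[:start].T @ y[:start]
    resid = np.empty(len(y) - start)
    for i in range(start, len(y)):
        xi = x[i]
        err = y[i] - xi @ params  # uses parameters from rows 0..i-1
        resid[i - start] = err
        # Sherman-Morrison rank-one update of (X'X)^{-1} after adding row i
        v = xtx_inv @ xi
        denom = 1.0 + xi @ v
        xtx_inv = xtx_inv - np.outer(v, v) / denom
        params = params + v * err / denom
    return resid
```

Each step costs a few small matrix-vector products instead of a full refit, so the whole pass over the data avoids solving a growing regression problem at every observation.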

Josef


------------------------------------------------------------------------------

_______________________________________________

Scikit-learn-general mailing list

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general