I do confirm that Lasso and LassoLars both minimize

    (1 / (2 * n_samples)) * ||y - Xw||^2_2 + alpha * ||w||_1

and that the n should not be present in the sparse coding context.

is not correct. I don't know if this also affects the SGD (etc.) docstrings.

Regarding the shapes used by sparse_encode, I'll let Vlad comment.

This actually gets at something I've been meaning to fiddle with and report but haven't had time: I'm not sure I completely trust the coordinate descent implementation in scikit-learn, because it seems to give me bogus answers a lot (i.e., the optimality conditions necessary for it to be an actual solution are not even approximately satisfied). Are you guys using something weird for the termination condition?

Can you give us a sample X and y that shows the problem?

It should ultimately use the duality gap to stop the iterations, but there might be a corner case …

In [34]: rng = np.random.RandomState(0)

In [35]: dictionary = rng.normal(size=(100, 500)) / 1000; dictionary /= np.sqrt((dictionary ** 2).sum(axis=0))

In [36]: signal = rng.normal(size=100) / 1000

In [37]: from sklearn.linear_model import Lasso

In [38]: lasso = Lasso(alpha=0.0001, max_iter=1e6, fit_intercept=False, tol=1e-8)

In [39]: lasso.fit(dictionary, signal)
Out[39]: Lasso(alpha=0.0001, copy_X=True, fit_intercept=False, max_iter=1000000.0, normalize=False, precompute='auto', tol=1e-08)

In [40]: max(abs(lasso.coef_))
Out[40]: 0.0

In [41]: from pylearn2.optimization.feature_sign import feature_sign_search

In [42]: coef = feature_sign_search(dictionary, signal, 0.0001)

In [43]: max(abs(coef))
Out[43]: 0.0027295761244725018

And I'm pretty sure the latter result is the right one, since the gradient of the objective is:

In [44]: def grad(coefs):
   ....:     gram = np.dot(dictionary.T, dictionary)
   ....:     corr = np.dot(dictionary.T, signal)
   ....:     return -2 * corr + 2 * np.dot(gram, coefs) + 0.0001 * np.sign(coefs)

Actually, alpha in scikit-learn is multiplied by n_samples. I agree

this is misleading and not documented in the docstring.
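One way to see the n_samples factor concretely: because scikit-learn's Lasso divides the quadratic term by n_samples, its solution is invariant to duplicating the dataset, whereas the unnormalized objective ||y - Xw||^2 + lambda * ||w||_1 would effectively see its penalty halved. A minimal sketch (shapes and alpha here are arbitrary choices of mine):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.normal(size=(40, 20))
y = rng.normal(size=40)

params = dict(alpha=0.1, fit_intercept=False, tol=1e-12, max_iter=100000)
a = Lasso(**params).fit(X, y)

# Stack the data twice: n_samples doubles, but since the fitted objective is
# (1 / (2 * n_samples)) * ||y - Xw||^2 + alpha * ||w||_1, the minimizer is
# unchanged. An unnormalized ||y - Xw||^2 + lambda * ||w||_1 objective would
# behave as if lambda had been halved instead.
b = Lasso(**params).fit(np.vstack([X, X]), np.concatenate([y, y]))

print(np.allclose(a.coef_, b.coef_, atol=1e-6))
```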

lasso = Lasso(alpha=0.0001 / dictionary.shape[0], max_iter=1e6, fit_intercept=False, tol=1e-8).fit(dictionary, signal)

max(abs(lasso.coef_))

0.0027627270397484554

0.00019687294269977963

Seems like there's an added factor of 2 in there as well:

In [94]: lasso = Lasso(alpha=0.0001 / (2 * dictionary.shape[0]), max_iter=1e8, fit_intercept=False, tol=1e-8).fit(dictionary, signal)

In [95]: coef = feature_sign_search(dictionary, signal, 0.0001)

In [96]: allclose(lasso.coef_, coef, atol=1e-7)
Out[96]: True
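The same check can be made self-contained (without pylearn2) by verifying the subgradient optimality conditions of the sparse coding objective ||y - Dw||^2 + lam * ||w||_1 directly. The names D, y, and lam below are illustrative, and alpha = lam / (2 * n_samples) is the mapping discussed above:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
D = rng.normal(size=(100, 500))
D /= np.sqrt((D ** 2).sum(axis=0))   # unit-norm dictionary atoms
y = rng.normal(size=100)
lam = 0.1                            # penalty in ||y - Dw||^2 + lam * ||w||_1

# scikit-learn minimizes (1 / (2 * n)) * ||y - Dw||^2 + alpha * ||w||_1, so
# pass alpha = lam / (2 * n) to solve the unnormalized sparse coding problem.
n = D.shape[0]
w = Lasso(alpha=lam / (2 * n), fit_intercept=False,
          tol=1e-10, max_iter=100000).fit(D, y).coef_

# KKT conditions: on the support, the smooth gradient cancels lam * sign(w);
# off the support, it must lie within [-lam, lam].
grad = 2 * np.dot(D.T, np.dot(D, w) - y)
active = w != 0
print(active.sum() > 0,
      np.allclose(grad[active], -lam * np.sign(w[active]), atol=1e-3),
      np.all(np.abs(grad[~active]) <= lam + 1e-3))
```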

I think you're right that the precise cost function definitely ought to be

documented in the front-facing classes rather than just the low-level Cython

routines.

I also think that scaling the way Lasso/ElasticNet does in the context of

sparse coding may be very confusing, since in sparse coding it corresponds

not to the number of training samples in a regression problem but to the number

of input dimensions.

The docstring of sparse_encode is quite confusing in that X, the dictionary, is documented as having shape (n_samples, n_components). The number of samples (in the context of

sparse coding) should have no influence over the shape of the dictionary;

this seems to have leaked over from the Lasso documentation.

The shape and mathematical definition of cov doesn't make much sense to me

given this change, though (or to begin with, for that matter): In the case of

a single problem, the desired covariance is X^T y, with y a column vector,

yielding another column vector of (n_components, 1). So the shape, if you

have multiple examples you're precomputing for, should end up being

(n_components, n_samples), and given the shape of Y that would be achieved by

X^T Y^T.
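As a pure shape check of that claim (the sizes and names below are illustrative; X is the dictionary with atoms as columns, and Y stacks the signals row-wise):

```python
import numpy as np

n_features, n_components, n_samples = 100, 500, 10
rng = np.random.RandomState(0)
X = rng.normal(size=(n_features, n_components))   # dictionary, atoms as columns
Y = rng.normal(size=(n_samples, n_features))      # one signal per row

# For a single signal y (a column of Y^T), the precomputed covariance is
# X^T y, of shape (n_components,); stacking one column per signal gives
# cov = X^T Y^T, of shape (n_components, n_samples).
cov = np.dot(X.T, Y.T)
print(cov.shape)  # (500, 10), i.e. (n_components, n_samples)
```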

David


_______________________________________________

Scikit-learn-general mailing list

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general