because the L1 penalty tends to set more coefficients to zero than L2, leaving you
with a more parsimonious and interpretable model. I would suggest that the
advice above is a good rule of thumb, but also a bit hand-wavy. In practice,
alpha is not nearly as sensitive as lambda (the level of regularization). It
is common to try a small grid of alphas, fit a full path of lambdas for each,
and choose the best model from these.

Confusingly, sklearn uses l1_ratio to mean alpha and alpha to mean lambda.
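A minimal sketch of that two-level search with sklearn's ElasticNetCV (synthetic data; the grid values here are arbitrary, not a recommendation):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV

# Toy data (illustrative only): 5 informative features out of 20.
rng = np.random.RandomState(0)
X = rng.randn(100, 20)
y = X[:, :5].sum(axis=1) + 0.1 * rng.randn(100)

# sklearn naming: l1_ratio is what the glmnet literature calls alpha
# (the L1/L2 mix), and alpha is what it calls lambda (overall strength).
# ElasticNetCV fits a full path of alphas for each l1_ratio and picks
# the best (l1_ratio, alpha) pair by cross-validation.
model = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], n_alphas=100, cv=5)
model.fit(X, y)
print("best l1_ratio:", model.l1_ratio_, "best alpha:", model.alpha_)
```

Note that only a few l1_ratio values are searched, while a full log-spaced path of alphas is generated automatically for each one.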

Send Scikit-learn-general mailing list submissions to

To subscribe or unsubscribe via the World Wide Web, visit

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

or, via email, send a message with subject or body 'help' to

You can reach the person managing the list at

When replying, please edit your Subject line so it is more specific

than "Re: Contents of Scikit-learn-general digest..."

1. Re: recommendation systems (Olivier Grisel)

2. Re: centering of sparse data for elastic net (James Jensen)

3. Re: choice of regularization parameter grid for elastic net (James Jensen)

4. Re: centering of sparse data for elastic net (Lars Buitinck)

5. Re: choice of regularization parameter grid for elastic net (Nicholas Dronen)

6. Contributing to scikit-learn (Ankit Agrawal)

----------------------------------------------------------------------

Message: 1

Date: Mon, 14 Oct 2013 12:05:24 +0200

Subject: Re: [Scikit-learn-general] recommendation systems


Content-Type: text/plain; charset=UTF-8

Actually, the mrec implementation is not the original SLIM algorithm:

http://slideshare.net/MarkLevy/efficient-slides

--

Olivier

------------------------------

Message: 2

Date: Mon, 14 Oct 2013 08:13:30 -0700

Subject: Re: [Scikit-learn-general] centering of sparse data for

elastic net

Content-Type: text/plain; charset="iso-8859-1"

Thank you, Olivier. Just to clarify: you say

> You can control the centering with the `normalize=True` flag of the
> ElasticNet class (or any other linear regression model).

I've noticed people use the term "normalize" in different ways. In the case of
the `normalize=True` flag of the linear models, does it mean both scaling
samples to have unit norm and centering them to have mean zero? If so, this is
inconsistent with the usage in, say, the preprocessing module, where
"normalization" refers only to scaling to unit norm, and the word
"standardization" is used to refer to doing both (although the function to
standardize is scale(), and "scale" seems more naturally associated with
normalization, to my mind). Because of this, I had supposed that the
`normalize=True` flag did not determine centering.


------------------------------

Message: 3

Date: Mon, 14 Oct 2013 08:34:17 -0700

Subject: Re: [Scikit-learn-general] choice of regularization parameter

grid for elastic net

Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Thanks, Alex. That is helpful. It looks like the glmnet documentation says this
is how they do it as well. What they don't explain is how to find alpha_max in
the first place. The only thing I've thought of is something like a binary
search until you find the smallest alpha yielding a coef_ of all zeros, with
some limit on the number of steps. But is there a better way?

Also, how do you choose the smallest alpha value (in other words, how do you
choose eps)? I came across an unofficial third-party description of glmnet
saying that if nobs < nvars, a higher value is chosen (0.01, I think), whereas
if nobs > nvars, a smaller value is chosen (say, 0.0001). The basic idea makes
sense, but it seems a bit ad hoc to me, and it seems it would be more sensible
to have more than two possible values, based on the ratio of nobs to nvars.
Any thoughts?

> hi James,
>
> for a given value of l1_ratio, the grid of alphas is chosen on a log
> scale, starting from alpha_max down to alpha_max * eps. Any value of
> alpha larger than alpha_max will lead to a coef_ full of zeros.
>
> HTH
> Alex
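As an aside on finding alpha_max without a search: for the lasso it has a closed form that follows from the KKT conditions. A sketch under sklearn's parameterization (synthetic data; this mirrors sklearn's grid construction, not glmnet's exact internals):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
y = rng.randn(50)
X = X - X.mean(axis=0)  # center so the formula needs no intercept
y = y - y.mean()

# For the lasso (l1_ratio=1) with sklearn's objective
#   (1/(2n)) * ||y - Xw||^2 + alpha * ||w||_1,
# the zero vector is optimal iff max|X^T y| / n <= alpha, giving:
alpha_max = np.max(np.abs(X.T @ y)) / X.shape[0]
# (for elastic net with l1_ratio < 1, divide additionally by l1_ratio)

# Log-scale grid from alpha_max down to alpha_max * eps:
eps = 1e-3
alphas = np.logspace(np.log10(alpha_max * eps), np.log10(alpha_max), 100)

# Sanity check: just above alpha_max every coefficient is zero;
# just below, at least one is active.
at_max = Lasso(alpha=alpha_max * 1.001, fit_intercept=False).fit(X, y).coef_
below = Lasso(alpha=alpha_max * 0.9, fit_intercept=False).fit(X, y).coef_
print(np.all(at_max == 0.0), np.any(below != 0.0))
```

So no binary search is needed; a single pass over X^T y gives the top of the path exactly.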

------------------------------

Message: 4

Date: Mon, 14 Oct 2013 17:40:33 +0200

Subject: Re: [Scikit-learn-general] centering of sparse data for

elastic net


Content-Type: text/plain; charset=UTF-8

> I've noticed people use the term "normalize" in different ways. In the
> case of the `normalize=True` flag of the linear models, does it mean both
> scaling samples to have unit norm and centering them to have mean zero?
> If so, this is inconsistent with the usage in, say, the preprocessing
> module, where "normalization" refers only to scaling to unit norm, and
> the word "standardization" is used to refer to doing both (although the
> function to standardize is scale(), and "scale" seems more naturally
> associated with normalization, in my mind). Because of this, I had
> supposed that the `normalize=True` flag did not determine centering.

Yes, this is inconsistent with the preprocessing module. "normalize"

in linear_models is what preprocessing calls "standard scaling".
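To make the two meanings concrete, a small sketch contrasting them with the preprocessing module's own helpers (toy matrix, illustrative only):

```python
import numpy as np
from sklearn.preprocessing import normalize, scale

X = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])

# preprocessing.normalize: rescales each *sample* (row) to unit norm.
X_unit = normalize(X)
# preprocessing.scale: standardizes each *feature* (column) to
# mean 0 and unit variance -- what this thread calls "standard scaling".
X_std = scale(X)

print(np.linalg.norm(X_unit, axis=1))  # row norms: all 1
print(X_std.mean(axis=0))              # column means: all ~0
```

The per-sample and per-feature operations are genuinely different transforms, which is why the overloaded word "normalize" causes confusion.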

------------------------------

Message: 5

Date: Mon, 14 Oct 2013 10:00:17 -0600

Subject: Re: [Scikit-learn-general] choice of regularization parameter

grid for elastic net


Content-Type: text/plain; charset="utf-8"

If by 'alpha' you mean what the lasso literature refers to as 'lambda': my
recollection is that, in the constraint form of the lasso, the maximum useful
bound is simply the L1 norm of the coefficients of the ordinary least squares
solution, because any value greater than that places no constraint on the
lasso solution.

http://techtalks.tv/talks/the-lasso-persistence-and-cross-validation/58279/

Regards,

Nick


------------------------------------------------------------------------------

October Webinars: Code for Performance

Free Intel webinars can help you accelerate application performance.

Explore tips for MPI, OpenMP, advanced profiling, and more. Get the most from
the latest Intel processors and coprocessors. See abstracts and register at

http://pubads.g.doubleclick.net/gampad/clk?id=60134071&iu=/4140/ostg.clktrk

_______________________________________________

Scikit-learn-general mailing list

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general


------------------------------

Message: 6

Date: Mon, 14 Oct 2013 22:02:59 +0530

Subject: [Scikit-learn-general] Contributing to scikit-learn


Content-Type: text/plain; charset="iso-8859-1"

Hi,

I am Ankit Agrawal, a 4th-year undergrad majoring in EE with a specialization
in Communications and Signal Processing at IIT Bombay. I completed my GSoC
with scikit-image this year and have a good grasp of Python (and a little
Cython). I have completed a course in ML and have taken some courses where it
is applied, namely Computer Vision, NLP, and Speech Processing.

I would like to contribute to scikit-learn to improve my understanding of
different ML algorithms. I have started going through some parts of the
documentation and the Contributing page. If there are any other pointers for
getting started, please let me know. Thanks.

Regards,

Ankit Agrawal,

Communication and Signal Processing,

IIT Bombay.


------------------------------


End of Scikit-learn-general Digest, Vol 45, Issue 16

****************************************************