Sean Violante

2014-06-30 06:26:39 UTC

Hi

Why don't PCA and ProbabilisticPCA calculate the inverse transform

properly when whitening is enabled? AFAIK all that is required is to (in

addition) multiply by explained_variance?

sean
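A quick NumPy check of the algebra (a sketch of the math, not sklearn's internals): undoing the whitening requires multiplying by the square root of the explained variance (the per-component standard deviation) before rotating back, rather than the variance itself.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.randn(100, 3) @ rng.randn(3, 3)  # correlated toy data
mean = X.mean(axis=0)
Xc = X - mean

# PCA via SVD: Xc = (U * S) @ Vt, with Vt playing the role of components_
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
n_samples = X.shape[0]
explained_variance = S ** 2 / n_samples

# Whitened scores: every component rescaled to unit variance
X_white = U * np.sqrt(n_samples)

# Inverse transform: rescale by sqrt(explained_variance), rotate back,
# and re-add the mean -- this recovers X exactly
X_rec = (X_white * np.sqrt(explained_variance)) @ Vt + mean
assert np.allclose(X_rec, X)
```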

On Mon, Jun 30, 2014 at 5:28 AM, <


Send Scikit-learn-general mailing list submissions to

To subscribe or unsubscribe via the World Wide Web, visit

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

or, via email, send a message with subject or body 'help' to

You can reach the person managing the list at

When replying, please edit your Subject line so it is more specific

than "Re: Contents of Scikit-learn-general digest..."

Today's Topics:

   1. Difference between sklearn.feature_selection.chi2 and
      scipy.stats.chi2_contingency (Christian Jauvin)
   2. Re: Retrieve the coefficients of fitted polynomial using
      LASSO (Fernando Paolo)
   3. Retrieve the coefficients of fitted polynomial using LASSO
      (Fernando Paolo)

----------------------------------------------------------------------

Message: 1

Date: Sun, 29 Jun 2014 18:28:07 -0400

Subject: [Scikit-learn-general] Difference between

sklearn.feature_selection.chi2 and scipy.stats.chi2_contingency

To: "scikit-learn mailing list (sklearn)"

<

Content-Type: text/plain; charset="utf-8"

Hi,

Suppose I wanted to test the independence of two boolean variables using a chi-square test:

>>> X = numpy.vstack(([[0,0]] * 18, [[0,1]] * 7, [[1,0]] * 42, [[1,1]] * 33))
>>> X.shape
(100, 2)
>>> sklearn.feature_selection.chi2(X[:,[0]], X[:,1])
(array([ 0.5]), array([ 0.47950012]))
>>> pandas.crosstab(X[:,0], X[:,1])
col_0   0   1
row_0
0      18   7
1      42  33
>>> scipy.stats.chi2_contingency(pd.crosstab(X[:,0], X[:,1]), correction=False)
(2.0, 0.15729920705028505, 1, array([[ 15.,  10.],
       [ 45.,  30.]]))

What explains the difference in terms of the Chi-Square value (0.5 vs 2) and the P-value (0.48 vs 0.157)?

Thanks,

Christian
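The difference can be reproduced by hand. Sketch below, under the assumption (inferred from the observed behaviour, not from sklearn's source) that sklearn.feature_selection.chi2 treats the feature column as per-sample counts, sums it within each class, and tests those sums against the class distribution, whereas scipy's chi2_contingency runs Pearson's test on the full 2x2 table:

```python
import numpy as np
from scipy.stats import chi2, chi2_contingency

# Contingency table from the session: rows are X[:,0], columns are X[:,1]
table = np.array([[18.0, 7.0],
                  [42.0, 33.0]])

# Pearson chi-square on the whole table -- scipy's statistic
stat, p, dof, expected = chi2_contingency(table, correction=False)

# Hand-rolled analogue of sklearn's statistic: compare the feature totals
# per class (samples with x == 1) against totals expected from P(class)
observed = table[1, :]                        # [42, 33]
class_prob = table.sum(axis=0) / table.sum()  # [0.6, 0.4]
expected_f = observed.sum() * class_prob      # [45, 30]
stat_skl = ((observed - expected_f) ** 2 / expected_f).sum()
p_skl = chi2.sf(stat_skl, df=len(class_prob) - 1)

print(stat, p)          # 2.0  0.1572992...
print(stat_skl, p_skl)  # 0.5  0.4795001...
```

The two results match the session's 2.0/0.157 and 0.5/0.4795 respectively; the functions simply test different hypotheses on different summaries of the data.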


------------------------------

Message: 2

Date: Sun, 29 Jun 2014 18:52:37 -0700

Subject: Re: [Scikit-learn-general] Retrieve the coefficients of

fitted polynomial using LASSO


Content-Type: text/plain; charset="utf-8"

Michael and Mathieu, thanks for your answers!

Perhaps I should explain my problem better, so you may have a better

suggestion on how to approach it. I have several datasets of the form f =

y(x), and I need to fit a 'linear', 'quadratic' or 'cubic' polynomial to

these data. So I want to (i) *automatically* determine the rank of the

problem (constrained to n = 1, 2 or 3), (ii) fit the respective polynomial

of order n, and (iii) retrieve the coefficients a_i of the fitted

polynomial, such that

p(x) = a_0 + a_1 * x + a_2 * x^2 + a_3 * x^3

with x being the input data.

Note: If a simpler model explains the data "reasonably well" (i.e. not

necessarily giving the best MSE), then it is always preferred. That is, a

line is preferred over a parabola, and so on. That's why I initially

thought of using LASSO. So, is it possible to retrieve the coefficients a_i

(above) from the LASSO model? If not, how can I achieve this using the

sklearn library?

Of course a "brute force" approach would be to first determine the rank of

the problem using LASSO, and then fit the respective polynomial using

least-squares.

Thank you,

-fernando

> Hi Fernando,
>
> > I must be missing something obvious because I can't find the "actual"
> > coefficients of the polynomial fitted using LassoCV. That is, for a 3rd
> > degree polynomial
> >
> > p = a0 + a1 * x + a2 * x^2 + a3 * x^3
> >
> > I want the a0, a1, a2 and a3 coefficients (as those returned by
> > numpy.polyfit()). Here is an example code of what I'm after:
> >
> > import numpy as np
> > import matplotlib.pyplot as plt
> > from pandas import *
> > from math import *
> > from patsy import dmatrix
> > from sklearn.linear_model import LassoCV
> >
> > sin_data = DataFrame({'x' : np.linspace(0, 1, 101)})
> > sin_data['y'] = np.sin(2 * pi * sin_data['x']) + np.random.normal(0, 0.1, 101)
> > x = sin_data['x']
> > y = sin_data['y']
> > Xpoly = dmatrix('C(x, Poly)')
>
> The development version of scikit-learn contains a transformer to do this:
> http://scikit-learn.org/dev/modules/generated/sklearn.preprocessing.PolynomialFeatures.html
>
> > n = 3
> > lasso_model = LassoCV(cv=15, copy_X=True, normalize=True)
> > lasso_fit = lasso_model.fit(Xpoly[:,1:n+1], y)
>
> In scikit-learn, "fit" always returns the model itself, so here
> "lasso_model" and "lasso_fit" refer to the same thing.
>
> > lasso_predict = lasso_model.predict(Xpoly[:,1:n+1])
> > a = np.r_[lasso_fit.intercept_, lasso_fit.coef_]
> > b = np.polyfit(x, y, n)[::-1]
> > p_lasso = a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
> > p_polyfit = b[0] + b[1] * x + b[2] * x**2 + b[3] * x**3
> > print 'coef. lasso:', a
> > print 'coef. polyfit:', b
> >
> > The returned coefficients 'a' and 'b' are completely different, and
> > while 'p_polyfit' is indeed the fitted polynomial of degree 3, 'p_lasso'
> > makes no sense (plot to see). Unless 'b' is something else... If so, what
> > actually are the coefficients returned by fit()? And how can I get the
> > coefficients that reconstruct the fitted polynomial?
>
> Why are you expecting a and b to be the same? np.polyfit returns a
> least-squares fit so the model is different from a lasso.
> You should use LinearRegression or Ridge with light regularization
> instead.

Fernando Paolo

Institute of Geophysics & Planetary Physics

Scripps Institution of Oceanography

University of California, San Diego

9500 Gilman Drive

La Jolla, CA 92093-0225
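One way to get the raw-power coefficients a_i directly (a sketch against the current sklearn API, not the thread's definitive answer): fit the lasso on plain monomial columns x, x^2, x^3 rather than patsy's 'C(x, Poly)' basis, which is an orthogonal-polynomial coding and therefore yields coefficients in a different basis than numpy.polyfit's.

```python
import numpy as np
from sklearn.linear_model import LassoCV

rng = np.random.RandomState(0)
x = np.linspace(0, 1, 101)
y = np.sin(2 * np.pi * x) + rng.normal(0, 0.1, 101)

# Plain monomial design matrix: columns are x, x**2, x**3
n = 3
X = np.vander(x, n + 1, increasing=True)[:, 1:]

model = LassoCV(cv=15).fit(X, y)
a = np.r_[model.intercept_, model.coef_]            # a_0, a_1, a_2, a_3 directly
p_lasso = a[0] + a[1] * x + a[2] * x**2 + a[3] * x**3
```

The lasso coefficients will still differ from np.polyfit's (different penalty, as Mathieu notes), but both now live in the same monomial basis, so p_lasso is a sensible curve.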


------------------------------

Message: 3

Date: Sun, 29 Jun 2014 20:28:08 -0700

Subject: [Scikit-learn-general] Retrieve the coefficients of fitted

polynomial using LASSO

<

Content-Type: text/plain; charset="utf-8"

Note2: In summary, I want the coefficients a_i without having to pre-define

either the degree of the polynomial to fit (n) or the amount of

regularization to apply (alpha), always preferring the simpler model

(fewer coefficients).

-fernando

> In scikit-learn, "fit" always returns the model itself, so here
> "lasso_model" and "lasso_fit" refer to the same thing.
>
> > lasso_predict = lasso_model.predict(Xpoly[:,1:n+1])
> >
> > The returned coefficients 'a' and 'b' are completely different, and
> > while 'p_polyfit' is indeed the fitted polynomial of degree 3, 'p_lasso'
> > makes no sense (plot to see). Unless 'b' is something else... If so, what
> > actually are the coefficients returned by fit()? And how can I get the
> > coefficients that reconstruct the fitted polynomial?
>
> Why are you expecting a and b to be the same? np.polyfit returns a
> least-squares fit so the model is different from a lasso.
> You should use LinearRegression or Ridge with light regularization
> instead.
>
> HTH,
> Mathieu
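Fernando's "prefer the simpler model" criterion could also be sketched without the lasso: cross-validate each candidate degree and take the smallest degree whose CV error is within some tolerance of the best. The 5% tolerance and the helper below are illustrative assumptions rather than anything from the thread, and the code uses the current sklearn API (model_selection instead of the old cross_validation module).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.RandomState(1)
x = np.linspace(0, 1, 101)
y = 2.0 + 3.0 * x + rng.normal(0, 0.1, 101)  # genuinely linear data

def cv_mse(degree):
    # Monomial design matrix: x, x**2, ..., x**degree
    X = np.vander(x, degree + 1, increasing=True)[:, 1:]
    scores = cross_val_score(LinearRegression(), X, y, cv=10,
                             scoring='neg_mean_squared_error')
    return -scores.mean()

errors = {d: cv_mse(d) for d in (1, 2, 3)}
best = min(errors.values())
# Smallest degree scoring within 5% of the best CV error ("reasonably well")
degree = min(d for d, e in errors.items() if e <= 1.05 * best)
```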

------------------------------------------------------------------------------

Fernando Paolo

Institute of Geophysics & Planetary Physics

Scripps Institution of Oceanography

University of California, San Diego

9500 Gilman Drive

La Jolla, CA 92093-0225

web: fspaolo.net


------------------------------

------------------------------------------------------------------------------

Open source business process management suite built on Java and Eclipse

Turn processes into business applications with Bonita BPM Community Edition

Quickly connect people, data, and systems into organized workflows

Winner of BOSSIE, CODIE, OW2 and Gartner awards

http://p.sf.net/sfu/Bonitasoft

------------------------------

_______________________________________________

Scikit-learn-general mailing list

https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

End of Scikit-learn-general Digest, Vol 53, Issue 51

****************************************************
