Discussion:
Effects of shifting and scaling on Gradient Descent
(too old to reply)
Shishir Pandey
2013-04-24 21:41:09 UTC
Permalink
Hi

I want to understand the effects that shifting and scaling the data have on
the rate of convergence of gradient descent and on the surface of the cost
function. Can you give me some pointers? Something along the lines of this
lecture: http://bit.ly/17RMzTK

I also think it will be great to have this example on the website.
--
sp
Ronnie Ghose
2013-04-24 22:39:39 UTC
Permalink
.... gradient descent in what setting - a general optimization problem
(e.g. Newton-type methods), neural nets, or ....?
Post by Shishir Pandey
Hi
I want to understand the effects shifting and scaling of data have on
the rate of convergence of gradient descent and the surface of the cost
function. Can you give me some pointers? Some thing on the lines of this
lecture http://bit.ly/17RMzTK
I also think it will be great to have this example on the website.
--
sp
Shishir Pandey
2013-04-25 06:15:56 UTC
Permalink
Suppose we use gradient descent for a regression problem. How are the rate
of convergence and the cost surface affected when we scale and shift the input?
Post by Ronnie Ghose
.... gradient descent in an optimization problem ex. newton or in neural
nets or ....?
Post by Shishir Pandey
Hi
I want to understand the effects shifting and scaling of data have on
the rate of convergence of gradient descent and the surface of the cost
function. Can you give me some pointers? Some thing on the lines of this
lecture http://bit.ly/17RMzTK
I also think it will be great to have this example on the website.
--
sp
--
sp
Jaques Grobler
2013-04-25 12:09:13 UTC
Permalink
Post by Shishir Pandey
I also think it will be great to have this example on the website.
Do you mean an interactive example that works similar to the SVM GUI
example (http://scikit-learn.org/dev/auto_examples/applications/svm_gui.html#example-applications-svm-gui-py),
but for understanding the effects shifting and scaling of data have on
the rate of convergence of gradient descent and the surface of the cost
function?
Gael Varoquaux
2013-04-25 12:41:11 UTC
Permalink
Post by Shishir Pandey
I also think it will be great to have this example on the website.
Post by Jaques Grobler
Do you mean an interactive example that works similar to the SVM
GUI example, but for understanding the effects shifting and scaling of
data have on the rate of convergence of gradient descent and the surface
of the cost function?
This is out of scope for the project: scikit-learn is a machine learning
toolkit. Gradient descent is a general class of optimization algorithms.

Gaël
Ronnie Ghose
2013-04-25 13:10:35 UTC
Permalink
I think he means: what improvements/benefits do you get from rescaling features,
e.g. with min-max scaling or preprocessing.scale?
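For example (a rough sketch with a made-up array X, just to show the two calls):

import numpy as np
from sklearn import preprocessing

X = np.array([[10.0, 0.1],
              [500.0, 0.5],
              [1000.0, 1.0]])

# standardize: zero mean and unit variance per column
X_std = preprocessing.scale(X)

# min-max scaling: map each column onto the [0, 1] range
X_minmax = preprocessing.MinMaxScaler().fit_transform(X)

print(X_std)
print(X_minmax)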
Post by
Post by Shishir Pandey
I also think it will be great to have this example on the website.
Do you mean like an interactive example that works similiar to the SVM
Gui example , but for understand the effects shifting and scaling of
data has on the rate of convergence of gradient descent and the surface
of the cost function?
This is out of scope for the project: scikit-learn is a machine learning
toolkit. Gradient descent is a general class of optimization algorithms.

Gaël
Andreas Mueller
2013-04-27 14:39:43 UTC
Permalink
I don't think you can make any statements about the optimization method
wrt the data
when you don't specify the loss function you want to minimize.
Post by Ronnie Ghose
I think he means what increases/benefits do you get from rescaling
features e.g. minmax or preprocessing.scale
Post by
Post by Shishir Pandey
I also think it will be great to have this example on the website.
Do you mean like an interactive example that works similiar to the SVM
Gui example , but for understand the effects shifting and scaling of
data has on the rate of convergence of gradient descent and the surface
of the cost function?
This is out of scope for the project: scikit-learn is a machine learning
toolkit. Gradient descent is a general class of optimization algorithms.
Gaël
Shishir Pandey
2013-04-25 18:08:21 UTC
Permalink
Thanks, Ronnie, for pointing out the exact method in the scikit-learn
library. Yes, that is exactly what I was asking: how does rescaling
the features affect the gradient descent algorithm? Since stochastic
gradient descent is used in machine learning quite a lot, it would be
good to understand how its performance is affected by rescaling the
features.

Jaques, I am having some trouble running the example. But yes, it would
be good if we could have a GUI example.
Post by Ronnie Ghose
I think he means what increases/benefits do you get from rescaling features
e.g. minmax or preprocessing.scale
Post by
Post by Shishir Pandey
I also think it will be great to have this example on the website.
Do you mean like an interactive example that works similiar to the SVM
Gui example , but for understand the effects shifting and scaling of
data has on the rate of convergence of gradient descent and the surface
of the cost function?
This is out of scope for the project: scikit-learn is a machine learning
toolkit. Gradient descent is a general class of optimization algorithms.
Gaël
--
sp
Matthieu Brucher
2013-04-25 18:15:59 UTC
Permalink
Hi,

Do you mean scaling the parameters of the cost function? If so, scaling
will change the surface of the cost function, of course. It's hard to
say anything general about how the surface will behave; it completely
depends on the cost function you are using. A cost function that is
linear will have the same scale applied to the surface, but anything
fancier will behave differently (squared sum, robust cost...).
This also means that the gradient descent will take different steps and may
converge to a different location.
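For the squared-sum case, a tiny sketch of the effect on the curvature
(made-up numbers, assuming numpy):

import numpy as np

rng = np.random.RandomState(0)
x = rng.uniform(0.1, 1.0, size=100)   # a feature on a small range

def curvature(x):
    # for any target y, the squared loss L(w) = 0.5 * sum((x * w - y) ** 2)
    # has second derivative sum(x ** 2): rescaling x by a factor c therefore
    # multiplies the curvature of the surface by c ** 2
    return np.sum(x ** 2)

print(curvature(x))           # curvature with the raw feature
print(curvature(1000.0 * x))  # about 1e6 times larger after scaling x by 1000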
As Gaël said, this is a generic optimization-related question, it is not
machine-learning related.

Matthieu
Post by Shishir Pandey
Thanks Ronnie for pointing out the exact method in the scikit-learn
library. Yes, that is exactly what I was asking how does the rescaling
of features affect the gradient descent algorithm. Since, stochastic
gradient descent is an algorithm which is used in machine learning quite
a lot. It will be good to understand how its performance is affected
after rescaling features.
Jaques, I am having some trouble running the example. But yes it will be
good if we can have gui example.
--
Information System Engineer, Ph.D.
Blog: http://matt.eifelle.com
LinkedIn: http://www.linkedin.com/in/matthieubrucher
Music band: http://liliejay.com/
Shishir Pandey
2013-04-25 21:07:27 UTC
Permalink
I did not mean the parameters of the cost function. I only want to scale the
input variables. Suppose one of the independent variables has a range
from 10 to 1000 and another has a range of 0.1 to 1. Andrew Ng and
others say in their machine learning lectures that one should rescale
the input data to bring all variables into a similar range
(http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling&speed=100).
This will affect how gradient descent behaves.

For now, we can take the cost function to be the squared loss.
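To make that concrete, here is a small sketch with made-up data in exactly
those ranges (assuming numpy and scikit-learn's preprocessing.scale):

import numpy as np
from sklearn.preprocessing import scale

rng = np.random.RandomState(0)
# one variable in the 10 - 1000 range, one in the 0.1 - 1 range
X = np.column_stack([rng.uniform(10, 1000, size=200),
                     rng.uniform(0.1, 1, size=200)])

def cond_number(X):
    # for the squared loss 0.5 * ||X w - y||^2 the Hessian is X.T.dot(X);
    # its condition number tells how elongated the cost surface is, and
    # roughly how slowly plain gradient descent converges
    return np.linalg.cond(X.T.dot(X))

print("condition number, raw features:    %g" % cond_number(X))
print("condition number, scaled features: %g" % cond_number(scale(X)))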
Post by Matthieu Brucher
Hi, Do you mean scaling
the parameters of the cost function? If so, scaling will change the
surface of the cost function, of course. It's kind of complicated to
say anything about how the surface will behave, it completely depends
of the cost function you are using. A cost function that is linear
will have the same scale applied to the surface, but anything fancier
will behave differently (squared sum, robust cost...) This also means
that the gradient descent will be different ans may converge to a
different location. As Gaël said, this is a generic
optimization-related question, it is not machine-learning related.
--
sp
Matthieu Brucher
2013-04-26 10:55:20 UTC
Permalink
Shishir Pandey
2013-04-26 06:57:51 UTC
Permalink
Even scikit-learn mentions on its stochastic gradient descent page
(http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use)
that one should scale the data. An example showing what really happens to a
cost function (say, the squared loss) when the data is scaled would be great.
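Just to sketch the kind of example I mean (entirely synthetic data;
SGDRegressor uses the squared loss by default, and only the scaling differs
between the two fits):

import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = np.column_stack([rng.uniform(10, 1000, size=500),
                     rng.uniform(0.1, 1, size=500)])
y = X.dot(np.array([0.05, 20.0])) + rng.normal(scale=0.1, size=500)

X_scaled = StandardScaler().fit_transform(X)

for name, data in [("raw features   ", X), ("scaled features", X_scaled)]:
    try:
        est = SGDRegressor(random_state=0).fit(data, y)
        print("%s R^2 = %.3f" % (name, est.score(data, y)))
    except ValueError as exc:
        # on badly scaled data SGD can overflow during training
        print("%s failed: %s" % (name, exc))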
Post by Shishir Pandey
I did not mean parameters of the cost function. I only want to scale the
input variables. Suppose one of the independent variables has a range
from 10 - 1000 and some other has a range in 0.1 - 1. Then Andrew Ng and
others say in their machine learning lectures that one should rescale
the input data to bring all variables to similar range
(http://openclassroom.stanford.edu/MainFolder/VideoPage.php?course=MachineLearning&video=03.1-LinearRegressionII-FeatureScaling&speed=100)
. This will affect how the gradient descent will behave.
We can choose cost function right now to be the squared loss function.
--
sp
Ronnie Ghose
2013-04-26 07:08:45 UTC
Permalink
afaik fits tend to work better and so do classifiers. it's much easier for
a classifier to fit data in the range -1 to 1 than 0 to 10000, so it also
helps convergence.

http://stats.stackexchange.com/questions/41704/how-and-why-do-normalization-and-feature-scaling-work
and then

http://en.wikipedia.org/wiki/Feature_scaling
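e.g. a two-line sketch of squeezing a 0-10000 feature into [-1, 1] (toy
numbers only):

import numpy as np
from sklearn.preprocessing import MinMaxScaler

X = np.array([[0.0], [2500.0], [10000.0]])
X_scaled = MinMaxScaler(feature_range=(-1, 1)).fit_transform(X)
print(X_scaled.ravel())   # -> [-1., -0.5, 1.]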
Post by Shishir Pandey
http://scikit-learn.org/dev/modules/sgd.html#tips-on-practical-use
one should scale data. An example which shows what really happens to one
cost function (say squared loss) on scaling the data would be great.
Peter Prettenhofer
2013-04-26 07:44:18 UTC
Permalink
(first-order) GD uses a single learning rate for all features - if features
have different variability it's difficult to find a one-size-fits-all
learning rate - the parameters of high-variability features will tend
to oscillate whereas the parameters of low-variability features will
converge too slowly.

There is a huge amount of literature on the topic - the Neural Network FAQ
[1] is a good (practical) starting point.

[1] ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std
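A quick numerical sketch of that dilemma, for plain full-batch GD on the
squared loss (made-up data, assuming numpy and scikit-learn; not tied to any
particular estimator):

import numpy as np
from sklearn.preprocessing import scale

rng = np.random.RandomState(0)
# column 0 has high variability, column 1 has low variability
X = np.column_stack([rng.uniform(10, 1000, size=200),
                     rng.uniform(0.1, 1, size=200)])
y = X.dot(np.array([0.05, 20.0]))

def gd_gap(X, y, lr, n_steps=200):
    # plain full-batch gradient descent on the squared loss with a single
    # learning rate; returns the distance to the exact least-squares solution
    w_opt = np.linalg.solve(X.T.dot(X), X.T.dot(y))
    w = np.zeros(X.shape[1])
    for _ in range(n_steps):
        w -= lr * X.T.dot(X.dot(w) - y) / len(y)
    return np.linalg.norm(w - w_opt)

# no single step size suits both columns of the raw data:
print(gd_gap(X, y, lr=1e-5))         # too big for column 0 -> oscillates and diverges
print(gd_gap(X, y, lr=1e-6))         # stable, but column 1's weight barely moves
print(gd_gap(scale(X), y, lr=0.5))   # after standardizing, one rate works for both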
Post by Ronnie Ghose
afaik fits tend to work better and so do classifiers. it's much easier to
have a classifier try to fit between -1 and 1 then 0 and 10000 so it also
helps convergence.
http://stats.stackexchange.com/questions/41704/how-and-why-do-normalization-and-feature-scaling-work
and then
http://en.wikipedia.org/wiki/Feature_scaling
--
Peter Prettenhofer
Jaques Grobler
2013-04-26 10:38:00 UTC
Permalink
@Shishir Pandey on a slight tangent, what problems are you having with
running Libsvm GUI?

I wonder if an interactive GUI example would really be necessary - we could
just have an example illustrating the difference, with plots, between
unscaled and scaled data.. if people find that useful.
But the GUI example could be nice too :)
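Something fairly small could already do it - a rough sketch with made-up
data, assuming matplotlib (not meant as a finished example):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.preprocessing import scale

rng = np.random.RandomState(0)
X = np.column_stack([rng.uniform(10, 1000, size=200),
                     rng.uniform(0.1, 1, size=200)])
y = X.dot(np.array([0.05, 20.0]))

def loss_curve(X, y, lr, n_steps=100):
    # squared loss along a plain gradient descent run with a fixed step size
    w = np.zeros(X.shape[1])
    losses = []
    for _ in range(n_steps):
        resid = X.dot(w) - y
        losses.append(0.5 * np.mean(resid ** 2))
        w -= lr * X.T.dot(resid) / len(y)
    return losses

# scale() also centers X, so fit the intercept separately by centering y here
plt.plot(loss_curve(X, y, lr=1e-6), label="raw features")
plt.plot(loss_curve(scale(X), y - y.mean(), lr=0.5), label="scaled features")
plt.yscale("log")
plt.xlabel("gradient descent iteration")
plt.ylabel("squared loss")
plt.legend()
plt.show()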
Post by Peter Prettenhofer
(first-order) GD uses a single learning rate for all features - if
features have a different variability its difficult to find a
one-size-fits-all learning rate - the parameters of high variability
features will tend to oscillate whereas the parameters of low variability
features will converge too slowly.
There is a huge amount of literature on the topic - the Neural Network FAQ
[1] is a good (practical) starting point.
[1] ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std
--
Peter Prettenhofer
Shishir Pandey
2013-04-26 10:47:36 UTC
Permalink
@Jaques Grobler: I ran the libsvm GUI code on sklearn version 0.13.1 and
it was giving an import error - "from sklearn.externals.six.move import
xrange". But I commented out that line and it is working just fine.

As you have suggested, a GUI example might not really be that necessary.
Illustrating the difference with plots would also be good enough.

Thanks a lot to Peter for the link.
Post by Jaques Grobler
@Shishir Pandey on a slight tangent, what problems are you having with
running Libsvm GUI?
I wonder if a GUI interactive example would really be necessary - we could
just have an example
illustrating the difference with plots when data is not scaled or scaled..
if people find that useful.
But the GUI example could be nice too :)
Post by Peter Prettenhofer
(first-order) GD uses a single learning rate for all features - if
features have a different variability its difficult to find a
one-size-fits-all learning rate - the parameters of high variability
features will tend to oscillate whereas the parameters of low variability
features will converge too slowly.
There is a huge amount of literature on the topic - the Neural Network FAQ
[1] is a good (practical) starting point.
[1] ftp://ftp.sas.com/pub/neural/FAQ2.html#A_std
--
sp
Gael Varoquaux
2013-04-26 10:52:37 UTC
Permalink
Post by Shishir Pandey
@Jaques Grobler: I ran the libsvm GUI code on the sklearn version 13.1
it was giving error importing - "from sklearn.externals.six.move import
xrange".
Which error? Could you copy/paste it here?

G
Gianni Iannelli
2013-04-26 18:39:11 UTC
Permalink
First of all, thanks for the answer!!! I understand what you are talking about, but I was thinking that, since I can get the support vectors, I could draw a decision boundary. Actually, in the attached figure, I get an idea of the separation line/plane by looking at the support vectors. I was thinking that a drawing could be done using these two features.
Thanks,
Solimyr
Subject: Scikit-learn-general Digest, Vol 39, Issue 61
Date: Thu, 25 Apr 2013 23:01:09 +0000
1. [scikit-learn] plot SVM results and classification space
(Gianni Iannelli)
2. Re: [scikit-learn] plot SVM results and classification space
(Andreas Mueller)
3. Re: Effects of shifting and scaling on Gradient Descent
(Shishir Pandey)
4. Re: [scikit-learn] plot SVM results and classification space
(Robert Layton)
----------------------------------------------------------------------
Message: 1
Date: Thu, 25 Apr 2013 22:31:02 +0200
Subject: [Scikit-learn-general] [scikit-learn] plot SVM results and classification space
Hi everyone!
I'm new to scikit and I'm having trouble with a visualization method!!!
What I want to do is visualize, in a plot/graph, something like this: http://scikit-learn.org/stable/auto_examples/exercises/plot_iris_exercise.html
Essentially I would like to see the background color based on my training set, to see how the SVM will classify my new elements. It could be a line or a color in the background.
Everything is OK if I use just two features! I can obtain XX and YY but, in my case, I have 6 features and I don't know how to do the grid!! Almost all the examples use just two features from the iris dataset... There is one that uses four, but essentially it applies a PCA in order to reduce the features to two, so, let me say, it is kind of cheating in order to get the graph....
....
graph = pylab.scatter(X_train[:,0], X_train[:,2], c=colors, zorder=10)
ax = graph.axes
ax.set_xlabel('Feature_1')
ax.set_ylabel('Feature_2')
pylab.scatter(clf.support_vectors_[:,0], clf.support_vectors_[:,2], marker='x', c='y', s=200, zorder=1, label='Support Vector')
pylab.legend(loc='lower right')
pylab.show()
....
Where X_train contains six features and not just two!
Thanks to everyone!!!
Solimyr
------------------------------
Message: 2
Date: Thu, 25 Apr 2013 22:56:00 +0200
Subject: Re: [Scikit-learn-general] [scikit-learn] plot SVM results and classification space
Hi Gianni.
There is a fundamental problem with what you want to do, independent of
SVMs.
In the plot, the 2d plane of the plot represents the input space.
Your input space is 6d. You cannot represent 6d on a computer monitor
(that I know of).
So there is no way to plot your data directly.
What you could do is plot 2d projections of your data, for example using
PCA.
That makes it somewhat harder to plot decision boundaries, though.
Hth,
Andy
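For instance (a rough sketch with stand-in random data; the real
X_train / y_train and the fitted clf would go in their place):

import numpy as np
import pylab
from sklearn.svm import SVC
from sklearn.decomposition import PCA

# stand-in data: 6 features, two classes (replace with your X_train / y_train)
rng = np.random.RandomState(0)
X_train = np.vstack([rng.normal(0, 1, size=(50, 6)),
                     rng.normal(1, 1, size=(50, 6))])
y_train = np.array([0] * 50 + [1] * 50)

clf = SVC(kernel='linear').fit(X_train, y_train)

# project the 6d data (and the support vectors) down to 2d with PCA
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_train)
sv_2d = pca.transform(clf.support_vectors_)

pylab.scatter(X_2d[:, 0], X_2d[:, 1], c=y_train, zorder=10)
pylab.scatter(sv_2d[:, 0], sv_2d[:, 1], marker='x', c='y', s=200,
              zorder=1, label='Support Vector')
pylab.xlabel('first principal component')
pylab.ylabel('second principal component')
pylab.legend(loc='lower right')
pylab.show()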
------------------------------
Message: 4
Date: Fri, 26 Apr 2013 09:00:38 +1000
Subject: Re: [Scikit-learn-general] [scikit-learn] plot SVM results and classification space
As Andy said, you need to create some representation in two dimensions.
You can easily do this by selecting just two features (e.g. the two
most discriminating), or PCA is another good option, but it can be difficult
to understand what is meant by the x-axis and y-axis.
Keep in mind that classification and visualisation are
two distinct components here -- classify using all features and visualise
using two dimensions.
(You can try classifying with fewer features; sometimes it works.)
- Robert
--
Public key at: http://pgp.mit.edu/ Search for this email address and select
the key from "2011-08-19" (key id: 54BA8735)
------------------------------
End of Scikit-learn-general Digest, Vol 39, Issue 61
****************************************************