Discussion:
[Scikit-learn-general] Why does SV regression crash here?
Nicolas Cedilnik
2016-04-21 11:32:12 UTC
Hi all,

I'm trying to use scikit-learn to do SV regression and this small data
set causes it to crash every time. I can't even stop the process with
CTRL+C and have to kill it some other way. I've tested it on
Python 3.5 and 2.7.

Am I doing something wrong or should I report a bug?

Here's some copy-pastable code to reproduce the issue:

from sklearn.svm import SVR
import numpy as np

X=np.array([[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ]])
y = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2,
              0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.4,
              0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6,
              0.7, 0.7, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8, 0.8, 0.8,
              0.9, 0.9, 0.9, 0.9, 0.9])
weights = np.array([1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391],
                   dtype=np.float16)

svr_poly = SVR(kernel='poly', C=1e3, degree=2)
fit = svr_poly.fit(X, y, weights)

-- Nicolas Cedilnik

PS: this is not the 'real' data I need the regression on.
Piotr Bialecki
2016-04-21 12:35:08 UTC
Hi Nicolas,

I tried your code and my script also crashed.

However, I think you might have forgotten to scale your input before
using the SVR.

Try this instead:

from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
import numpy as np

X=np.array([[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ],
[ 40.8 ],
[ 21327.5900838],
[ 28781.2890295],
[ 29978.2941176],
[ 30732.562406 ]])
y = np.array([0.1, 0.1, 0.1, 0.1, 0.1, 0.2, 0.2, 0.2, 0.2, 0.2,
              0.3, 0.3, 0.3, 0.3, 0.3, 0.4, 0.4, 0.4, 0.4, 0.4,
              0.5, 0.5, 0.5, 0.5, 0.5, 0.6, 0.6, 0.6, 0.6, 0.6,
              0.7, 0.7, 0.7, 0.7, 0.7, 0.8, 0.8, 0.8, 0.8, 0.8,
              0.9, 0.9, 0.9, 0.9, 0.9])
weights = np.array([1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391,
                    1.0, 0.75, 1.0, 0.88867188, 0.66650391],
                   dtype=np.float16)


X = StandardScaler().fit_transform(X)

svr_poly = SVR(kernel='poly', C=1e3, degree=2, verbose=True)


fit = svr_poly.fit(X, y, weights)


This is working for me.
Don't forget to fit your StandardScaler on the training set only, and to
use the training mean and scale to transform your test/validation set.
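
As a minimal sketch of that pattern (the hold-out split below is purely
illustrative and not part of the original script):

from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# hypothetical hold-out split of the X/y/weights arrays defined above
X_train, X_test = X[:40], X[40:]
y_train, y_test = y[:40], y[40:]
w_train = weights[:40]

scaler = StandardScaler().fit(X_train)   # learn mean and scale on the training set only
X_train_s = scaler.transform(X_train)
X_test_s = scaler.transform(X_test)      # reuse the training statistics; never refit on test data

svr = SVR(kernel='poly', C=1e3, degree=2)
svr.fit(X_train_s, y_train, w_train)
print(svr.predict(X_test_s))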


Greets,
Piotr
Olivier Grisel
2016-04-21 14:00:37 UTC
Do you get a segmentation fault? Do you think this is the same issue as:

https://github.com/scikit-learn/scikit-learn/issues/6687

or something different?
--
Olivier Grisel
Michael Eickenberg
2016-04-21 14:10:49 UTC
To me it looks like the SVM is just having huge trouble converging: the
data seem quite degenerate by the looks of it.
On my machine it just keeps running. No segfault.
The fact that it works after rescaling is interesting and also points
towards convergence problems for the unscaled version.
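
One way to watch the stalled optimization directly (a suggestion, not part
of Michael's message) is libsvm's verbose output, which Piotr's snippet
above already enables:

from sklearn.svm import SVR

# verbose=True makes the underlying libsvm solver print its optimization
# progress, so a fit that keeps iterating without converging becomes visible
svr_dbg = SVR(kernel='poly', C=1e3, degree=2, verbose=True)
# svr_dbg.fit(X, y, weights)   # on the unscaled data this keeps running indefinitely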
Post by Olivier Grisel
https://github.com/scikit-learn/scikit-learn/issues/6687
or something different?
--
Olivier Grisel
------------------------------------------------------------------------------
Find and fix application performance issues faster with Applications Manager
Applications Manager provides deep performance insights into multiple tiers of
your business applications. It resolves application problems quickly and
reduces your MTTR. Get your free trial!
https://ad.doubleclick.net/ddm/clk/302982198;130105516;z
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Nicolas Cedilnik
2016-04-21 14:21:20 UTC
First, thanks Piotr for your help; rescaling indeed solved the matter.

No segfault here either; it looks like we're in an infinite loop. I can
provide additional information if you need it, but I think the code I
pasted earlier pretty much sums it all up.
Olivier Grisel
2016-04-21 14:39:16 UTC
Indeed, that is not a bug then.
--
Olivier
Mathieu Blondel
2016-04-21 14:51:52 UTC
By default, SVC (and likewise SVR) stops only when the desired tolerance is
reached. If the problem is poorly scaled, this can indeed take ages. You can
however set max_iter to prevent this.

http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
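
For instance, a minimal sketch of such a cap applied to the scaled example
above (max_iter=10000 is only an illustrative value):

from sklearn.svm import SVR

# max_iter bounds the number of solver iterations; the default of -1 means no limit
svr_poly = SVR(kernel='poly', C=1e3, degree=2, max_iter=10000)
fit = svr_poly.fit(X, y, weights)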

We might want to change the default from -1 to something large like 1000.

Mathieu
Post by Olivier Grisel
Indeed, that is not a bug then.
--
Olivier
Mathieu Blondel
2016-04-21 14:57:10 UTC
Another remark is that you set C=1e3. Depending on the scaling of your
data, this can be quite large. It means the SVM is very lightly
regularized (effectively a hard-margin SVM) and therefore the problem is
ill-conditioned.
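
As a hedged illustration (C=1.0 below is just the library default, not a
recommendation for the real data), a smaller C together with scaling keeps
the optimization well behaved:

from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

# stronger regularization: soft-margin SVR with the default C on scaled inputs
X_scaled = StandardScaler().fit_transform(X)
svr_moderate = SVR(kernel='poly', C=1.0, degree=2)
svr_moderate.fit(X_scaled, y, weights)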

Mathieu
Post by Mathieu Blondel
By default, SVC stops only when the desired tolerance is reached. If the
problem is poorly scaled, this can indeed take ages. You can however set
max_iter to prevent this.
http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html
We might want to change the default from -1 to something large like 1000.
Mathieu
Post by Olivier Grisel
Indeed, that is not a bug then.
--
Olivier
Olivier Grisel
2016-04-21 17:49:07 UTC
+1 for setting max_iter=1000 with a ConvergenceWarning if it is
reached, as in most other scikit-learn models.
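
A hedged sketch of that pattern (the helper name, the message, and the
import path are illustrative only, not actual scikit-learn code):

import warnings
from sklearn.exceptions import ConvergenceWarning

def _warn_if_not_converged(n_iter, max_iter):
    # hypothetical helper: warn when the solver stopped because it hit its iteration cap
    if 0 < max_iter <= n_iter:
        warnings.warn("Solver terminated early (max_iter=%d); consider scaling "
                      "your data or increasing max_iter." % max_iter,
                      ConvergenceWarning)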

Any volunteer for a PR?
--
Olivier