Discussion: [Scikit-learn-general] using Support vector regression/gaussian regression with a Pearson VII function kernel
Amita Misra
2016-03-29 01:40:05 UTC
Hi,

I was using Weka earlier for support vector regression and Gaussian
regression.

I am now switching to scikit-learn and was trying to replicate my results
using support vector regression. I could get similar results for the poly
and RBF kernels.

Is there a way I can specify the Pearson VII function kernel (PUK kernel)
for support vector and Gaussian regression?

Thanks,
Amita
--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz
Andreas Mueller
2016-03-31 21:17:30 UTC
Hi.
What do you mean by Gaussian regression?
You can specify your own kernels for SVMs, but it will be a bit slower.
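For example, something along these lines should work (a rough, untested
sketch; the PUK form below follows the usual Pearson VII universal kernel
definition, and the puk_kernel/sigma/omega names are just illustrative):

import numpy as np
from sklearn.svm import SVR
from sklearn.metrics.pairwise import euclidean_distances

def puk_kernel(X, Y, sigma=1.0, omega=1.0):
    # Pearson VII universal kernel (PUK):
    # K(x, y) = 1 / (1 + (2 * ||x - y|| * sqrt(2**(1/omega) - 1) / sigma)**2)**omega
    dists = euclidean_distances(X, Y)
    return 1.0 / (1.0 + (2.0 * dists
                         * np.sqrt(2.0 ** (1.0 / omega) - 1.0)
                         / sigma) ** 2) ** omega

# SVR accepts any callable that returns the Gram matrix of shape
# (n_samples_X, n_samples_Y); use functools.partial(puk_kernel, sigma=..., omega=...)
# if you want to grid-search the two parameters.
svr = SVR(kernel=puk_kernel)

Since the Gram matrix is then computed in Python rather than inside libsvm,
that is where the extra slowness comes from.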

Cheers,
Andy
Sebastian Raschka
2016-03-31 23:45:14 UTC
It seems like the GaussianProcess class only has an autocorrelation parameter, but if I understand correctly, the autocorrelation function is just the “normalized” covariance kernel, so it may be possible to provide custom kernels there as well? If not, it may be interesting to refactor it a little and borrow the code from the SVMs so that Gaussian processes accept custom kernels too.
Manoj Kumar
2016-03-31 23:59:46 UTC
Hi,

I remember seeing it somewhere but don't recall where exactly.

You can do it as follows:

1. Inherit from Kernel (
http://scikit-learn.org/dev/modules/generated/sklearn.gaussian_process.kernels.Kernel.html#sklearn.gaussian_process.kernels.Kernel
).
2. Make each hyperparameter of the kernel an attribute starting with
`hyperparameter_` and make it an instance of Hyperparameter, as done here (
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/gaussian_process/kernels.py#L1711
).
3. Implement a __call__ that computes the kernel matrix and, if possible,
the gradient.

If not already present, this might be worth illustrating in an example.
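For a Pearson VII (PUK) kernel it could look roughly like this (untested
sketch against the 0.18dev kernel API; the sigma/omega hyperparameter names
and the exact PUK form are only an illustration):

import numpy as np
from sklearn.gaussian_process.kernels import Kernel, Hyperparameter
from sklearn.metrics.pairwise import euclidean_distances

class PUK(Kernel):
    """Pearson VII (PUK) kernel -- untested sketch for the 0.18dev kernel API."""

    def __init__(self, sigma=1.0, omega=1.0,
                 sigma_bounds=(1e-5, 1e5), omega_bounds=(1e-5, 1e5)):
        self.sigma = sigma
        self.omega = omega
        self.sigma_bounds = sigma_bounds
        self.omega_bounds = omega_bounds

    # step 2: expose the tunable parameters as hyperparameter_* attributes
    @property
    def hyperparameter_sigma(self):
        return Hyperparameter("sigma", "numeric", self.sigma_bounds)

    @property
    def hyperparameter_omega(self):
        return Hyperparameter("omega", "numeric", self.omega_bounds)

    # step 3: __call__ returns the kernel matrix; the analytic gradient is
    # omitted here, so use GaussianProcessRegressor(kernel=PUK(), optimizer=None)
    # and tune sigma/omega by cross-validation instead.
    def __call__(self, X, Y=None, eval_gradient=False):
        if eval_gradient:
            raise ValueError("Gradient not implemented; use optimizer=None.")
        dists = euclidean_distances(X, X if Y is None else Y)
        return 1.0 / (1.0 + (2.0 * dists
                             * np.sqrt(2.0 ** (1.0 / self.omega) - 1.0)
                             / self.sigma) ** 2) ** self.omega

    def diag(self, X):
        return np.ones(X.shape[0])  # k(x, x) == 1 for the PUK kernel

    def is_stationary(self):
        return True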
--
Manoj,
http://github.com/MechCoder
Vincent Dubourg
2016-04-01 05:56:07 UTC
Hi,

Manoj's tip only works if you are already using the 0.18dev version of
scikit-learn, which brings a complete refactoring of the gaussian_process
submodule.

If you are still using the latest stable release (0.17), then a custom
kernel can be specified as a callable (a simple Python function) via the
`corr` keyword argument:
*corr* : string or callable, optional
    A stationary autocorrelation function returning the autocorrelation
    between two points x and x'. Default assumes a squared-exponential
    autocorrelation model. Built-in correlation models are:
    'absolute_exponential', 'squared_exponential',
    'generalized_exponential', 'cubic', 'linear'

For reference, here is the built-in squared-exponential correlation model,
which you can use as a template:

def squared_exponential(theta, d):
    """
    Squared exponential correlation model (Radial Basis Function).

                                        n
        theta, d --> r(theta, d) = exp( sum - theta_i * (d_i)^2 )
                                       i = 1

    Parameters
    ----------
    theta : array_like
        An array with shape 1 (isotropic) or n (anisotropic) giving the
        autocorrelation parameter(s).

    d : array_like
        An array with shape (n_eval, n_features) giving the componentwise
        distances between locations x and x' at which the correlation model
        should be evaluated.

    Returns
    -------
    r : array_like
        An array with shape (n_eval, ) containing the values of the
        autocorrelation model.
    """
    theta = np.asarray(theta, dtype=np.float)
    d = np.asarray(d, dtype=np.float)

    if d.ndim > 1:
        n_features = d.shape[1]
    else:
        n_features = 1

    if theta.size == 1:
        return np.exp(-theta[0] * np.sum(d ** 2, axis=1))
    elif theta.size != n_features:
        raise ValueError("Length of theta must be 1 or %s" % n_features)
    else:
        return np.exp(-np.sum(theta.reshape(1, n_features) * d ** 2,
                              axis=1))
Implement the Pearson kernel like this and pass it to the GaussianProcess
estimator.
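For example (untested, and the theta/omega parameterization is only my
reading of the PUK definition, so double-check it against the one Weka uses):

import numpy as np
from functools import partial
from sklearn.gaussian_process import GaussianProcess

def pearson_vii(theta, d, omega=1.0):
    # Same (theta, d) signature as the built-in correlation models above:
    # theta[0] acts as the inverse peak width (1/sigma) and omega as the
    # tailing factor of the Pearson VII function.
    theta = np.atleast_1d(np.asarray(theta, dtype=float))
    d = np.asarray(d, dtype=float)
    dist = np.sqrt(np.sum(d ** 2, axis=1))  # Euclidean distance for each row of d
    return 1.0 / (1.0 + (2.0 * dist * theta[0]
                         * np.sqrt(2.0 ** (1.0 / omega) - 1.0)) ** 2) ** omega

# omega is held fixed via partial (tune it in an outer loop if needed);
# theta is optimized by GaussianProcess between thetaL and thetaU.
gp = GaussianProcess(corr=partial(pearson_vii, omega=1.0),
                     theta0=1e-1, thetaL=1e-4, thetaU=1e1)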

Cheers,
Vincent
Andreas Mueller
2016-04-01 14:51:16 UTC
Upgrading to the dev version might be the better route, though.
Amita Misra
2016-04-01 18:59:22 UTC
Thanks for the pointers.

I am using the stable version 0.17, so I used nested cross-validation over
the different corr types. I got the best result with the linear correlation
model, but it still fell short of what I got in Weka with the off-the-shelf
universal Pearson VII function based kernel, for both SVM and Gaussian
regression.

Hence I was trying to find out whether there is already an implementation in
scikit-learn where I can specify PUK as a kernel option, since I was not sure
I could implement the PUK kernel correctly myself.

Thanks,
Amita
--
Amita Misra
Graduate Student Researcher
Natural Language and Dialogue Systems Lab
Baskin School of Engineering
University of California Santa Cruz