Discussion:
libsvm PR
James Bergstra
2012-09-26 19:27:26 UTC
Permalink
Hi list,

I submitted a libsvm-related PR on github to add a new parameter. It
addresses an infinite loop in libsvm's solver, but in doing so, it
required a non-trivial patch of the libsvm source code, in addition to
the cython bindings and the classes in the svm submodule. Are changes
to libsvm welcome in sklearn?
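
To illustrate the idea rather than the actual patch: the sketch below caps the number of solver iterations so that a tolerance which is never met cannot keep the loop running forever. The function name, the `max_iter` argument, and the convention that -1 means "no limit" are assumptions made for this example, not code taken from libsvm or from the PR.

```python
import warnings


def minimize_with_cap(step, x0, tol=1e-3, max_iter=-1):
    """Iterate x <- step(x) until the update is smaller than tol.

    max_iter is a hypothetical name for the cap; -1 means "no limit",
    in which case the loop never ends if the tolerance is never met.
    Returns (x, n_iter, converged).
    """
    x = x0
    n_iter = 0
    while True:
        new_x = step(x)
        n_iter += 1
        if abs(new_x - x) < tol:
            # Normal termination: the update fell below the tolerance.
            return new_x, n_iter, True
        x = new_x
        if max_iter != -1 and n_iter >= max_iter:
            # Early termination: warn instead of raising, so the caller
            # still gets the current (unconverged) iterate back.
            warnings.warn(
                "solver stopped early after %d iterations" % n_iter,
                RuntimeWarning,
            )
            return x, n_iter, False
```

With a contracting update such as `lambda v: 0.5 * v` the loop stops on the tolerance; with an oscillating update such as `lambda v: -v` it only stops because of the cap.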

- James
Andreas Mueller
2012-09-26 19:49:00 UTC
Permalink
Hi James.
Thanks for the PR.
I think so far we have avoided changing LibSVM and tried to get patches
in upstream. AFAIK, this hasn't succeeded so far.
The cases I am thinking of are me trying to get the chi2 kernel in and
Lars cleaning up some of the code.

As LibSVM seems to be very conservative wrt. features and patches,
I think we should reconsider our policy and accept your modifications.

Can you give some insight into why this check is necessary and in
what kind of situations LibSVM fails to converge? I guess it uses
the duality gap for convergence. Is it the case that this is not
a good measure sometimes?
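
For reference, the stopping rule described in the LIBSVM paper (Chang and Lin) is stated in terms of the maximal violating pair of the dual problem rather than an explicit duality gap: the solver stops once m(alpha) - M(alpha) <= eps. A minimal sketch of that check, assuming binary labels in {+1, -1} and a dual gradient that has already been computed; the function and argument names are illustrative and not taken from libsvm's source.

```python
def max_violation(alpha, grad, y, C):
    """Return m(alpha) - M(alpha) for the dual C-SVC problem.

    alpha: dual variables, grad: gradient of the dual objective,
    y: labels in {+1, -1}, C: upper bound on each alpha.
    """
    n = len(y)
    # I_up: alpha can still move in the +y direction.
    up = [-y[i] * grad[i] for i in range(n)
          if (y[i] == 1 and alpha[i] < C) or (y[i] == -1 and alpha[i] > 0)]
    # I_low: alpha can still move in the -y direction.
    low = [-y[i] * grad[i] for i in range(n)
           if (y[i] == -1 and alpha[i] < C) or (y[i] == 1 and alpha[i] > 0)]
    if not up or not low:
        # Assumption for this sketch: an empty index set means no
        # violating pair exists, so report "no violation".
        return float("-inf")
    return max(up) - min(low)


def has_converged(alpha, grad, y, C, eps=1e-3):
    """True once the maximal violation is within the tolerance eps."""
    return max_violation(alpha, grad, y, C) <= eps
```

At alpha = 0 the dual gradient is -1 in every coordinate, so any problem with both classes present starts with a violation of 2 and is far from the tolerance.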

Cheers,
Andy
Joseph Turian
2012-09-26 19:51:49 UTC
Permalink
If sklearn will be maintaining a patch set against libsvm, this patch set should be available to non sklearn users too.

Gael Varoquaux
2012-09-26 20:24:19 UTC
Permalink
Post by Joseph Turian
If sklearn will be maintaining a patch set against libsvm, this patch set should be available to non sklearn users too.
I reckon you are volunteering to maintain a fork of libsvm? That's very
good news, the community definitely needs this badly.

Gael ;o

PS: this little pique was only there to stress that the resource that is
lacking is simply time, but otherwise we think that it's a great idea.
Joseph Turian
2012-09-26 21:14:27 UTC
Permalink
Post by Gael Varoquaux
Post by Joseph Turian
If sklearn will be maintaining a patch set against libsvm, this patch set should be available to non sklearn users too.
I reckon you are volunteering to maintain a fork of libsvm? That's very
good news, the community definitely needs this badly.
I was considering the idea of a fork, but I think it might be premature.

Everyone seems to agree that it would be great if the patches merged upstream.

If the maintainers have been unwilling to merge patches, then they
might not merge the patches this time around.

If you fork the project, that might be taken as an aggressive move and
they will be unwilling to work together with the maintainers of the
fork.

My thought was to release a patch set, without actively maintaining
it, and follow it up with an email to the maintainers: "We'd like to
see our patch set merged into libsvm core code". That puts some
pressure on libsvm to merge the patches, but not an aggressive amount
(like forking).

Just my thoughts.

Joseph
Gael Varoquaux
2012-09-26 21:30:46 UTC
Permalink
Hey Joseph,

Fair enough with regards to your points about a fork being considered
aggressive. Thanks a lot for raising this point. I guess that I was more
thinking of a fork in terms of version control rather than in terms of
creating a parallel project. I have grown used to forks being useful
things :).

Also, I am worried that a collection of patches will bitrot, and my
immediate thought is to put them in a git repository, just to make sure
that history is preserved.

Of course, my thoughts reflect very much that I am used to working with
projects that use open, distributed version control, which is
not the case here.

Anyhow, both options, the collection of patches and the fork, are
suboptimal.

Thanks for your thoughts,

Gaël
--
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Doug Coleman
2012-09-26 21:40:12 UTC
Permalink
I put up a copy of the libsvm-3.12 release on my github. For some
reason, ``make lib`` in the main directory or ``make`` in python/
doesn't work out of the box, so I made a patch that works on my
system.

https://github.com/erg/libsvm

This is not a hostile fork, just a way to get some version control
onto some changes. Hopefully the libsvm people use git and could just
cherrypick the patches they like someday. Hopefully I'd just clone
their next release.

Anyway, you can submit pull requests against it. If someone else would
rather do it, let me know, or if you want admin permissions, let me
know.

Cheers,
Doug

James Bergstra
2012-09-27 15:55:17 UTC
Permalink
Hi Doug, thanks for this!

I'm still a little shaky with git, so I was wondering if people could
advise on how to manage the set of libsvm patches. The thing that comes
to my mind is:

1. Fork Doug Coleman's libsvm tree for now into the scikit-learn
organization (If we hear from Chih-Jen Lin, we can edit a few
.git/config links to use his version instead. Unless someone on this
list knows Chih-Jen Lin personally, we may not get a response from him
at all.)

2. Apply the current sklearn diff from libsvm as a single monolithic
commit to this new scikit-learn/libsvm fork. We will rebase this on
libsvm's master periodically.

3. Add scikit-learn/libsvm as a submodule at its current location,
sklearn/svm/src/libsvm, so that it does not disrupt the current file
hierarchy.

4. Add a Makefile command that updates & syncs the git submodule, and
possibly modify some of the existing Makefile commands to trigger that
update before executing.


If that's cool with people, then I would split my PR into one PR
against scikit-learn/libsvm, and one PR against
scikit-learn/scikit-learn that has the Python and Cython stuff.
Olivier Grisel
2012-09-27 16:28:48 UTC
Permalink
I am afraid that managing a git submodule inside the scikit-learn main
repo will add some burden for our users, most of whom are not familiar
with git to begin with (and the Windows users out there won't be able to
use the Makefile). Installing scikit-learn from source will get
even more complicated for them.

Also the sklearn-embedded libsvm fork is quite heavy as it also
includes the dense variant of the libsvm code base. I don't know
whether Doug's plan is to manage the maintenance of that part as well.
--
Olivier
Mathieu Blondel
2012-09-27 16:44:47 UTC
Permalink
Some parts which are not relevant for inclusion in scikit-learn have also
been removed (command line, libsvm file parsing, ...).

Since our copy of libsvm is quite heavily patched already (dense data,
sample weight, label order, ...), I wonder if it wouldn't be easier to
maintain our own libsvm copy directly in scikit-learn (which is basically
what we are currently doing already).

Mathieu
Lars Buitinck
2012-09-27 16:49:47 UTC
Permalink
Post by Mathieu Blondel
Since our copy of libsvm is quite heavily patched already (dense data,
sample weight, label order, ...), I wonder if it wouldn't be easier to
maintain our own libsvm copy directly in scikit-learn (which is basically
what we are currently doing already).
And hack in direct support for CSR matrices?
--
Lars Buitinck
Scientific programmer, ILPS
University of Amsterdam
Lars Buitinck
2012-09-27 16:51:04 UTC
Permalink
Post by Lars Buitinck
And hack in direct support for CSR matrices?
Never mind, I was confusing LibSVM and LibLinear again...
Doug Coleman
2012-09-27 18:15:52 UTC
Permalink
Hi everyone,

I'm trying to figure out the best way to proceed. Here are some things
I noticed.

1) scikit's libsvm checkin is currently version 300. The last release
was in April and the version is 312. Are there plans to use the newer
version? The svm_node struct changed, so it's not as trivial as
dropping in the files.

2) libsvm comes with a ctypes binding. What if scikit contributed a
cython binding that was in the libsvm project? Then scikit could just
use the libsvm cython module for implementing fit(), predict(), etc.

3) Where is the test suite in the libsvm project itself? Perhaps they
have something that's not in the main download? The scikit test suite
looks pretty good, but I've gotten used to making a library and then
running the builtin test suite.

4) I opened another issue on my github of some compiler warnings from
the clang++ compiler. It turns out that there are a lot of calls to
malloc where the return pointer is unchecked. So basically the library
can crash at any time. Someone already offered to make a nontrivial
patch using std::vector and new to fix it. How do we want to proceed?
What's the strategy to merge the change with libsvm and the scikit
project?

https://github.com/erg/libsvm/issues/1

Even if we got a cython module into the libsvm project and just called
libsvm as a python module, there would still be the problem of merging
feature enhancements and bugfixes.
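
To make point 2 a bit more concrete: a ctypes binding describes libsvm's C structs from Python at run time, whereas a Cython binding would declare them at compile time. Below is a minimal sketch of the ctypes side, assuming libsvm's sparse `svm_node` layout (an `int index` and a `double value`, with each array terminated by a node whose index is -1) and the usual 1-based feature indices. The class and helper names are illustrative, and no libsvm shared library is loaded here.

```python
from ctypes import Structure, c_double, c_int


class SvmNode(Structure):
    """Mirrors libsvm's sparse `struct svm_node { int index; double value; }`."""
    _fields_ = [("index", c_int), ("value", c_double)]


def to_svm_nodes(dense_row):
    """Convert a dense row into a libsvm-style sparse node array.

    Zero entries are skipped, feature indices are 1-based, and the
    array ends with a sentinel node whose index is -1.
    """
    nonzero = [(i + 1, v) for i, v in enumerate(dense_row) if v != 0]
    nodes = (SvmNode * (len(nonzero) + 1))()
    for k, (index, value) in enumerate(nonzero):
        nodes[k].index = index
        nodes[k].value = value
    nodes[len(nonzero)].index = -1  # terminator expected by the C code
    return nodes
```

A dense variant of the library would not need this conversion at all, which is part of why the two code paths diverge.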

Thoughts?

Doug
Lars Buitinck
2012-09-27 18:45:22 UTC
Permalink
Post by Doug Coleman
1) scikit's libsvm checkin is currently version 300. The last release
was in April and the version is 312. Are there plans to use the newer
version? The svm_node struct changed, so it's not as trivial as
dropping in the files.
What are the major benefits?
Post by Doug Coleman
2) libsvm comes with a ctypes binding. What if scikit contributed a
cython binding that was in the libsvm project? Then scikit could just
use the libsvm cython module for implementing fit(), predict(), etc.
That would mean that any changes to this Cython binding would have to
be contributed upstream, as you already suggested.
Post by Doug Coleman
4) I opened another issue on my github of some compiler warnings from
the clang++ compiler. It turns out that there are a lot of calls to
malloc where the return pointer is unchecked. So basically the library
can crash at any time. Someone already offered to make a nontrivial
patch using std::vector and new to fix it. How do we want to proceed?
What's the strategy to merge the change with libsvm and the scikit
project?
https://github.com/erg/libsvm/issues/1
That was me :)

I'm quite swamped ATM, but I did intend to refactor our LibSVM
bindings sometime in the near future (preferably before the next
release). I intend to decouple all the prediction code from LibSVM and
rewrite it in Python/Cython, just like I did for our Liblinear
binding.

I'm not sure what to do with the training code yet, but after looking
at it again, I'm more and more inclined to go with Mathieu's
suggestion of maintaining our own version. The second thing on my list
would be to check how much of the code can go away.
James Bergstra
2012-09-27 19:21:19 UTC
Permalink
Post by Lars Buitinck
Post by Doug Coleman
1) scikit's libsvm checkin is currently version 300. The last release
was in April and the version is 312. Are there plans to use the newer
version? The svm_node struct changed, so it's not as trivial as
dropping in the files.
What are the major benefits?
Post by Doug Coleman
2) libsvm comes with a ctypes binding. What if scikit contributed a
cython binding that was in the libsvm project? Then scikit could just
use the libsvm cython module for implementing fit(), predict(), etc.
That would mean that any changes to this Cython binding would have to
be contributed upstream, as you already suggested.
Right, but just so we're clear, there are different levels of
upstream. If sklearn maintains a modified version of libsvm, then
"contributing upstream" is simply a matter of committing to this
modified branch. There is a further-upstream branch (the authors'
official version) that none of us controls, which has its own release
cycle, and which in principle may change significantly, and change for
the better in directions that we will want to include.
Post by Lars Buitinck
Post by Doug Coleman
4) I opened another issue on my github of some compiler warnings from
the clang++ compiler. It turns out that there are a lot of calls to
malloc where the return pointer is unchecked. So basically the library
can crash at any time. Someone already offered to make a nontrivial
patch using std::vector and new to fix it. How do we want to proceed?
What's the strategy to merge the change with libsvm and the scikit
project?
https://github.com/erg/libsvm/issues/1
That was me :)
I'm quite swamped ATM, but I did intend to refactor our LibSVM
bindings sometime in the near future (preferably before the next
release). I intend to decouple all the prediction code from LibSVM and
rewrite it in Python/Cython, just like I did for our Liblinear
binding.
I'm not sure what to do with the training code yet, but after looking
at it again, I'm more and more enclined to go with Mathieu's
suggestion of maintaining our own version. The second thing on my list
would be to check how much of the code can go away.
Why do you want to rewrite the predict code, which seems to be already working?
(Doesn't this further divergence from the libsvm code base just
increase the sklearn maintenance burden?)

The key question seems to be how heavily patched svm.cpp already is.
If it's completely rewritten, then trying to work with the original
project is silly, but I don't think it is. It seems like there are a
few things:

(1) the use of PREFIX and the _DENSE_REP ifdef, and the extra
double-include file that drives that mechanism

(2) changing the upper_bound in solution_info to a buffer of len 2
instead of 2 different variables

(3) what looks like algorithmic changes around line 1600 that I don't understand

I could certainly be wrong, but these things still look maintainable
as a patch set. Why do you want to break further away from the libsvm
trunk, rather than refactor things to be, if anything, *more*
compatible with it?
Doug Coleman
2012-09-27 19:42:53 UTC
Permalink
On Thu, Sep 27, 2012 at 12:21 PM, James Bergstra
Post by Lars Buitinck
Post by Doug Coleman
1) scikit's libsvm checkin is currently version 300. The last release
was in April and the version is 312. Are there plans to use the newer
version? The svm_node struct changed, so it's not as trivial as
dropping in the files.
What are the major benefits?
I don't know the benefits as I've only used libsvm through scikit til
this version. But it's probably better to do incremental merges of the
authors' changes at every version so that we find more bugs and get
the correctness/optimizations/features that they intended.

Doug
Lars Buitinck
2012-09-27 19:48:25 UTC
Permalink
Post by James Bergstra
Right, but just so we're clear, there are different levels of
upstream? If sklearn maintains a modified version of libsvm, then
"contributing upstream" is simply a matter of committing to this
modified branch. There is a further-upstream branch (author's
official version) that none of us controls, which has it's own release
cycle, but which in principle may change significantly, and change for
the better in directions that we will want to include.
By upstream I mean Lin et al.
Post by James Bergstra
Why do you want to rewrite the predict code, which seems to be already working?
(Doesn't this further divergence from the libsvm code base just
increase the sklearn maintenance burden?)
Because maintaining the scikit-learn wrappers for the prediction code
for Liblinear turned out to be more work than rewriting it, and I
suspect the same will be true for LibSVM. Additional benefits would be
faster compiles and smaller library images.
Post by James Bergstra
The key thing seems to be how heavily patched is the svm.cpp already?
If it's completely rewritten, then trying to work with the original
project is silly, but I don't think it is. It seems like there are a few things:
(1) the use of PREFIX and the _DENSE_REP ifdef, and the extra
double-include file that drives that mechanism
(2) changing the upper_bound in solution_info to a buffer of len 2
instead of 2 different variables
(3) what looks like algorithmic changes around line 1600 that I don't understand
I could certainly be wrong, but these things still look maintainable
as a patch set. Why do you want to break further away from the libsvm
trunk, rather than refactor things to be, if anything, *more*
compatible with it?
As I said, this is something I have not yet started serious work on,
so I haven't made a decision yet. I'll try fixing things in Doug's
version, and I'll see how far I get with that approach.
Andreas Mueller
2012-09-30 12:28:51 UTC
Permalink
Post by James Bergstra
Why do you want to rewrite the predict code, which seems to be already working?
(Doesn't this further divergence from the libsvm code base just
increase the sklearn maintenance burden?)
The key thing seems to be how heavily patched is the svm.cpp already?
If it's completely rewritten, then trying to work with the original
project is silly, but I don't think it is. It seems like there are a few things:
(1) the use of PREFIX and the _DENSE_REP ifdef, and the extra
double-include file that drives that mechanism
(2) changing the upper_bound in solution_info to a buffer of len 2
instead of 2 different variables
(3) what looks like algorithmic changes around line 1600 that I don't understand
I could certainly be wrong, but these things still look maintainable
as a patch set. Why do you want to break further away from the libsvm
trunk, rather than refactor things to be, if anything, *more*
compatible with it?
I think the idea is that a lot of the code could be made more accessible
and shorter by rewriting it using cython and numpy.
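
As a rough illustration of how little code prediction needs once a model is trained: for a binary SVM, the decision value is a weighted sum of kernel evaluations against the support vectors, minus an offset. The sketch below is plain Python (a real version would presumably use numpy or Cython) and assumes an RBF kernel; all names are illustrative and are not scikit-learn's or libsvm's.

```python
import math


def rbf_kernel(u, v, gamma):
    """RBF kernel exp(-gamma * ||u - v||^2) for two dense vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def decision_function(x, support_vectors, dual_coef, rho, gamma):
    """Decision value of a binary SVM: sum_i coef_i * K(sv_i, x) - rho.

    dual_coef holds one signed coefficient per support vector and rho
    is the offset; the sign of the result gives the predicted class.
    """
    total = sum(
        coef * rbf_kernel(sv, x, gamma)
        for sv, coef in zip(support_vectors, dual_coef)
    )
    return total - rho
```

With two support vectors at 0.0 and 2.0 carrying coefficients +1 and -1, points nearer 0.0 get a positive value, points nearer 2.0 a negative one, and the midpoint evaluates to zero.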
James Bergstra
2012-10-04 19:41:09 UTC
Permalink
On Sun, Sep 30, 2012 at 8:28 AM, Andreas Mueller
Post by Andreas Mueller
I think the idea is that a lot of the code could be made more accessible
and shorter by rewriting it using cython and numpy.
I can appreciate that, but let's come back to the original question --
which was what to do with this PR? There are at least two top-level
questions that have come up here:

1. whether to merge my PR [roughly as-is] into sklearn

2. whether to create a libsvm fork that has sklearn's nice features
built into it.
2. b) If the answer here is yes, then how should sklearn be
re-factored to use that fork.

Let's come back to (2) later when someone has the energy for it
regardless of the choice regarding 1. My PR does not represent a
significant divergence from the libsvm source compared to what has
already been done. So can we come back to (1)?

What are people's thoughts on my PR? It passes tests and respects code
conventions AFAIK. I've addressed the point regarding the warning vs.
exception.

https://github.com/scikit-learn/scikit-learn/pull/1184

- James
Joseph Turian
2012-10-04 20:15:22 UTC
Permalink
What happened when you contacted the libsvm people?
--
Joseph Turian, Ph.D. | President, MetaOptimize
"Optimize Profits. Optimize Engagement."
http://metaoptimize.com
855-ALL-DATA

The web's most active forum for data scientists: http://metaoptimize.com/qa/
James Bergstra
2012-10-04 20:48:22 UTC
Permalink
Nothing so far.
Post by Joseph Turian
What happened when you contacted the libsvm people?
Joseph Turian
2012-10-05 04:04:13 UTC
Permalink
Well naturally, the most hilarious solution to this is to fork into a
project called libsvm2.
Post by James Bergstra
Nothing so far.
Post by Joseph Turian
What happened when you contacted the libsvm people?
Andreas Mueller
2012-10-05 10:52:09 UTC
Permalink
Post by James Bergstra
Nothing so far.
I'm +1 on merge.
I agree with James: in the greater scheme of things, this PR doesn't really
add much to the divergence, and it directly improves usability.
Gael Varoquaux
2012-10-05 12:10:51 UTC
Permalink
Post by Andreas Mueller
I agree with James, in the greater scheme that PR doesn't really add
much to the divergence and directly improves usability.
I agree.
Olivier Grisel
2012-10-05 15:54:45 UTC
Permalink
Post by Gael Varoquaux
Post by Andreas Mueller
I agree with James, in the greater scheme that PR doesn't really add
much to the divergence and directly improves usability.
I agree.
+1 too. The error message emitted when max_iter is reached could suggest
standardizing the data with StandardScaler or MinMaxScaler, as unscaled
data might be the cause of the lack of convergence.
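
A minimal sketch of that idea, assuming a toy gradient-descent least-squares solver rather than libsvm's actual solver. The names `standardize`, `fit_least_squares`, and the local `ConvergenceWarning` class are hypothetical and exist only for this illustration. The point is just that an iteration cap plus a warning turns a seemingly endless fit on badly scaled features into a diagnosable one, and that standardizing the columns can remove the problem:

```python
import math
import warnings


class ConvergenceWarning(UserWarning):
    """Hypothetical warning class for this sketch (not scikit-learn's own)."""


def standardize(X):
    """Rescale each column to zero mean and unit (population) standard deviation."""
    n = len(X)
    cols = list(zip(*X))
    means = [sum(c) / n for c in cols]
    # Fall back to 1.0 for constant columns to avoid dividing by zero.
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) or 1.0
            for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def fit_least_squares(X, y, max_iter=1000, tol=1e-8):
    """Toy gradient-descent solver with an iteration cap.

    Returns (weights, n_iter, converged). Instead of looping forever, it
    stops after max_iter iterations and warns that scaling may help.
    """
    n, d = len(X), len(X[0])
    # Step size 1 / trace(X^T X / n): safe, since the trace bounds the
    # largest eigenvalue of a positive semi-definite matrix.
    step = 1.0 / sum(sum(row[j] ** 2 for row in X) / n for j in range(d))
    w = [0.0] * d
    for it in range(1, max_iter + 1):
        residuals = [sum(wj * xj for wj, xj in zip(w, row)) - yi
                     for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(residuals, X)) / n
                for j in range(d)]
        if math.sqrt(sum(g * g for g in grad)) < tol:
            return w, it, True
        w = [wj - step * gj for wj, gj in zip(w, grad)]
    warnings.warn(
        "Solver stopped at max_iter=%d; consider standardizing the data."
        % max_iter,
        ConvergenceWarning,
    )
    return w, max_iter, False


if __name__ == "__main__":
    # Two features whose scales differ by a factor of 1000.
    X = [[1.0, 1000.0], [-1.0, 1000.0], [1.0, -1000.0], [-1.0, -1000.0]]
    y = [3.0, 1.0, -1.0, -3.0]
    print(fit_least_squares(X, y)[1:])               # raw features
    print(fit_least_squares(standardize(X), y)[1:])  # standardized features
```

On the raw features the small-scale direction shrinks its error very slowly, so the cap is hit; after standardization both directions have the same curvature and the toy solver stops early.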
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
James Bergstra
2012-10-05 15:57:40 UTC
Permalink
On Fri, Oct 5, 2012 at 11:54 AM, Olivier Grisel
Post by Olivier Grisel
Post by Gael Varoquaux
Post by Andreas Mueller
I agree with James, in the greater scheme that PR doesn't really add
much to the divergence and directly improves usability.
I agree.
+1 too. The error message when reaching max_iter could be to try to
standardize the data with StandardScaler or MinMaxScaler as this might
be the cause of the lack of convergence.
I think you're right, that's a good suggestion.
James Bergstra
2012-09-27 18:45:36 UTC
Permalink
Post by Doug Coleman
Hi everyone,
I'm trying to figure out the best way to proceed. Here are some things
I noticed.
1) scikit's libsvm checkin is currently version 300. The last release
was in April and the version is 312. Are there plans to use the newer
version? The svm_node struct changed, so it's not as trivial as
dropping in the files.
I didn't see a version archive on the official libsvm page, but Google
found this:
http://libsvm.sourcearchive.com/

It has source archives for a version it calls "3.0-1", as well as 3
other versions since then. I haven't looked at the diffs, but I was
thinking this might help to separate out the changes that have already
been made on the sklearn fork.

- James
James Bergstra
2012-09-26 21:47:55 UTC
Permalink
Hi Chih-Jen Lin (as well as the scikit-learn mailing list)

Today I submitted a small change to libsvm as a pull request against sklearn
(https://github.com/scikit-learn/scikit-learn/pull/1184), where a copy
of the libsvm source is mirrored in sklearn's git project. We were
wondering how to proceed. We do not want to diverge further than
necessary from the official libsvm trunk, but this seems to be just
one among a few features that sklearn developers have implemented.
Divergence seems like something to manage rather than to avoid.

Do you have a github account? If not, would you mind it if we set up
(on your behalf, if you would prefer not to be on github yourself) a
"libsvm" project on github that simply tracks your official version?
This would have two advantages:

1. The changes made for sklearn could be maintained as a separate
branch and quickly rebased onto your new releases, and

2. You could easily cherry-pick the changes you would like to include in
the master branch (the normal github niceness).

This "libsvm" project could be transferred to your account if you
decide to set one up later. Your perspective on the matter would be
much appreciated!

Thanks,

- James Bergstra

On Wed, Sep 26, 2012 at 5:30 PM, Gael Varoquaux
Post by Gael Varoquaux
Hey Joseph,
Fair enough with regard to your points about a fork being considered
aggressive. Thanks a lot for raising this point. I guess that I was thinking
of a fork more in terms of version control than in terms of
creating a parallel project. I have grown used to forks being useful
things :).
Also, I am worried that a collection of patches will bitrot, and my
immediate thought is to put them in a git repository, just to make sure that
history is conserved.
Of course, my thoughts very much reflect that I am used to working with
projects under open, distributed version control, which is
not the case here.
Anyhow, both situations, the collection of patches and the fork, are
suboptimal.
Thanks for your thoughts,
Gaël
Post by Joseph Turian
Post by Gael Varoquaux
Post by Joseph Turian
If sklearn will be maintaining a patch set against libsvm, this patch set should be available to non sklearn users too.
I reckon you are volunteering to maintain a fork of libsvm? That's very
good news, the community definitely needs this badly.
I was considering the idea of a fork, but I think it might be premature.
Everyone seems to agree that it would be great if the patches were merged upstream.
If the maintainers have been unwilling to merge patches, then they
might not merge the patches this time around.
If you fork the project, that might be taken as an aggressive move and
they will be unwilling to work together with the maintainers of the
fork.
My thought was that releasing a patch set, but not actively
"We'd like to see our patch set merged into libsvm core code", then
there is some pressure on libsvm to merge the patches but not an
aggressive amount (like forking).
Just my thoughts.
Joseph
--
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Andreas Mueller
2012-09-26 21:55:05 UTC
Permalink
Much appreciated James :)
Joseph Turian
2012-09-27 04:57:33 UTC
Permalink
Well stated.

Gael Varoquaux
2012-09-27 05:29:54 UTC
Permalink
Indeed, thanks!

Gael
--
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
James Bergstra
2013-12-17 13:26:42 UTC
Permalink
News: Chih-Jen Lin wrote to me a few days ago to let me know that he has
created a github project for libsvm.

https://github.com/cjlin1/libsvm

Anyone want to try a rebase (!?)
Olivier Grisel
2013-12-17 13:40:34 UTC
Permalink
Post by James Bergstra
News: Chih-Jen Lin wrote to me a few days ago to let me know that he has
created a github project for libsvm.
https://github.com/cjlin1/libsvm
\o/
Post by James Bergstra
Anyone want to try a rebase (!?)
It would indeed be great to maintain a fork with our patches in a
regularly rebased branch.

But don't count on me to do it :)
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Joel Nothman
2013-12-17 21:05:08 UTC
Permalink
On Wed, Dec 18, 2013 at 12:40 AM, Olivier Grisel
Post by James Bergstra
News: Chih-Jen Lin wrote to me a few days ago to let me know that he has
created a github project for libsvm.
https://github.com/cjlin1/libsvm
\o/
Wow. A first step for the community should be to tag commits that appear to
be identical (or near-identical) to releases, so that a rebase point for each
of the myriad forks around the place can be identified. Fortunately, the
sample weights package seems to be tracking the latest release.
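
One way to look for such a rebase point when only unpacked release archives are available, sketched under assumptions: hash every file in a vendored copy and in each release, then pick the release with the fewest differing files. The function names and the directory layout are hypothetical, and the sketch uses only the Python standard library; it is not part of any libsvm or sklearn tooling.

```python
import hashlib
from pathlib import Path


def tree_digest(root):
    """Map each file's path (relative to root) to the SHA-256 of its contents."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def closest_release(vendored_dir, release_dirs):
    """Return (release_label, differing_paths) for the best-matching release.

    release_dirs maps a label (e.g. a version string) to a directory that
    holds that unpacked release. Only files present in the vendored copy
    are compared; a file missing from a release counts as differing.
    Ties are resolved in favour of the first release listed.
    """
    vendored = tree_digest(vendored_dir)
    best = None
    for label, path in release_dirs.items():
        release = tree_digest(path)
        differing = sorted(f for f, h in vendored.items() if release.get(f) != h)
        if best is None or len(differing) < len(best[1]):
            best = (label, differing)
    return best
```

The list of differing paths for the winning release is then a rough inventory of the files a patch set would have to cover.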

When can we expect to see liblinear?

Frédéric Bastien
2012-09-26 19:53:17 UTC
Permalink
On Wed, Sep 26, 2012 at 3:49 PM, Andreas Mueller
Post by Andreas Mueller
Hi James.
Thanks for the PR.
I thinks so far we avoided changing LibSVM and tried to get patches
in upstream. Afaik, this hasn't succeeded so far.
The cases I am thinking of is me trying to get the chi2 kernel in and
Lars cleaning up some of the code.
As LibSVM seems to be very conservative wrt. features and patches,
I think we should reconsider our policy and accept your modifications.
I would still suggest trying to get it upstream in case it works this time :)
Gael Varoquaux
2012-09-26 20:25:07 UTC
Permalink
Post by Frédéric Bastien
I would still suggest trying to get it upstream in case it works this time :)
+1. I guess the policy should be to try to get it upstream, and if it
fails, merge it in sklearn.

Thanks a lot, James!

Gaël
Olivier Grisel
2012-09-28 13:17:55 UTC
Permalink
Post by Andreas Mueller
Can you give some insights into why this check is necessary and in
what kind of situations LibSVM fails to converge? I guess it uses
the duality gap for convergence. Is is the case that this is not
a good measure sometimes?
I guess this user on stackoverflow was facing a similar issue:

http://stackoverflow.com/questions/12616492/scikit-learns-gridsearchcv-with-linear-kernel-svm-takes-too-long

However, he/she did not provide the data to reproduce the problem.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel