Discussion:
GSoC2015 topics
Christof Angermueller
2015-02-04 22:14:42 UTC
Permalink
Hi all,

is there already a list of potential Google Summer of Code (GSoC) 2015
projects?
Knowing about potential projects would allow me to start working on
certain ideas early.

Cheers,
Christof

--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Andy
2015-02-05 11:52:29 UTC
Permalink
Hi Christof.
Good question. I don't think we have come up with a list yet.
I just looked at the list from last year, and what still seems most
relevant is GMMs, and possibly the coordinate descent solvers (Alex,
maybe you can say what is left there, or whether we are happy now with
SAG?).
There is still some deep learning stuff that we might want to include,
but we need to merge the MLP first.
I think it would also be interesting to rework the Gaussian processes,
but that might be a bit too ambitious for a GSoC project.

If anyone has any other ideas, maybe list them in this thread. Also,
possible mentors, please speak up :)

Cheers,
Andreas

Daniel Sullivan
2015-02-05 12:03:29 UTC
Permalink
I'm still in the process of polishing up SAG; hopefully I can get
something committable soon.

------------------------------------------------------------------------------
Scikit-learn-general mailing list
Scikit-learn-***@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Andy
2015-02-05 12:18:55 UTC
Permalink
On 02/05/2015 01:03 PM, Daniel Sullivan wrote:
> I'm still in the process of polishing up SAG, hopefully I can get
> something commit-able soon
Sure, no hurry.
My question was more "Do we want anything more that is not covered by
your work on SAG" ;)
Lee Zamparo
2015-02-05 15:38:55 UTC
Permalink
With respect to Gaussian processes, there are some good packages in
Python already (https://github.com/SheffieldML/GPy,
https://github.com/dfm/george, probably others). In particular, GPy
does not require any dependencies beyond those already required by
sklearn.

Maybe a reasonable project would be to wrap a subset of GPy with an
sklearn-compliant interface? I'm not sure how much work that would
be, though.
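To make "sklearn-compliant interface" concrete, here is a minimal, self-contained sketch of what such a wrapper would need to expose: fit/predict plus get_params via BaseEstimator. The class `SimpleGPRegressor` is hypothetical (it implements a toy RBF-kernel GP mean directly rather than calling GPy, whose API may differ):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin


class SimpleGPRegressor(BaseEstimator, RegressorMixin):
    """Toy GP regression posterior mean with an RBF kernel.

    Hypothetical illustration of an sklearn-compliant estimator shape;
    a real project would delegate to GPy internally.
    """

    def __init__(self, length_scale=1.0, noise=1e-6):
        self.length_scale = length_scale
        self.noise = noise

    def _kernel(self, A, B):
        # Squared-exponential (RBF) kernel matrix between rows of A and B.
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-0.5 * d2 / self.length_scale ** 2)

    def fit(self, X, y):
        X, y = np.asarray(X, float), np.asarray(y, float)
        self.X_train_ = X
        K = self._kernel(X, X) + self.noise * np.eye(len(X))
        # Solve (K + noise*I) alpha = y via Cholesky for stability.
        L = np.linalg.cholesky(K)
        self.alpha_ = np.linalg.solve(L.T, np.linalg.solve(L, y))
        return self

    def predict(self, X):
        Ks = self._kernel(np.asarray(X, float), self.X_train_)
        return Ks @ self.alpha_


# With low noise, the posterior mean interpolates the training data.
X = np.linspace(0, 1, 10)[:, None]
y = np.sin(2 * np.pi * X).ravel()
gp = SimpleGPRegressor(length_scale=0.1, noise=1e-6).fit(X, y)
```

Because the hyperparameters live in `__init__` and the class inherits from BaseEstimator, it works with clone, get_params, and grid search for free.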

L.

On Thu, Feb 5, 2015 at 6:52 AM, Andy <***@gmail.com> wrote:
> Hi Christof.
> Good question. I don't think we came up with a list yet.
> I just looked at the list from last year, and what seems most relevant
> still is GMMs,
> and possibly the coordinate descent solvers (Alex maybe you can say what
> is left there or
> if with the SAG we are happy now?)
> There is still some deep learning stuff that we might want to include,
> but we need to merge
> the MLP first.
> I think it would also be interesting to rework the Gaussian processes,
> but that might be a bit to ambitious for a GSOC project.
>
> If anyone has any other ideas, maybe list them in this thread. Also,
> possible mentors, please speak up :)
>
> Cheers,
> Andreas
>
> On 02/04/2015 11:14 PM, Christof Angermueller wrote:
>> Hi all,
>>
>> is there already a list of potential Google Summer of Code (GSoC) 2015
>> projects?
>> Knowing about potential projects would allow me start working on certain
>> ideas early.
>>
>> Cheers,
>> Christof
>>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-***@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Kyle Kastner
2015-02-05 15:51:27 UTC
Permalink
I think most of the GP-related work is deciding what the
sklearn-compatible interface should be :) specifically how to handle
kernels and how to share code with the core codebase.

The HODLR solver from George could be very nice for scalability, but
the algorithm is not easy. There are a few other options on that front,
but all are semi-tricky from what I can tell.

Getting the GP stuff really nailed down would be a good step towards
Bayesian hyperparameter optimization (or one type of it), which would
be a really killer feature if done and integrated well. But it is a
whole lot of work, and random search is surprisingly good.
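For reference, the random-search baseline mentioned above already exists in sklearn as RandomizedSearchCV; a quick sketch (dataset, estimator, and parameter ranges are arbitrary examples, and imports use today's module paths):

```python
from scipy.stats import randint, uniform
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

# Sample hyperparameters from distributions instead of a fixed grid.
param_dist = {
    "n_estimators": randint(10, 100),
    "max_depth": randint(2, 10),
    "max_features": uniform(0.1, 0.9),  # fraction of features, in [0.1, 1.0]
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_dist,
    n_iter=10,
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

A Bayesian optimizer would replace the independent draws above with a GP-guided choice of the next point to try, which is exactly why a solid GP module is a prerequisite.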

W.r.t. deep learning, what would be added? A Gaussian RBM might be nice to have.

Kyle
Thomas Johnson
2015-02-05 15:56:12 UTC
Permalink
So I don't really have a 'deep' understanding of deep learning, but aren't
things like Gaussian RBMs becoming obsolete? I thought I read that Hinton
said that the current state-of-the-art is Really Big networks that just use
standard backprop (plus tricks like dropout). Is that not correct, or is
Hinton's opinion not representative of the current best practices?


Gael Varoquaux
2015-02-05 15:58:07 UTC
Permalink
I have the same feeling.

On Thu, Feb 05, 2015 at 03:56:12PM +0000, Thomas Johnson wrote:
> So I don't really have a 'deep' understanding of deep learning, but aren't
> things like Gaussian RBMs becoming obsolete? I thought I read that Hinton said
> that the current state-of-the-art is Really Big networks that just use standard
> backprop (plus tricks like dropout). Is that not correct, or is Hinton's
> opinion not representative of the current best practices?


--
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Alexandre Gramfort
2015-02-05 16:19:45 UTC
Permalink
> I just looked at the list from last year, and what seems most relevant
> still is GMMs,
> and possibly the coordinate descent solvers (Alex maybe you can say what
> is left there or
> if with the SAG we are happy now?)

There is work coming on coordinate descent, and SAG is almost done.
I don't think it's worth investing a GSoC on this topic.

Alex
Akshay Narasimha
2015-02-05 19:12:23 UTC
Permalink
Is online low-rank factorisation still a valid idea for this year? It
was on last year's idea list.

Kyle Kastner
2015-02-05 19:43:54 UTC
Permalink
IncrementalPCA is done (I still have to add the randomized SVD solver,
but that should be simple), but I am sure there are other low-rank
methods which need a partial_fit. I think adding partial_fit methods to
as many algorithms as possible would be nice.
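The partial_fit pattern in question lets you fit out-of-core by feeding data in batches; with IncrementalPCA it looks like this (batch count chosen arbitrarily):

```python
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.RandomState(0)
X = rng.randn(1000, 20)

# Incremental fitting: each call to partial_fit updates the model with
# one batch, so the full X never needs to be in memory at once.
ipca = IncrementalPCA(n_components=5)
for batch in np.array_split(X, 10):  # each batch must have >= n_components rows
    ipca.partial_fit(batch)

X_reduced = ipca.transform(X)
print(X_reduced.shape)  # (1000, 5)
```

Estimators that grow a partial_fit like this also become usable on streaming data and inside out-of-core pipelines, which is the appeal of adding it broadly.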

Kyle

Joel Nothman
2015-02-05 21:52:31 UTC
Permalink
> I think adding partial_fit functions in general to as many algorithms
> as possible would be nice

Which could be a project in itself, for someone open to breadth rather than
depth.

Akshay Narasimha
2015-02-09 09:00:31 UTC
Permalink
On Fri, Feb 6, 2015 at 3:22 AM, Joel Nothman <***@gmail.com> wrote:

> > I think adding partial_fit functions in general to as many algorithms
> > as possible would be nice
>
> Which could be a project in itself, for someone open to breadth rather
> than depth.

I would like to work on this, but would need the community's input first.
Alexandre Gramfort
2015-02-09 10:36:30 UTC
Permalink
FYI I created the wiki page but it needs editing. So it's WIP

https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-(GSOC)-2015

A
Andy
2015-02-10 15:09:51 UTC
Permalink
In particular, we need to update the mentors.
Currently we have last year's:

"Here are people that have said that they might be available for mentoring:

Gaël Varoquaux, Vlad Niculae, Olivier Grisel, Andreas Mueller, Jason
Rudy, Robert Layton, Alexandre Gramfort, Arnaud Joly, Jaidev Deshpande
(neural net related stuff)."

Should we just delete them all, and let people who are willing add
themselves back in?



ragv ragv
2015-02-09 17:39:31 UTC
Permalink
Hi,

I saw implementing GAMs as one of the suggested topics for GSoC 2015.
Could I take that up? I saw your (Alex's) name under that. If yes,
please let me know; I'll start working on it, and if you permit,
I'll start a wiki page for my proposal and timeline.

Thanks
ragv
Alexandre Gramfort
2015-02-09 21:28:31 UTC
Permalink
Please wait a bit while we finalize the list. It's not definitive yet.

A

Ronnie Ghose
2015-02-09 21:31:18 UTC
Permalink
Are we interested in more discriminant methods? There were a few more
at JMLR this year.

Alexandre Gramfort
2015-02-09 21:33:54 UTC
Permalink
what do you have in mind?

A
Andy
2015-02-10 15:41:27 UTC
Permalink
I'd say this year's JMLR is too fresh ;)


On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
> are we interested in more discriminant methods? There were a few more
> @ JMLR this year
>
> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort
> <***@m4x.org <mailto:***@m4x.org>> wrote:
>
> please wait a bit so we finalize the list. It's not definitive.
>
> A
>
> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com
> <mailto:***@gmail.com>> wrote:
> > Hi,
> >
> > I saw implementing GAMs as one of the suggested topics for GSoC
> 2015.
> > Could I take that up? I saw your ( Alex's ) name under that. If yes,
> > please let me know I'll start working on the same and if you
> permit me
> > to, I'll start a wiki page for my proposal and timeline.
> >
> > Thanks
> > ragv
> >
> >
>
>
>
>
>
Christof Angermueller
2015-02-11 22:24:07 UTC
Permalink
As far as I know, sklearn has only an RBM module, but does not support
multilayer perceptrons (MLPs), autoencoders, or recurrent neural
networks. Are there any plans to extend sklearn with some neural-network
related modules?
There was a GSoC project on neural networks last year
(http://goo.gl/buHkyv), but I think it was not merged. Instead of
creating our own modules, one might also provide an interface to Theano,
or other neural-net libraries.

Christof

On 10/02/2015 15:41, Andy wrote:
> I'd say this years JMLR is too fresh ;)
>
>
> On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
>> are we interested in more discriminant methods? There were a few more
>> @ JMLR this year
>>
>> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort
>> <***@m4x.org <mailto:***@m4x.org>> wrote:
>>
>> please wait a bit so we finalize the list. It's not definitive.
>>
>> A
>>
>> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com
>> <mailto:***@gmail.com>> wrote:
>> > Hi,
>> >
>> > I saw implementing GAMs as one of the suggested topics for GSoC
>> 2015.
>> > Could I take that up? I saw your ( Alex's ) name under that. If
>> yes,
>> > please let me know I'll start working on the same and if you
>> permit me
>> > to, I'll start a wiki page for my proposal and timeline.
>> >
>> > Thanks
>> > ragv
>> >
>> >
>>
>>
>>
>>
>>
>
>
>

--
Christof Angermueller
***@gmail.com
http://cangermueller.com
Andy
2015-02-11 22:32:28 UTC
Permalink
The MLP is pretty close to being merged, I think the autoencoder, too.
We don't want to rely on Theano, and there is already pylearn2, which is
a great library for deep learning in python.
I'm not sure if pylearn2 is completely sklearn compatible, but I don't
think there is any need to create another high-level interface.

On 02/11/2015 05:24 PM, Christof Angermueller wrote:
> as far as I know, sklearn has only an RBM module, but does not support
> multilayer perceptrons (MLPs), autoencoder, or recurrent neural
> networks. Are there any plans do extend sklearn by some neural network
> related modules?
> There was a GSoC project on neural networks last year
> (http://goo.gl/buHkyv), but I think it was not merged in. Instead of
> creating own modules, one might also provide an interface to theano,
> or other nnet libraries.
>
> Christof
>
> On 10/02/2015 15:41, Andy wrote:
>> I'd say this years JMLR is too fresh ;)
>>
>>
>> On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
>>> are we interested in more discriminant methods? There were a few
>>> more @ JMLR this year
>>>
>>> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort
>>> <***@m4x.org <mailto:***@m4x.org>> wrote:
>>>
>>> please wait a bit so we finalize the list. It's not definitive.
>>>
>>> A
>>>
>>> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com
>>> <mailto:***@gmail.com>> wrote:
>>> > Hi,
>>> >
>>> > I saw implementing GAMs as one of the suggested topics for
>>> GSoC 2015.
>>> > Could I take that up? I saw your ( Alex's ) name under that.
>>> If yes,
>>> > please let me know I'll start working on the same and if you
>>> permit me
>>> > to, I'll start a wiki page for my proposal and timeline.
>>> >
>>> > Thanks
>>> > ragv
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
> --
> Christof Angermueller
> ***@gmail.com
> http://cangermueller.com
>
>
Anirudh Acharya
2015-02-11 22:55:12 UTC
Permalink
Is the following a good idea for GSoC 2015?

* Latent Dirichlet Allocation using Markov Chain Monte Carlo
* Extend to do inference with online stream of documents.
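To make the streaming part concrete, here is a rough pure-NumPy sketch of the kind of update I have in mind: online EM with a decaying step size, in the spirit of Hoffman et al.'s online variational LDA, but with point estimates standing in for the full variational (digamma) updates, so it is PLSA-like rather than a faithful LDA implementation. All names and hyperparameters are made up for illustration.

```python
import numpy as np

def online_topic_sketch(minibatches, n_topics, n_words, n_iter_e=20,
                        tau=1.0, kappa=0.7, seed=0):
    """Simplified online topic-model update (online EM, PLSA-style).

    Point estimates replace the variational (digamma) updates of true
    online LDA, so this is only a sketch of the streaming machinery.
    """
    rng = np.random.RandomState(seed)
    topics = rng.rand(n_topics, n_words)
    topics /= topics.sum(axis=1, keepdims=True)    # rows: p(word | topic)
    for t, X in enumerate(minibatches):            # X: (n_docs, n_words) counts
        # E-step: per-document topic proportions for this minibatch only
        theta = np.full((X.shape[0], n_topics), 1.0 / n_topics)
        for _ in range(n_iter_e):
            # phi[d, w, k] is proportional to theta[d, k] * topics[k, w]
            phi = theta[:, None, :] * topics.T[None, :, :]
            phi /= phi.sum(axis=2, keepdims=True) + 1e-12
            theta = (X[:, :, None] * phi).sum(axis=1)
            theta /= theta.sum(axis=1, keepdims=True) + 1e-12
        # M-step estimate of the topic-word matrix from this minibatch
        stats = (X[:, :, None] * phi).sum(axis=0).T + 1e-12  # (n_topics, n_words)
        new_topics = stats / stats.sum(axis=1, keepdims=True)
        # Blend old and new with a decaying step size rho_t = (tau + t)^-kappa
        rho = (tau + t) ** (-kappa)
        topics = (1 - rho) * topics + rho * new_topics
    return topics
```

New documents can then be folded in by calling the same blend with further minibatches, which is the point of the decaying step size.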



Anirudh


On 11 February 2015 at 15:32, Andy <***@gmail.com> wrote:

> The MLP is pretty close to being merged, I think the autoencoder, too.
> We don't want to rely on Theano, and there is already pylearn2, which is a
> great library for deep learning in python.
> I'm not sure if pylearn2 is completely sklearn compatible, but I don't
> think there is any need to create another high-level interface.
>
>
> On 02/11/2015 05:24 PM, Christof Angermueller wrote:
>
> as far as I know, sklearn has only an RBM module, but does not support
> multilayer perceptrons (MLPs), autoencoder, or recurrent neural networks.
> Are there any plans do extend sklearn by some neural network related
> modules?
> There was a GSoC project on neural networks last year (
> http://goo.gl/buHkyv), but I think it was not merged in. Instead of
> creating own modules, one might also provide an interface to theano, or
> other nnet libraries.
>
> Christof
>
> On 10/02/2015 15:41, Andy wrote:
>
> I'd say this years JMLR is too fresh ;)
>
>
> On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
>
> are we interested in more discriminant methods? There were a few more @
> JMLR this year
>
> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort <
> ***@m4x.org> wrote:
>
>> please wait a bit so we finalize the list. It's not definitive.
>>
>> A
>>
>> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com> wrote:
>> > Hi,
>> >
>> > I saw implementing GAMs as one of the suggested topics for GSoC 2015.
>> > Could I take that up? I saw your ( Alex's ) name under that. If yes,
>> > please let me know I'll start working on the same and if you permit me
>> > to, I'll start a wiki page for my proposal and timeline.
>> >
>> > Thanks
>> > ragv
>> >
>> >
>>
>>
>
>
>
>
>
>
>
>
>
> --
> Christof Angermueller
> ***@gmail.com
> http://cangermueller.com
>
>
>
>
>
>
>
>
>


--
Anirudh Acharya
Graduate Student
Arizona State University
Gael Varoquaux
2015-02-12 06:21:31 UTC
Permalink
On Wed, Feb 11, 2015 at 03:55:12PM -0700, Anirudh Acharya wrote:
> Is the following a good idea for GSoC 2015.

> * Latent Dirichlet Allocation using Markov Chain Monte Carlo
> * Extend to do inference with online stream of documents.

MCMC no. We ruled against it, as MCMC requires techniques that are not
used very much in scikit-learn. But there is a pull request implementing
the online, non-MCMC Latent Dirichlet Allocation algorithm.

Gaël
Chris Holdgraf
2015-02-12 06:40:58 UTC
Permalink
Also, isn't MCMC already implemented quite effectively in PyMC? I haven't
been following the development of that codebase, but last time I checked
there seemed to be a lot of interesting things you could do with it for
probabilistic models.



--
_____________________________________

PhD Candidate in Neuroscience | UC Berkeley <http://hwni.org/>
Editor and Web Master | Berkeley Science Review
<http://sciencereview.berkeley.edu/>
_____________________________________
Anirudh Acharya
2015-02-12 11:02:12 UTC
Permalink
On 11 February 2015 at 23:21, Gael Varoquaux <***@normalesup.org>
wrote:

> On Wed, Feb 11, 2015 at 03:55:12PM -0700, Anirudh Acharya wrote:
> > Is the following a good idea for GSoC 2015.
>
> > * Latent Dirichlet Allocation using Markov Chain Monte Carlo
> > * Extend to do inference with online stream of documents.
>
> MCMC no. We ruled against it, as MCMC require techniques that are not
> used very much in scikit-learn. But there is a pull request implementing
> the online non MCMC Latent Dirichlet Allocation algorithm.
>
> Gaël
>

If not MCMC, could we try other approximate inference techniques such as
variational Bayes, which are comparatively faster? Wouldn't having LDA as
part of scikit-learn be good, since LDA can also be viewed as a Bayesian
matrix factorization approach for sparse matrices?

https://sites.google.com/site/igorcarron2/matrixfactorizations
http://www.wsdm-conference.org/2010/proceedings/docs/p91.pdf
http://www.cs.cmu.edu/~ggordon/singh-gordon-unified-factorization-ecml.pdf
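To illustrate the factorization view of topic models, here is a minimal Lee-Seung multiplicative-update NMF in plain NumPy, where rows of H play the role of topics and rows of W the per-document topic weights. This is only a sketch of the related non-Bayesian factorization, not LDA itself; the function name and defaults are mine.

```python
import numpy as np

def nmf_multiplicative(X, k, n_iter=200, seed=0):
    """Lee-Seung multiplicative updates for X ~ W @ H with W, H >= 0.

    Updates keep W and H nonnegative by construction: each step multiplies
    the current factor by a ratio of nonnegative matrices.
    """
    rng = np.random.RandomState(seed)
    n, m = X.shape
    W = rng.rand(n, k) + 0.1
    H = rng.rand(k, m) + 0.1
    eps = 1e-9  # avoid division by zero
    for _ in range(n_iter):
        H *= (W.T @ X) / (W.T @ W @ H + eps)
        W *= (X @ H.T) / (W @ H @ H.T + eps)
    return W, H
```

On a document-term count matrix, the rows of H then look like unnormalized word distributions per "topic", which is the connection the papers above make precise.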

-
Anirudh





--
Anirudh Acharya
Graduate Student
Arizona State University
Gael Varoquaux
2015-02-12 11:13:07 UTC
Permalink
On Thu, Feb 12, 2015 at 04:02:12AM -0700, Anirudh Acharya wrote:
> But there is a pull request implementing the online non MCMC Latent
> Dirichlet Allocation algorithm.

> If not MCMC, could we try other approximate inference techniques like
> variational bayes, which are comparatively faster.

https://github.com/scikit-learn/scikit-learn/pull/3659

G
Artem
2015-02-11 22:59:17 UTC
Permalink
There was an interview with Ilya Sutskever about deep learning (
http://yyue.blogspot.ru/2015/01/a-brief-overview-of-deep-learning.html),
where he states that DL's success can be attributed to 3 main breakthroughs:

1. Computing resources.
2. Large datasets.
3. Tricks of the trade, discovered in recent years.

The first point is the most important, IMO. Deep learning is usually done
on a GPU (or, in Jeff Dean's style, on a cluster), and even then it
takes hours to run. I haven't seen any mention of GPU support in sklearn,
so I assume there is none.
I doubt that DL models would be useful without such computing power.

As for (current) deep learning models, my understanding is that even
though RBMs and autoencoders may have fallen out of favor, convolutional
and recurrent networks are still around and used extensively.
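For a sense of the scale that stays feasible on a CPU: a minimal one-hidden-layer MLP trained on XOR in plain NumPy, roughly the regime an sklearn MLP would target (hyperparameters here are arbitrary, just a sketch).

```python
import numpy as np

# Minimal one-hidden-layer MLP trained on XOR with plain NumPy on the CPU.
rng = np.random.RandomState(0)
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
y = np.array([[0.], [1.], [1.], [0.]])

W1 = rng.randn(2, 8); b1 = np.zeros(8)   # input -> hidden
W2 = rng.randn(8, 1); b2 = np.zeros(1)   # hidden -> output
lr = 0.1

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

for _ in range(20000):
    h = np.tanh(X @ W1 + b1)             # forward pass
    out = sigmoid(h @ W2 + b2)
    d_out = out - y                      # cross-entropy gradient w.r.t. logits
    d_h = (d_out @ W2.T) * (1.0 - h ** 2)
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

preds = (out > 0.5).astype(int).ravel()
```

Training this takes a fraction of a second; the GPU question only bites once you move to convolutional or recurrent nets on large datasets.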

On Thu, Feb 12, 2015 at 1:24 AM, Christof Angermueller <
***@gmail.com> wrote:

> as far as I know, sklearn has only an RBM module, but does not support
> multilayer perceptrons (MLPs), autoencoder, or recurrent neural networks.
> Are there any plans do extend sklearn by some neural network related
> modules?
> There was a GSoC project on neural networks last year (
> http://goo.gl/buHkyv), but I think it was not merged in. Instead of
> creating own modules, one might also provide an interface to theano, or
> other nnet libraries.
>
> Christof
>
> On 10/02/2015 15:41, Andy wrote:
>
> I'd say this years JMLR is too fresh ;)
>
>
> On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
>
> are we interested in more discriminant methods? There were a few more @
> JMLR this year
>
> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort <
> ***@m4x.org> wrote:
>
>> please wait a bit so we finalize the list. It's not definitive.
>>
>> A
>>
>> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com> wrote:
>> > Hi,
>> >
>> > I saw implementing GAMs as one of the suggested topics for GSoC 2015.
>> > Could I take that up? I saw your ( Alex's ) name under that. If yes,
>> > please let me know I'll start working on the same and if you permit me
>> > to, I'll start a wiki page for my proposal and timeline.
>> >
>> > Thanks
>> > ragv
>> >
>> >
>>
>>
>>
>
>
>
>
>
>
>
>
>
> --
> Christof Angermueller
> ***@gmail.com
> http://cangermueller.com
>
>
>
>
>
Ronnie Ghose
2015-02-11 23:02:05 UTC
Permalink
But for the CNNs you mention - GPUs.

On Wed, Feb 11, 2015, 5:59 PM Artem <***@gmail.com> wrote:

> There was an interview with Ilya Sutskever about deep learning (
> http://yyue.blogspot.ru/2015/01/a-brief-overview-of-deep-learning.html),
> where he states that DL's success can be attributed to 3 main breakthroughs:
>
> 1. Computing resources.
> 2. Large datasets.
> 3. Tricks of the trade, discovered in recent years.
>
> The first bullet is the most important, IMO. Deep Learning is usually done
> on GPU (or, in Jeff Dean's style — on a cluster), and even in that case it
> takes hours to run. I haven't seen any mentions of GPU support in sklearn,
> so I assume there's none.
> I doubt that DL's models would be useful without such computing power.
>
> As to (current) Deep Learning models, according to my understanding, even
> though RBMs and AutoEncoders might have fell out of interest, convolutional
> and recurrent networks are still around, and are used extensively.
>
> On Thu, Feb 12, 2015 at 1:24 AM, Christof Angermueller <
> ***@gmail.com> wrote:
>
>> as far as I know, sklearn has only an RBM module, but does not support
>> multilayer perceptrons (MLPs), autoencoder, or recurrent neural networks.
>> Are there any plans do extend sklearn by some neural network related
>> modules?
>> There was a GSoC project on neural networks last year (
>> http://goo.gl/buHkyv), but I think it was not merged in. Instead of
>> creating own modules, one might also provide an interface to theano, or
>> other nnet libraries.
>>
>> Christof
>>
>> On 10/02/2015 15:41, Andy wrote:
>>
>> I'd say this years JMLR is too fresh ;)
>>
>>
>> On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
>>
>> are we interested in more discriminant methods? There were a few more @
>> JMLR this year
>>
>> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort <
>> ***@m4x.org> wrote:
>>
>>> please wait a bit so we finalize the list. It's not definitive.
>>>
>>> A
>>>
>>> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com> wrote:
>>> > Hi,
>>> >
>>> > I saw implementing GAMs as one of the suggested topics for GSoC 2015.
>>> > Could I take that up? I saw your ( Alex's ) name under that. If yes,
>>> > please let me know I'll start working on the same and if you permit me
>>> > to, I'll start a wiki page for my proposal and timeline.
>>> >
>>> > Thanks
>>> > ragv
>>> >
>>> >
>>>
>>>
>>>
>>
>>
>>
>>
>>
>>
>>
>>
>>
>> --
>> Christof Angermueller
>> ***@gmail.com
>> http://cangermueller.com
>>
>>
>>
>>
>>
Kyle Kastner
2015-02-11 23:07:38 UTC
Permalink
pylearn2 is not even close to sklearn-compatible. Small-scale recurrent
nets are in PyBrain, but I really think that any seriously usable neural
net type learners are sort of outside the scope of sklearn. Others might
have different opinions, but this is one of the reasons Michael and I
started sklearn-theano: keep the API but add a dependency on Theano, in
return for focusing primarily on those architectures.

In general I think the necessary dependencies of deep learning really
require a library focused only on that. Streaming document processing and
things seem like they would be more directly useful for the current project
scope.

I think I mentioned GaussianRBM as a good addition before, so I will
mention that again here. RBMs and Autoencoders are still useful for feature
extraction in some cases, so it seems reasonable to have them around.
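
For concreteness, RBM feature extraction already works as a drop-in
Pipeline step with the existing BernoulliRBM; a minimal sketch on
synthetic binary data (sizes and hyperparameters here are illustrative
only, not a recommendation):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import BernoulliRBM
from sklearn.pipeline import Pipeline

# Tiny synthetic binary dataset; BernoulliRBM expects values in [0, 1].
rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(100, 16)).astype(float)
y = rng.randint(0, 2, size=100)

# Unsupervised RBM features feeding a linear classifier.
model = Pipeline([
    ("rbm", BernoulliRBM(n_components=8, learning_rate=0.05,
                         n_iter=10, random_state=0)),
    ("clf", LogisticRegression()),
])
model.fit(X, y)

# transform() exposes the hidden-unit activations as features.
features = model.named_steps["rbm"].transform(X)
print(features.shape)  # (100, 8)
```

A GaussianRBM, if added, could occupy the same Pipeline slot for
real-valued inputs.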

On Wed, Feb 11, 2015 at 5:59 PM, Artem <***@gmail.com> wrote:

> There was an interview with Ilya Sutskever about deep learning (
> http://yyue.blogspot.ru/2015/01/a-brief-overview-of-deep-learning.html),
> where he states that DL's success can be attributed to 3 main breakthroughs:
>
> 1. Computing resources.
> 2. Large datasets.
> 3. Tricks of the trade, discovered in recent years.
>
> The first bullet is the most important, IMO. Deep Learning is usually done
> on GPU (or, in Jeff Dean's style — on a cluster), and even in that case it
> takes hours to run. I haven't seen any mentions of GPU support in sklearn,
> so I assume there's none.
> I doubt that DL's models would be useful without such computing power.
>
> As to (current) Deep Learning models, according to my understanding, even
> though RBMs and AutoEncoders might have fallen out of interest, convolutional
> and recurrent networks are still around, and are used extensively.
>
Andy
2015-02-11 23:47:52 UTC
Permalink
On 02/11/2015 06:07 PM, Kyle Kastner wrote:
> pylearn2 is not even close to sklearn compatible. Small scale
> recurrent nets are in PyBrain, but I really think that any seriously
> usable neural net type learners are sort of outside the scope of
> sklearn. Others might have different opinions, but this is one of the
> reasons Michael and I started sklearn-theano . Keep the API but add
> dependency on Theano, in return for focusing primarily on those
> architectures.
That seems like the right way to go to me.
Gael Varoquaux
2015-02-12 06:24:01 UTC
Permalink
> I think I mentioned GaussianRBM as a good addition before, so I will mention
> that again here. RBMs and Autoencoders are still useful for feature extraction
> in some cases, so it seems reasonable to have them around.

I have not seen any convincing demonstration using our Bernoulli RBMs. It
doesn't feel to me that they are very useful.

Gaël
Ronnie Ghose
2015-02-12 06:27:37 UTC
Permalink
can we have gpu-based/dependent algos as a separate plugin like has been
done for other things? adding more dependencies sounds irksome.

On Thu, Feb 12, 2015 at 1:24 AM, Gael Varoquaux <
***@normalesup.org> wrote:

> > I think I mentioned GaussianRBM as a good addition before, so I will
> mention
> > that again here. RBMs and Autoencoders are still useful for feature
> extraction
> > in some cases, so it seems reasonable to have them around.
>
> I have not seen any convincing demonstration using our Bernoulli RBMs. It
> doesn't feel to me that they are very useful.
>
> Gaël
>
>
>
Kyle Kastner
2015-02-12 06:48:42 UTC
Permalink
Even having a separate plugin will require a lot of maintenance. I am -1 on
any gpu stuff being included directly in sklearn. Maintenance for sklearn
is already tough, and trying to support a huge amount of custom compute
hardware is really, really hard. Ensuring numerical stability between
OS/BLAS versions is already a beast!

We talked in the past about having a core test suite which other packages
could run against as a way to ensure compatibility, and I know that would
be useful for me.
On Feb 12, 2015 1:29 AM, "Ronnie Ghose" <***@gmail.com> wrote:

> can we have gpu-based/dependent algos as a separate plugin like has been
> done for other things? adding more dependencies sounds irksome.
>
> On Thu, Feb 12, 2015 at 1:24 AM, Gael Varoquaux <
> ***@normalesup.org> wrote:
>
>> > I think I mentioned GaussianRBM as a good addition before, so I will
>> mention
>> > that again here. RBMs and Autoencoders are still useful for feature
>> extraction
>> > in some cases, so it seems reasonable to have them around.
>>
>> I have not seen any convincing demonstration using our Bernoulli RBMs. It
>> doesn't feel to me that they are very useful.
>>
>> Gaël
>>
>>
>>
>
>
>
>
>
Ronnie Ghose
2015-02-12 06:50:30 UTC
Permalink
no i mean external plugin that they have to support - we're hands off. we
can link to it but that's it - no other guarantees like we've done in the
past iirc

On Thu, Feb 12, 2015 at 1:48 AM, Kyle Kastner <***@gmail.com> wrote:

> Even having a separate plugin will require a lot of maintenance. I am -1
> on any gpu stuff being included directly in sklearn. Maintenance for
> sklearn is already tough, and trying to support a huge amount of custom
> compute hardware is really, really hard. Ensuring numerical stability
> between OS/BLAS versions is already a beast!
>
> We talked in the past about having a core test suite which other packages
> could run against as a way to ensure compatibility, and I know that would
> be useful for me.
> On Feb 12, 2015 1:29 AM, "Ronnie Ghose" <***@gmail.com> wrote:
>
>> can we have gpu-based/dependent algos as a separate plugin like has been
>> done for other things? adding more dependencies sounds irksome.
>>
>> On Thu, Feb 12, 2015 at 1:24 AM, Gael Varoquaux <
>> ***@normalesup.org> wrote:
>>
>>> > I think I mentioned GaussianRBM as a good addition before, so I will
>>> mention
>>> > that again here. RBMs and Autoencoders are still useful for feature
>>> extraction
>>> > in some cases, so it seems reasonable to have them around.
>>>
>>> I have not seen any convincing demonstration using our Bernoulli RBMs. It
>>> doesn't feel to me that they are very useful.
>>>
>>> Gaël
>>>
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
Kyle Kastner
2015-02-12 06:57:22 UTC
Permalink
Ah yeah, in that case I think it would be OK. At least that is the path we
took for sklearn-theano.

It's not really a GSoC thing but some implementation of common algs in
numba could be interesting in general to compare with cython versions.
On Feb 12, 2015 1:52 AM, "Ronnie Ghose" <***@gmail.com> wrote:

> no i mean external plugin that they have to support - we're hands off. we
> can link to it but that's it - no other guarantees like we've done in the
> past iirc
>
> On Thu, Feb 12, 2015 at 1:48 AM, Kyle Kastner <***@gmail.com>
> wrote:
>
>> Even having a separate plugin will require a lot of maintenance. I am -1
>> on any gpu stuff being included directly in sklearn. Maintenance for
>> sklearn is already tough, and trying to support a huge amount of custom
>> compute hardware is really, really hard. Ensuring numerical stability
>> between OS/BLAS versions is already a beast!
>>
>> We talked in the past about having a core test suite which other packages
>> could run against as a way to ensure compatibility, and I know that would
>> be useful for me.
>> On Feb 12, 2015 1:29 AM, "Ronnie Ghose" <***@gmail.com> wrote:
>>
>>> can we have gpu-based/dependent algos as a separate plugin like has been
>>> done for other things? adding more dependencies sounds irksome.
>>>
>>> On Thu, Feb 12, 2015 at 1:24 AM, Gael Varoquaux <
>>> ***@normalesup.org> wrote:
>>>
>>>> > I think I mentioned GaussianRBM as a good addition before, so I will
>>>> mention
>>>> > that again here. RBMs and Autoencoders are still useful for feature
>>>> extraction
>>>> > in some cases, so it seems reasonable to have them around.
>>>>
>>>> I have not seen any convincing demonstration using our Bernoulli RBMs. It
>>>> doesn't feel to me that they are very useful.
>>>>
>>>> Gaël
>>>>
>>>>
>>>>
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
>
>
>
Gael Varoquaux
2015-02-12 06:58:03 UTC
Permalink
> no i mean external plugin that they have to support - we're hands off.
> we can link to it but that's it - no other guarantees like we've done
> in the past iirc

That doesn't work well: if it has our name on it, people still associate
it to us, and land on our tracker, or complain that scikit-learn doesn't
work.

That's why I think that a separate project is a good thing.

G
Ronnie Ghose
2015-02-12 07:00:00 UTC
Permalink
yup so tl;dr no GPU things in the GSoC

On Thu, Feb 12, 2015 at 1:58 AM, Gael Varoquaux <
***@normalesup.org> wrote:

> > no i mean external plugin that they have to support - we're hands off.
> > we can link to it but that's it - no other guarantees like we've done
> > in the past iirc
>
> That doesn't work well: if it has our name on it, people still associate
> it to us, and land on our tracker, or complain that scikit-learn doesn't
> work.
>
> That's why I think that a separate project is a good thing.
>
> G
>
>
>
Kyle Kastner
2015-02-12 07:02:43 UTC
Permalink
Plugin vs separate package:
libsvm/liblinear are plugins whereas "friend" libraries like lightning are
packages right?

By that definition I agree with Gael - standalone packages are best for
that stuff. I don't really know what a "plugin" for sklearn would be
exactly.
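
To make the package-vs-plugin distinction concrete: a standalone "friend"
package keeps the sklearn API simply by subclassing the base classes, with
no hook needed on sklearn's side. A toy sketch (the CenteringTransformer
below is made up purely for illustration):

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin

class CenteringTransformer(BaseEstimator, TransformerMixin):
    """Toy transformer: subtract the per-feature mean learned in fit()."""

    def fit(self, X, y=None):
        # Learned state is stored with a trailing underscore, per convention.
        self.mean_ = np.asarray(X).mean(axis=0)
        return self

    def transform(self, X):
        return np.asarray(X) - self.mean_

# fit_transform() comes for free from TransformerMixin, and the estimator
# can be dropped into Pipeline/GridSearchCV like any built-in one.
X = np.array([[0.0, 2.0], [2.0, 4.0]])
Xt = CenteringTransformer().fit_transform(X)
print(Xt)  # centered values: [[-1., -1.], [1., 1.]]
```

This is all a separate project needs to interoperate; nothing has to live
inside scikit-learn itself.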
On Feb 12, 2015 1:58 AM, "Gael Varoquaux" <***@normalesup.org>
wrote:

> > no i mean external plugin that they have to support - we're hands off.
> > we can link to it but that's it - no other guarantees like we've done
> > in the past iirc
>
> That doesn't work well: if it has our name on it, people still associate
> it to us, and land on our tracker, or complain that scikit-learn doesn't
> work.
>
> That's why I think that a separate project is a good thing.
>
> G
>
>
>
Kyle Kastner
2015-02-12 07:03:58 UTC
Permalink
GSoC wise it might also be good to look at CCA, PLS etc. for cleanup.
On Feb 12, 2015 2:02 AM, "Kyle Kastner" <***@gmail.com> wrote:

> Plugin vs separate package:
> libsvm/liblinear are plugins whereas "friend" libraries like lightning are
> packages right?
>
> By that definition I agree with Gael - standalone packages are best for
> that stuff. I don't really know what a "plugin" for sklearn would be
> exactly.
> On Feb 12, 2015 1:58 AM, "Gael Varoquaux" <***@normalesup.org>
> wrote:
>
>> > no i mean external plugin that they have to support - we're hands off.
>> > we can link to it but that's it - no other guarantees like we've done
>> > in the past iirc
>>
>> That doesn't work well: if it has our name on it, people still associate
>> it to us, and land on our tracker, or complain that scikit-learn doesn't
>> work.
>>
>> That's why I think that a separate project is a good thing.
>>
>> G
>>
>>
>>
>
Michael Eickenberg
2015-02-12 07:09:30 UTC
Permalink
On Thursday, February 12, 2015, Kyle Kastner <***@gmail.com> wrote:

> GSoC wise it might also be good to look at CCA, PLS etc. for cleanup.
>

+1

achieving that in a satisfactory way may be non-trivial but would be very
useful. Adding speedups for the n_samples < n_features case would render this
tool usable for whole new communities :)



> On Feb 12, 2015 2:02 AM, "Kyle Kastner" <***@gmail.com> wrote:
>
>> Plugin vs separate package:
>> libsvm/liblinear are plugins whereas "friend" libraries like lightning
>> are packages right?
>>
>> By that definition I agree with Gael - standalone packages are best for
>> that stuff. I don't really know what a "plugin" for sklearn would be
>> exactly.
>> On Feb 12, 2015 1:58 AM, "Gael Varoquaux" <***@normalesup.org> wrote:
>>
>>> > no i mean external plugin that they have to support - we're hands off.
>>> > we can link to it but that's it - no other guarantees like we've done
>>> > in the past iirc
>>>
>>> That doesn't work well: if it has our name on it, people still associate
>>> it to us, and land on our tracker, or complain that scikit-learn doesn't
>>> work.
>>>
>>> That's why I think that a separate project is a good thing.
>>>
>>> G
>>>
>>>
>>>
>>
Andy
2015-02-12 21:42:00 UTC
Permalink
On 02/12/2015 02:09 AM, Michael Eickenberg wrote:
>
>
> On Thursday, February 12, 2015, Kyle Kastner <***@gmail.com
> <mailto:***@gmail.com>> wrote:
>
> GSoC wise it might also be good to look at CCA, PLS etc. for cleanup.
>
>
> +1
>
We don't have a mentor, do we?
Ronnie Ghose
2015-02-12 07:10:11 UTC
Permalink
Do you mean refactoring? Are refactors/cleanups, rather than new features,
in scope for a GSoC project?

On Thu, Feb 12, 2015 at 2:03 AM, Kyle Kastner <***@gmail.com> wrote:

> GSoC wise it might also be good to look at CCA, PLS etc. for cleanup.
> On Feb 12, 2015 2:02 AM, "Kyle Kastner" <***@gmail.com> wrote:
>
>> Plugin vs separate package:
>> libsvm/liblinear are plugins whereas "friend" libraries like lightning
>> are packages right?
>>
>> By that definition I agree with Gael - standalone packages are best for
>> that stuff. I don't really know what a "plugin" for sklearn would be
>> exactly.
>> On Feb 12, 2015 1:58 AM, "Gael Varoquaux" <***@normalesup.org>
>> wrote:
>>
>>> > no i mean external plugin that they have to support - we're hands off.
>>> > we can link to it but that's it - no other guarantees like we've done
>>> > in the past iirc
>>>
>>> That doesn't work well: if it has our name on it, people still associate
>>> it to us, and land on our tracker, or complain that scikit-learn doesn't
>>> work.
>>>
>>> That's why I think that a separate project is a good thing.
>>>
>>> G
>>>
>>>
>>
>
Gael Varoquaux
2015-02-12 07:14:20 UTC
Permalink
On Thu, Feb 12, 2015 at 02:10:11AM -0500, Ronnie Ghose wrote:
> Do you mean refactoring? .. Are refactors/cleanups rather than new features in
> scope for GSOC project?

Yes, if they are used to build new features on top of the refactor. But
it is fine to allocate a significant amount of time to the refactor.
Akshay Narasimha
2015-02-12 08:19:27 UTC
Permalink
How about adding partial_fit to existing low-rank methods, or new
incremental algorithms?
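For anyone picturing what that would look like, here is a minimal sketch of the partial_fit pattern using an estimator that already supports it (SGDClassifier); the data and minibatch size are purely illustrative. Giving the same streaming interface to low-rank methods is essentially the project idea.

```python
import numpy as np
from sklearn.linear_model import SGDClassifier

# Synthetic data with a simple linear structure, streamed in minibatches.
rng = np.random.RandomState(0)
X = rng.randn(1000, 20)
y = (X[:, 0] + X[:, 1] > 0).astype(int)

clf = SGDClassifier(random_state=0)
classes = np.unique(y)  # all class labels must be declared up front
for start in range(0, len(X), 100):
    batch = slice(start, start + 100)
    # Each call updates the model with one minibatch; the full X is
    # never needed in memory at once.
    clf.partial_fit(X[batch], y[batch], classes=classes)

print(clf.score(X, y))
```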

On Thu, Feb 12, 2015 at 12:44 PM, Gael Varoquaux <
***@normalesup.org> wrote:

> On Thu, Feb 12, 2015 at 02:10:11AM -0500, Ronnie Ghose wrote:
> > Do you mean refactoring? .. Are refactors/cleanups rather than new
> features in
> > scope for GSOC project?
>
> Yes, if they are used to build new features on top of the refactor. But
> it is fine to alocate a significant amount of time to the refactor.
>
Gael Varoquaux
2015-02-12 08:53:12 UTC
Permalink
> How about adding partial_fit to existing low rank methods or new incremental
> algorithms?

I think that making scikit-learn scale better is an important avenue for
the future, so I would personally welcome any kind of effort in this
direction. However, such projects need to be well motivated technically:
feasibility in terms of code, but also robustness to hyperparameters.

Gaël
Mathieu Blondel
2015-02-12 09:33:42 UTC
Permalink
A grid-search related project could be useful:

- multiple metric support (e.g., find the best model w.r.t. f1 score and
the best model w.r.t. AUC)
- data independent cv iterators (
https://github.com/scikit-learn/scikit-learn/issues/2904)
- anything else?
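As a concrete illustration of the first bullet, here is the multiple-metric idea done by hand with ParameterGrid and cross_val_score (using the current module layout); a real implementation would share the fitted models between metrics instead of refitting, and the names below are made up for the sketch:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import ParameterGrid, cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, random_state=0)
grid = ParameterGrid({"C": [0.1, 1.0, 10.0]})

# One "best model" per metric, found independently.
best = {}
for metric in ("f1", "roc_auc"):
    scored = []
    for params in grid:
        scores = cross_val_score(SVC(**params), X, y, scoring=metric, cv=3)
        scored.append((scores.mean(), params))
    best[metric] = max(scored, key=lambda t: t[0])

print(best)  # the winning C may differ between f1 and roc_auc
```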

Mathieu

On Thu, Feb 12, 2015 at 5:53 PM, Gael Varoquaux <
***@normalesup.org> wrote:

> > How about adding partial_fit to existing low rank methods or new
> incremental
> > algorithms?
>
> I think that making scikit-learn scale better is an important alley for
> the future. Thus I would personnally see very well any kind of efforts in
> this direction. However, these need to be well technically motivated:
> feasability in terms of code, but also robustness to hyper parameters.
>
> Gaël
Artem
2015-02-12 09:47:55 UTC
Permalink
There are several packages (spearmint, hyperopt, MOE) that apply Bayesian
optimization to the problem of choosing hyperparameters. Wouldn't it be
nice to add such a *Search[CV] to sklearn?
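For a feel of what such a *Search class would do internally, here is a toy version of the loop: fit a GP to the (hyperparameter, score) pairs observed so far, then evaluate the candidate that maximizes an upper-confidence-bound acquisition. The objective is a stand-in for a cross-validated score, the GaussianProcessRegressor is used purely for illustration, and all names are invented:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def objective(log_c):            # stand-in for a cross-validated score
    return -(log_c - 1.0) ** 2   # unknown to the optimizer; peaks at 1.0

candidates = np.linspace(-3, 3, 61).reshape(-1, 1)
X_obs = np.array([[-2.0], [0.0], [2.0]])            # initial evaluations
y_obs = np.array([objective(x[0]) for x in X_obs])

for _ in range(5):
    # alpha adds jitter so re-evaluated candidates stay numerically safe
    gp = GaussianProcessRegressor(alpha=1e-6).fit(X_obs, y_obs)
    mu, sigma = gp.predict(candidates, return_std=True)
    ucb = mu + 1.96 * sigma                         # acquisition function
    x_next = candidates[np.argmax(ucb)]
    X_obs = np.vstack([X_obs, x_next])
    y_obs = np.append(y_obs, objective(x_next[0]))

print(X_obs[np.argmax(y_obs)][0])   # best hyperparameter found so far
```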

On Thu, Feb 12, 2015 at 12:33 PM, Mathieu Blondel <***@mblondel.org>
wrote:

> A grid-search related project could be useful:
>
> - multiple metric support (e.g., find the best model w.r.t. f1 score and
> the best model w.r.t. AUC)
> - data independent cv iterators (
> https://github.com/scikit-learn/scikit-learn/issues/2904)
> - anything else?
>
> Mathieu
>
> On Thu, Feb 12, 2015 at 5:53 PM, Gael Varoquaux <
> ***@normalesup.org> wrote:
>
>> > How about adding partial_fit to existing low rank methods or new
>> incremental
>> > algorithms?
>>
>> I think that making scikit-learn scale better is an important alley for
>> the future. Thus I would personnally see very well any kind of efforts in
>> this direction. However, these need to be well technically motivated:
>> feasability in terms of code, but also robustness to hyper parameters.
>>
>> Gaël
>>
>
Andy
2015-02-12 21:41:44 UTC
Permalink
On 02/12/2015 04:47 AM, Artem wrote:
> There are several packages (spearmint, hyperopt, MOE) offering
> Bayesian Optimization to the problem of choosing hyperparameters.
> Wouldn't it be nice to add such *Search[CV] to sklearn?
Yes. I haven't really looked much into the spearmint approach, but
before we could do anything with GPs I am afraid we need to get our GP
up to speed.
Artem
2015-02-12 22:48:04 UTC
Permalink
Do you have any particular ideas on how one could speed up GPs, besides
reimplementing them in Cython? Spearmint looks completely pythonic, so it
is either just as slow (or slower), or uses a different algorithm (I'm not
very familiar with approaches to GPs).

On Fri, Feb 13, 2015 at 12:41 AM, Andy <***@gmail.com> wrote:

>
> On 02/12/2015 04:47 AM, Artem wrote:
>
> There are several packages (spearmint, hyperopt, MOE) offering Bayesian
> Optimization to the problem of choosing hyperparameters. Wouldn't it be
> nice to add such *Search[CV] to sklearn?
>
> Yes. I haven't really looked much into the spearmint approach, but before
> we could do anything with GPs I am afraid we need to get our GP up to speed.
>
Andy
2015-02-12 23:10:53 UTC
Permalink
Sorry, I was using a possibly confusing idiom. The problem with our GP
is not so much speed as interface and flexibility.
Also, we are not using gradient based parameter optimization.

On 02/12/2015 05:48 PM, Artem wrote:
> Do you have any particular ideas on how one could speedup GPs, besides
> reimplementing it in Cython? Looks like spearmint is completely
> pythonic, so they either as slow (or slower), or use different
> algorithm (I'm not very familiar with approaches to GPs).
>
> On Fri, Feb 13, 2015 at 12:41 AM, Andy <***@gmail.com
> <mailto:***@gmail.com>> wrote:
>
>
> On 02/12/2015 04:47 AM, Artem wrote:
>> There are several packages (spearmint, hyperopt, MOE) offering
>> Bayesian Optimization to the problem of choosing hyperparameters.
>> Wouldn't it be nice to add such *Search[CV] to sklearn?
> Yes. I haven't really looked much into the spearmint approach, but
> before we could do anything with GPs I am afraid we need to get
> our GP up to speed.
>
>
Kyle Kastner
2015-02-12 23:20:12 UTC
Permalink
There are a lot of ways to speed them up as potential work, but the
interface (and backend code) should be very stable first. Gradient-based
optimization, latent variable approximation, low-rank updating, and
distributed GPs (a new paper from a few weeks ago) are all possible, but
they would need to be compared against a very solid implementation with a
stable API to determine how bad the approximation is and whether it is
worth it performance-wise. And if we eventually want some form of Bayesian
hyperparameter optimization, a very stable and fast-ish GP is likely
necessary, with way more emphasis on the *stable* part. So basically, what
Andy said :)

Also, I am +1 on Alex's list. It seems to cover a lot of stuff that needs
TLC.

Kyle

On Thu, Feb 12, 2015 at 5:10 PM, Andy <***@gmail.com> wrote:

> Sorry, I was using a possibly confusing idiom. The problem with our GP is
> not so much speed as interface and flexibility.
> Also, we are not using gradient based parameter optimization.
>
>
> On 02/12/2015 05:48 PM, Artem wrote:
>
> Do you have any particular ideas on how one could speedup GPs, besides
> reimplementing it in Cython? Looks like spearmint is completely pythonic,
> so they either as slow (or slower), or use different algorithm (I'm not
> very familiar with approaches to GPs).
>
> On Fri, Feb 13, 2015 at 12:41 AM, Andy <***@gmail.com> wrote:
>
>>
>> On 02/12/2015 04:47 AM, Artem wrote:
>>
>> There are several packages (spearmint, hyperopt, MOE) offering Bayesian
>> Optimization to the problem of choosing hyperparameters. Wouldn't it be
>> nice to add such *Search[CV] to sklearn?
>>
>> Yes. I haven't really looked much into the spearmint approach, but
>> before we could do anything with GPs I am afraid we need to get our GP up
>> to speed.
>>
>>
>>
>
>
>
Jan Hendrik Metzen
2015-02-13 08:34:30 UTC
Permalink
Having Bayesian optimization in sklearn would be great +1

I have recently been working on a sklearn-compatible rewrite of Gaussian
processes. The main features are gradient-based hyperparameter
optimization, kernel engineering, and Gaussian process classification.
The downside is that it is not completely backward compatible with
sklearn's current GP interface. I will create a PR in the next few days
where we can discuss how to proceed (going for a merge versus adding it
to sklearn-extensions).
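To make "kernel engineering" concrete: the idea is kernels that compose by arithmetic, with their hyperparameters tuned by gradient ascent on the log marginal likelihood during fit. The snippet below uses the kernel API that this rewrite eventually became in scikit-learn, so treat it as a sketch of the proposal rather than the interface that existed at the time:

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.RandomState(0)
X = rng.uniform(0, 5, (40, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.randn(40)

# A kernel built by composition: signal amplitude * RBF + independent noise.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel).fit(X, y)

print(gp.kernel_)        # hyperparameters were optimized during fit
print(gp.score(X, y))
```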

Best,
Jan

On 13.02.2015 00:10, Andy wrote:
> Sorry, I was using a possibly confusing idiom. The problem with our GP
> is not so much speed as interface and flexibility.
> Also, we are not using gradient based parameter optimization.
>
> On 02/12/2015 05:48 PM, Artem wrote:
>> Do you have any particular ideas on how one could speedup GPs,
>> besides reimplementing it in Cython? Looks like spearmint is
>> completely pythonic, so they either as slow (or slower), or use
>> different algorithm (I'm not very familiar with approaches to GPs).
>>
>> On Fri, Feb 13, 2015 at 12:41 AM, Andy <***@gmail.com
>> <mailto:***@gmail.com>> wrote:
>>
>>
>> On 02/12/2015 04:47 AM, Artem wrote:
>>> There are several packages (spearmint, hyperopt, MOE) offering
>>> Bayesian Optimization to the problem of choosing
>>> hyperparameters. Wouldn't it be nice to add such *Search[CV] to
>>> sklearn?
>> Yes. I haven't really looked much into the spearmint approach,
>> but before we could do anything with GPs I am afraid we need to
>> get our GP up to speed.
>>
>>
>


--
Jan Hendrik Metzen, Dr.rer.nat.
Team Leader of Team "Sustained Learning"

Universität Bremen und DFKI GmbH, Robotics Innovation Center
FB 3 - Mathematik und Informatik
AG Robotik
Robert-Hooke-Straße 1
28359 Bremen, Germany


Tel.: +49 421 178 45-4123
Zentrale: +49 421 178 45-6611
Fax: +49 421 178 45-4150
E-Mail: ***@informatik.uni-bremen.de
Homepage: http://www.informatik.uni-bremen.de/~jhm/

Weitere Informationen: http://www.informatik.uni-bremen.de/robotik
Nikolay Mayorov
2015-02-13 09:49:22 UTC
Permalink
Hi! Just a general thought.
What do you think about adding a data visualization module to scikit-learn? I mean, we often want to look at our data, and having out-of-the-box routines for visualizing high-dimensional data would be very handy. Unfortunately, I'm not very familiar with that field, but I'm quite sure there are a lot of interesting and useful methods.
> Date: Fri, 13 Feb 2015 09:34:30 +0100
> From: ***@informatik.uni-bremen.de
> To: scikit-learn-***@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] GSoC2015 topics
>
> Having Bayesian optimization in sklearn would be great +1
>
> I was working recently on a sklearn-compatible rewrite of Gaussian
> processes. Main features are gradient-based hyperparameter
> optimization, kernel engineering and Gaussian process classification.
> The downside is that it is not completely downward compatible with
> sklearn's current GP interface. I will create a PR in the next days
> where we can discuss the further proceeding (going for merge versus
> adding it to the sklearn-extensions).
>
> Best,
> Jan
>
> On 13.02.2015 00:10, Andy wrote:
> > Sorry, I was using a possibly confusing idiom. The problem with our GP
> > is not so much speed as interface and flexibility.
> > Also, we are not using gradient based parameter optimization.
> >
> > On 02/12/2015 05:48 PM, Artem wrote:
> >> Do you have any particular ideas on how one could speedup GPs,
> >> besides reimplementing it in Cython? Looks like spearmint is
> >> completely pythonic, so they either as slow (or slower), or use
> >> different algorithm (I'm not very familiar with approaches to GPs).
> >>
> >> On Fri, Feb 13, 2015 at 12:41 AM, Andy <***@gmail.com
> >> <mailto:***@gmail.com>> wrote:
> >>
> >>
> >> On 02/12/2015 04:47 AM, Artem wrote:
> >>> There are several packages (spearmint, hyperopt, MOE) offering
> >>> Bayesian Optimization to the problem of choosing
> >>> hyperparameters. Wouldn't it be nice to add such *Search[CV] to
> >>> sklearn?
> >> Yes. I haven't really looked much into the spearmint approach,
> >> but before we could do anything with GPs I am afraid we need to
> >> get our GP up to speed.
> >>
> >>
> >
>
Gael Varoquaux
2015-02-13 09:51:58 UTC
Permalink
> What do you think about adding data visualization module to scikit-learn?

No. This is outside of the scope of scikit-learn. Separating projects by
scope is a good idea for many reasons.

Gaël
Andy
2015-02-18 00:33:10 UTC
Permalink
Hi Jan.
That sounds great!
Please share early versions :) [not that I'd have time to review them :-/]
I think breaking backward compatibility will be necessary, and we should
think about how to go about that.

Cheers,
Andy


On 02/13/2015 12:34 AM, Jan Hendrik Metzen wrote:
> Having Bayesian optimization in sklearn would be great +1
>
> I was working recently on a sklearn-compatible rewrite of Gaussian
> processes. Main features are gradient-based hyperparameter
> optimization, kernel engineering and Gaussian process classification.
> The downside is that it is not completely downward compatible with
> sklearn's current GP interface. I will create a PR in the next days
> where we can discuss the further proceeding (going for merge versus
> adding it to the sklearn-extensions).
>
> Best,
> Jan
>
> On 13.02.2015 00:10, Andy wrote:
>> Sorry, I was using a possibly confusing idiom. The problem with our GP
>> is not so much speed as interface and flexibility.
>> Also, we are not using gradient based parameter optimization.
>>
>> On 02/12/2015 05:48 PM, Artem wrote:
>>> Do you have any particular ideas on how one could speedup GPs,
>>> besides reimplementing it in Cython? Looks like spearmint is
>>> completely pythonic, so they either as slow (or slower), or use
>>> different algorithm (I'm not very familiar with approaches to GPs).
>>>
>>> On Fri, Feb 13, 2015 at 12:41 AM, Andy <***@gmail.com
>>> <mailto:***@gmail.com>> wrote:
>>>
>>>
>>> On 02/12/2015 04:47 AM, Artem wrote:
>>>> There are several packages (spearmint, hyperopt, MOE) offering
>>>> Bayesian Optimization to the problem of choosing
>>>> hyperparameters. Wouldn't it be nice to add such *Search[CV] to
>>>> sklearn?
>>> Yes. I haven't really looked much into the spearmint approach,
>>> but before we could do anything with GPs I am afraid we need to
>>> get our GP up to speed.
>>>
>>>
>>
>
Andy
2015-02-12 21:46:33 UTC
Permalink
On 02/12/2015 04:33 AM, Mathieu Blondel wrote:
> A grid-search related project could be useful:
>
> - multiple metric support (e.g., find the best model w.r.t. f1 score
> and the best model w.r.t. AUC)
> - data independent cv iterators
> (https://github.com/scikit-learn/scikit-learn/issues/2904)
+1
> - anything else?
recording training scores, recording training times / test times
Gael Varoquaux
2015-02-13 07:43:09 UTC
Permalink
On Thu, Feb 12, 2015 at 06:33:42PM +0900, Mathieu Blondel wrote:
> A grid-search related project could be useful:

I think that the grid-search code is too convoluted and far reaching, and
I think that it would be really hard for someone who does not already
know scikit-learn well to pick up a project on refactoring it.

> - data independent cv iterators (https://github.com/scikit-learn/scikit-learn/
> issues/2904)

I think that that one is almost done, and it mostly needs someone to pick
it up (including reviewing it) and finish it. For me, this PR is very
important, as it is one of the blockers for 1.0 release.
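For readers who haven't followed the issue: the point of "data independent" iterators is that the splitter is constructed without the data, and indices are generated on demand by split(). A sketch against the API this eventually became (the PR was still open at the time of this thread):

```python
import numpy as np
from sklearn.model_selection import KFold

X = np.arange(20).reshape(10, 2)

cv = KFold(n_splits=5)   # no data needed at construction time
for train_idx, test_idx in cv.split(X):
    # indices are generated lazily, per dataset passed to split()
    assert len(test_idx) == 2

print(cv.get_n_splits(X))
```

Because the splitter carries no data, grid search can clone it and reuse the same object across different datasets.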

G
Ignacio Rossi
2015-02-19 02:54:15 UTC
Permalink
Hi

>> - data independent cv iterators (
https://github.com/scikit-learn/scikit-learn/
>> issues/2904)
>
>I think that that one is almost done, and it mostly needs someone to pick
>it up (including reviewing it) and finish it. For me, this PR is very
>important, as it is one of the blockers for 1.0 release.

I implemented that pull request. :)
I don't know if I would call it 'almost done', but we were on the right
track, I think.
To avoid bloating the GSoC thread, I made a little writeup on GitHub so we
can talk about how to close things up.

If you're interested, you can read it here:
https://github.com/scikit-learn/scikit-learn/pull/3340#issuecomment-74990878

2015-02-13 4:43 GMT-03:00 Gael Varoquaux <***@normalesup.org>:

> On Thu, Feb 12, 2015 at 06:33:42PM +0900, Mathieu Blondel wrote:
> > A grid-search related project could be useful:
>
> I think that the grid-search code is too convoluted and far reaching, and
> I think that it would be really hard for someone who does not already
> know scikit-learn well to pick up a project on refactoring it.
>
> > - data independent cv iterators (
> https://github.com/scikit-learn/scikit-learn/
> > issues/2904)
>
> I think that that one is almost done, and it mostly needs someone to pick
> it up (including reviewing it) and finish it. For me, this PR is very
> important, as it is one of the blockers for 1.0 release.
>
> G
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming. The Go Parallel Website,
> sponsored by Intel and developed in partnership with Slashdot Media, is
> your
> hub for all things parallel software development, from weekly thought
> leadership blogs to news, videos, case studies, tutorials and more. Take a
> look and join the conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-***@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
Mathieu Blondel
2015-02-12 08:22:04 UTC
Permalink
+1 on the CCA / PLS refactoring, but this would require a student who is
already well versed on these subjects. Mentoring could be an issue as well.

Mathieu

On Thu, Feb 12, 2015 at 4:14 PM, Gael Varoquaux <
***@normalesup.org> wrote:

> On Thu, Feb 12, 2015 at 02:10:11AM -0500, Ronnie Ghose wrote:
> > Do you mean refactoring? .. Are refactors/cleanups rather than new
> features in
> > scope for GSOC project?
>
> Yes, if they are used to build new features on top of the refactor. But
> it is fine to allocate a significant amount of time to the refactor.
>
>
>
Kyle Kastner
2015-02-12 06:53:34 UTC
Permalink
As for MCMC, PyMC (2 and 3) are both great for it, and emcee is really cool
too. I don't see a huge reason to rehash that, and most models which need
MCMC during training carry a pretty high computational cost.

Unless the accuracy gain for some algorithm is enormous it seems like it's
best to let other packages focus on that.
On Feb 12, 2015 1:48 AM, "Kyle Kastner" <***@gmail.com> wrote:

> Even having a separate plugin will require a lot of maintenance. I am -1
> on any gpu stuff being included directly in sklearn. Maintenance for
> sklearn is already tough, and trying to support a huge amount of custom
> compute hardware is really, really hard. Ensuring numerical stability
> between OS/BLAS versions is already a beast!
>
> We talked in the past about having a core test suite which other packages
> could run against as a way to ensure compatibility, and I know that would
> be useful for me.
> On Feb 12, 2015 1:29 AM, "Ronnie Ghose" <***@gmail.com> wrote:
>
>> can we have gpu-based/dependent algos as a separate plugin like has been
>> done for other things? adding more dependencies sounds irksome.
>>
>> On Thu, Feb 12, 2015 at 1:24 AM, Gael Varoquaux <
>> ***@normalesup.org> wrote:
>>
>>> > I think I mentioned GaussianRBM as a good addition before, so I will
>>> mention
>>> > that again here. RBMs and Autoencoders are still useful for feature
>>> extraction
>>> > in some cases, so it seems reasonable to have them around.
>>>
>>> I haven't seen any convincing demonstration using our Bernoulli RBMs. It
>>> doesn't feel to me that they are very useful.
>>>
>>> Gaël
>>>
>>>
>>>
>>
>>
>>
>>
>>
Gael Varoquaux
2015-02-12 06:56:32 UTC
Permalink
> Even having a separate plugin will require a lot of maintenance. I am
> -1 on any gpu stuff being included directly in sklearn. Maintenance for
> sklearn is already tough, and trying to support a huge amount of custom
> compute hardware is really, really hard. Ensuring numerical stability
> between OS/BLAS versions is already a beast!

+1

In my opinion, the GPU stack is still not mature enough. It is still very
much dependent on which actual GPU is used, what is the OS, and what are
the drivers.

Gaël
Andy
2015-02-11 23:47:26 UTC
Permalink
http://scikit-learn.org/dev/faq.html#will-you-add-gpu-support

On 02/11/2015 05:59 PM, Artem wrote:
> There was an interview with Ilya Sutskever about deep learning
> (http://yyue.blogspot.ru/2015/01/a-brief-overview-of-deep-learning.html),
> where he states that DL's success can be attributed to 3 main
> breakthroughs:
>
> 1. Computing resources.
> 2. Large datasets.
> 3. Tricks of the trade, discovered in recent years.
>
> The first bullet is the most important, IMO. Deep Learning is usually
> done on GPU (or, in Jeff Dean's style — on a cluster), and even in
> that case it takes hours to run. I haven't seen any mentions of GPU
> support in sklearn, so I assume there's none.
> I doubt that DL's models would be useful without such computing power.
>
> As to (current) Deep Learning models, according to my understanding,
> even though RBMs and AutoEncoders might have fallen out of favor,
> convolutional and recurrent networks are still around, and are used
> extensively.
>
> On Thu, Feb 12, 2015 at 1:24 AM, Christof Angermueller
> <***@gmail.com <mailto:***@gmail.com>> wrote:
>
> as far as I know, sklearn has only an RBM module, but does not
> support multilayer perceptrons (MLPs), autoencoders, or recurrent
> neural networks. Are there any plans to extend sklearn with some
> neural-network-related modules?
> There was a GSoC project on neural networks last year
> (http://goo.gl/buHkyv), but I think it was not merged. Instead
> of creating our own modules, one might also provide an interface to
> Theano, or other nnet libraries.
>
> Christof
>
> On 10/02/2015 15:41, Andy wrote:
>> I'd say this years JMLR is too fresh ;)
>>
>>
>> On 02/09/2015 04:31 PM, Ronnie Ghose wrote:
>>> are we interested in more discriminant methods? There were a few
>>> more @ JMLR this year
>>>
>>> On Mon, Feb 9, 2015 at 4:28 PM, Alexandre Gramfort
>>> <***@m4x.org <mailto:***@m4x.org>>
>>> wrote:
>>>
>>> please wait a bit so we finalize the list. It's not definitive.
>>>
>>> A
>>>
>>> On Mon, Feb 9, 2015 at 6:39 PM, ragv ragv <***@gmail.com
>>> <mailto:***@gmail.com>> wrote:
>>> > Hi,
>>> >
>>> > I saw implementing GAMs as one of the suggested topics for
>>> GSoC 2015.
>>> > Could I take that up? I saw your ( Alex's ) name under
>>> that. If yes,
>>> > please let me know I'll start working on the same and if
>>> you permit me
>>> > to, I'll start a wiki page for my proposal and timeline.
>>> >
>>> > Thanks
>>> > ragv
>>> >
>>> >
>>>
>>>
>>>
>>>
>>>
>>
>>
>>
>
> --
> Christof Angermueller
> ***@gmail.com <mailto:***@gmail.com>
> http://cangermueller.com
>
>
>
>
>
>
ragv ragv
2015-02-12 14:22:15 UTC
Permalink
Hi,

Is there a good deal of interest in having GAMs implemented?

The timeline for such a project would go something like:

Before GSoC:
* Implement SpAM

Before Midterm:
* Help merge pyearth into scikit-learn
* Implement Additive Model -> `AdditiveClassifier` /
`AdditiveRegressor` (not sure if my naming here is correct)

After Midterm:
* Implement GAMLSS
* Implement LISO

Kindly also see
https://github.com/scikit-learn/scikit-learn/issues/3482 for
references with citation counts.

The mgcv package by Simon Wood (with its gam / bam functions) on CRAN
is mature and could be used as reference material too...

On a scale of 0 to 100, how much importance / interest
would there be in such a project for GSoC 2015?
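To make the proposal concrete, here is a minimal backfitting sketch, the core fitting loop behind additive models; a cubic polynomial stands in for a proper spline smoother, and none of these names are a proposed scikit-learn API:

```python
# Minimal backfitting sketch for an additive regression model (the core idea
# behind GAMs). A cubic polynomial is used in place of a real spline smoother;
# this illustrates the algorithm only, not a proposed scikit-learn API.
import numpy as np

def backfit(X, y, n_iter=20, degree=3):
    n_samples, n_features = X.shape
    intercept = y.mean()
    coefs = [np.zeros(degree + 1) for _ in range(n_features)]
    contrib = np.zeros((n_samples, n_features))
    for _ in range(n_iter):
        for j in range(n_features):
            # partial residual: remove every fitted component except feature j
            r = y - intercept - contrib.sum(axis=1) + contrib[:, j]
            coefs[j] = np.polyfit(X[:, j], r, degree)
            contrib[:, j] = np.polyval(coefs[j], X[:, j])
            contrib[:, j] -= contrib[:, j].mean()  # keep components centred
    return intercept, coefs

rng = np.random.RandomState(0)
X = rng.uniform(-1, 1, size=(200, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2 + 0.05 * rng.randn(200)
intercept, coefs = backfit(X, y)
pred = intercept + sum(np.polyval(c, X[:, j]) for j, c in enumerate(coefs))
```

Each feature's smoother is refit on the partial residuals of the others until the components stabilise; swapping the polynomial for penalised splines gives the classical GAM fit.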
Ronnie Ghose
2015-02-12 15:12:41 UTC
Permalink
+1 to partial_fit, -1 to GAMs and more probabilistic things in sklearn

On Thu, Feb 12, 2015, 9:22 AM ragv ragv <***@gmail.com> wrote:

> Hi,
>
> Is there a good deal of interest in having GAMs implemented?
>
> The timeline for such a project would go something like :
>
> Before GSoC:
> * Implement SpAM
>
> Before Midterm :
> * Help merge pyearth into scikit learn
> * Implement Additive Model -> `AdditiveClassifier` /
> `AdditiveRegressor` ( Not sure if my wording here is correct )
>
> After Midterm :
> * Implement GAMLSS
> * Implement LISO
>
> Kindly also see
> https://github.com/scikit-learn/scikit-learn/issues/3482 for
> references with citation counts.
>
> The package mgcv by Simon Woods / GAM / BAM in CRAN is mature and
> could be used as reference material too...
>
> On a scale of 0 to 100 could I know how much importance / interest
> would there be in such a project for GSoC 2015?
>
>
Sebastian Raschka
2015-02-12 16:38:32 UTC
Permalink
What about adding multiclass support for the "roc_auc" scorer (e.g. for SVC with grid-search CV) to the to-do list?
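One way such a scorer could be defined is a macro-averaged one-vs-rest AUC; the functions below are a hypothetical illustration (not existing scikit-learn API), with per-class AUC computed via the Mann-Whitney rank statistic:

```python
# Sketch of a multiclass ROC AUC: macro-averaged one-vs-rest.
# Function names are illustrative only.

def binary_auc(y_true, scores):
    """AUC = P(score of a positive > score of a negative); ties count 1/2."""
    pos = [s for t, s in zip(y_true, scores) if t]
    neg = [s for t, s in zip(y_true, scores) if not t]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def auc_ovr_macro(y_true, score_matrix, classes):
    """Unweighted average of one-vs-rest AUCs over the given classes."""
    aucs = []
    for k, c in enumerate(classes):
        indicator = [y == c for y in y_true]
        aucs.append(binary_auc(indicator, [row[k] for row in score_matrix]))
    return sum(aucs) / len(aucs)

y = ['a', 'a', 'b', 'b', 'c', 'c']
scores = [
    [0.8, 0.1, 0.1],
    [0.6, 0.3, 0.1],
    [0.2, 0.7, 0.1],
    [0.3, 0.5, 0.2],
    [0.1, 0.2, 0.7],
    [0.2, 0.2, 0.6],
]
macro_auc = auc_ovr_macro(y, scores, classes=['a', 'b', 'c'])
```

Whether to macro- or weight-average the per-class AUCs (and one-vs-rest vs. one-vs-one) would be the main design question for a scorer.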

Best,
Sebastian

> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <***@gmail.com> wrote:
>
> +1 to partial fit -1 to gam and more probabilistic things in sklean
>
>
>> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <***@gmail.com> wrote:
>> Hi,
>>
>> Is there a good deal of interest in having GAMs implemented?
>>
>> The timeline for such a project would go something like :
>>
>> Before GSoC:
>> * Implement SpAM
>>
>> Before Midterm :
>> * Help merge pyearth into scikit learn
>> * Implement Additive Model -> `AdditiveClassifier` /
>> `AdditiveRegressor` ( Not sure if my wording here is correct )
>>
>> After Midterm :
>> * Implement GAMLSS
>> * Implement LISO
>>
>> Kindly also see
>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>> references with citation counts.
>>
>> The package mgcv by Simon Woods / GAM / BAM in CRAN is mature and
>> could be used as reference material too...
>>
>> On a scale of 0 to 100 could I know how much importance / interest
>> would there be in such a project for GSoC 2015?
>>
Milton Pividori
2015-02-12 18:55:18 UTC
Permalink
Hi, guys. My name is Milton Pividori and this is the first time I write to
this list. I'm a PhD student, working on clustering, particularly on
consensus clustering. I'm relatively new to Python, and I am migrating
legacy code from MATLAB. I plan to use scikit-learn as well as other
libraries.

After looking at the scikit-learn code and the mailing list, I didn't find any
methods related to consensus clustering or cluster ensembles. I think the
main paper about it is the one from Strehl and Ghosh (2002, JMLR, link
<http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I don't know
if you discussed it before, but I think it could be a good idea to
have these consensus functions implemented in scikit-learn (the paper
proposes three, all graph-based).

I was thinking on how to implement them. These three consensus functions
(CSPA, HGPA and MCLA) use METIS for graph partitioning. That could be an
obstacle for scikit-learn interests, as a new dependency would be needed (I
found python bindings for it). It would be also necessary to implement some
methods for ensemble generation with varying levels of diversity
(generating different clustering partitions by varying algorithms, changing
their parameters or manipulating data with projections, subsampling or
feature selection), but that's easier than implementing the consensus
functions.
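For illustration only (this is the evidence-accumulation idea rather than one of the three Strehl & Ghosh functions, and not a proposed API), a METIS-free consensus step could look like this sketch:

```python
# Evidence-accumulation consensus: build a co-association matrix from several
# base clusterings, then group samples via connected components of the
# thresholded co-association graph. Illustrative sketch, no METIS needed.
import numpy as np

def coassociation(labelings):
    """Fraction of partitions in which each pair of samples co-clusters."""
    labelings = np.asarray(labelings)   # shape (n_partitions, n_samples)
    n = labelings.shape[1]
    co = np.zeros((n, n))
    for labels in labelings:
        co += labels[:, None] == labels[None, :]
    return co / len(labelings)

def consensus_labels(labelings, threshold=0.5):
    """Connected components of the thresholded co-association graph."""
    co = coassociation(labelings) >= threshold
    n = co.shape[0]
    labels = -np.ones(n, dtype=int)
    current = 0
    for i in range(n):
        if labels[i] == -1:
            stack = [i]
            while stack:  # flood-fill one component
                j = stack.pop()
                if labels[j] == -1:
                    labels[j] = current
                    stack.extend(np.flatnonzero(co[j] & (labels == -1)))
            current += 1
    return labels

# three base partitions that mostly agree on {0,1,2} vs {3,4,5}
parts = [[0, 0, 0, 1, 1, 1],
         [1, 1, 1, 0, 0, 0],
         [0, 0, 1, 1, 1, 1]]
labels = consensus_labels(parts)
```

Note that label values in the base partitions don't need to match across partitions; only co-membership matters, which is what makes consensus functions label-permutation invariant.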

Well, it's just an idea. I would be glad to help with coding if this is
interesting for the community.

Regards,

2015-02-12 13:38 GMT-03:00 Sebastian Raschka <***@gmail.com>:

> What about adding multiclass support for the SVC "roc_auc" for grid search
> CV to the to do list?
>
> Best,
> Sebastian
>
> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <***@gmail.com> wrote:
>
> +1 to partial fit -1 to gam and more probabilistic things in sklean
>
> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <***@gmail.com> wrote:
>
>> Hi,
>>
>> Is there a good deal of interest in having GAMs implemented?
>>
>> The timeline for such a project would go something like :
>>
>> Before GSoC:
>> * Implement SpAM
>>
>> Before Midterm :
>> * Help merge pyearth into scikit learn
>> * Implement Additive Model -> `AdditiveClassifier` /
>> `AdditiveRegressor` ( Not sure if my wording here is correct )
>>
>> After Midterm :
>> * Implement GAMLSS
>> * Implement LISO
>>
>> Kindly also see
>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>> references with citation counts.
>>
>> The package mgcv by Simon Woods / GAM / BAM in CRAN is mature and
>> could be used as reference material too...
>>
>> On a scale of 0 to 100 could I know how much importance / interest
>> would there be in such a project for GSoC 2015?
>>
>>
>
>
>
>
>
>


--
Milton Pividori
Blog: www.miltonpividori.com.ar
Joel Nothman
2015-02-12 20:46:12 UTC
Permalink
Something that hasn't been discussed in a while is semi-supervised
learning. Issue #1243
<https://github.com/scikit-learn/scikit-learn/issues/1243> suggests a
generic meta-estimator approach may be feasible, but there might be a few
different approaches available. More of an issue in my opinion is that the
API etc for semi-supervised learning needs to be tightened (even if only to
allow external semi-supervised algorithm implementations to fit into the
scikit-learn framework).

For large collections of unlabelled data, partial_fit support is probably a
must; even for smaller collections, I think our cross validation strategies
need altering, as it makes no sense to have unlabelled data as a test
instance. Finally, if the unlabelled data is contiguous in the input, it
would be ideal not to copy that data, which the test of X[y == -1] (and its
inverse) will do.

Would improving semi-supervised techniques / support be an appropriate GSoC?
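A sketch of the generic meta-estimator idea from issue #1243, using the y == -1 convention for unlabelled samples that label_propagation already follows; both classes below are hypothetical illustrations, with a tiny centroid classifier standing in for a real base estimator:

```python
# Self-training meta-estimator sketch: iteratively pseudo-label the unlabelled
# samples (y == -1) that the base estimator is confident about. Illustrative
# only; not existing scikit-learn API.
import numpy as np

class CentroidClassifier:
    """Minimal stand-in base estimator with predict_proba and classes_."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        return self

    def predict_proba(self, X):
        dist = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        inv = 1.0 / (dist + 1e-12)          # closer centroid -> higher "proba"
        return inv / inv.sum(axis=1, keepdims=True)

    def predict(self, X):
        return self.classes_[self.predict_proba(X).argmax(axis=1)]

class SelfTraining:
    def __init__(self, base_estimator, threshold=0.8, max_iter=10):
        self.base_estimator = base_estimator
        self.threshold = threshold
        self.max_iter = max_iter

    def fit(self, X, y):
        X, y = np.asarray(X, dtype=float), np.asarray(y).copy()
        for _ in range(self.max_iter):
            labelled = y != -1
            if labelled.all():
                break
            est = self.base_estimator.fit(X[labelled], y[labelled])
            proba = est.predict_proba(X[~labelled])
            confident = proba.max(axis=1) >= self.threshold
            if not confident.any():
                break  # nothing confident enough to pseudo-label
            idx = np.flatnonzero(~labelled)[confident]
            y[idx] = est.classes_[proba[confident].argmax(axis=1)]
        self.base_estimator.fit(X[y != -1], y[y != -1])
        return self

    def predict(self, X):
        return self.base_estimator.predict(np.asarray(X, dtype=float))

X = np.array([[0, 0], [0.1, 0], [0, 0.1], [5, 5], [5.1, 5], [5, 5.1]])
y_semi = np.array([0, -1, -1, 1, -1, -1])
pred = SelfTraining(CentroidClassifier(), threshold=0.6).fit(X, y_semi).predict(X)
```

A wrapper like this works with any probabilistic base estimator, which is why tightening the semi-supervised API (the y == -1 convention, CV behaviour, avoiding the copies above) matters more than any single algorithm.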

On 13 February 2015 at 05:55, Milton Pividori <***@gmail.com> wrote:

> Hi, guys. My name is Milton Pividori and this is the first time I write to
> this list. I'm a PhD student, working on clustering, particularly on
> consensus clustering. I'm relatively new to Python, and I am migrating
> legacy code from MATLAB. I plan to use scikit-learn as well as other
> libraries.
>
> After looking at the scikit code and the mailing list, I didn't found any
> methods related to consensus clustering or cluster ensembles. I think the
> main paper about it is the one from Strehl and Ghosh (2002, JMLR, link
> <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I don't
> know if you discussed about it before, but I think it could be a good idea
> to have these consensus functions implemented in scikit-learn (the paper
> proposes three, graph-based).
>
> I was thinking on how to implement them. These three consensus functions
> (CSPA, HGPA and MCLA) use METIS for graph partitioning. That could be an
> obstacle for scikit-learn interests, as a new dependency would be needed (I
> found python bindings for it). It would be also necessary to implement some
> methods for ensemble generation with varying levels of diversity
> (generating different clustering partitions by varying algorithms, changing
> their parameters or manipulating data with projections, subsampling or
> feature selection), but that's easier than implementing the consensus
> functions.
>
> Well, it's just an idea. I would be glad to help with coding if this is
> interesting for the community.
>
> Regards,
>
> 2015-02-12 13:38 GMT-03:00 Sebastian Raschka <***@gmail.com>:
>
> What about adding multiclass support for the SVC "roc_auc" for grid search
>> CV to the to do list?
>>
>> Best,
>> Sebastian
>>
>> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <***@gmail.com>
>> wrote:
>>
>> +1 to partial fit -1 to gam and more probabilistic things in sklean
>>
>> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <***@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> Is there a good deal of interest in having GAMs implemented?
>>>
>>> The timeline for such a project would go something like :
>>>
>>> Before GSoC:
>>> * Implement SpAM
>>>
>>> Before Midterm :
>>> * Help merge pyearth into scikit learn
>>> * Implement Additive Model -> `AdditiveClassifier` /
>>> `AdditiveRegressor` ( Not sure if my wording here is correct )
>>>
>>> After Midterm :
>>> * Implement GAMLSS
>>> * Implement LISO
>>>
>>> Kindly also see
>>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>>> references with citation counts.
>>>
>>> The package mgcv by Simon Woods / GAM / BAM in CRAN is mature and
>>> could be used as reference material too...
>>>
>>> On a scale of 0 to 100 could I know how much importance / interest
>>> would there be in such a project for GSoC 2015?
>>>
>>>
>>
>>
>>
>>
>>
>>
>
>
> --
> Milton Pividori
> Blog: www.miltonpividori.com.ar
>
>
>
>
Andy
2015-02-12 21:47:49 UTC
Permalink
Hi Milton.

In which context is consensus clustering usually used, and what are the
main applications?
We will not add an external dependency, sorry.

Cheers,
Andy


On 02/12/2015 01:55 PM, Milton Pividori wrote:
> Hi, guys. My name is Milton Pividori and this is the first time I
> write to this list. I'm a PhD student, working on clustering,
> particularly on consensus clustering. I'm relatively new to Python,
> and I am migrating legacy code from MATLAB. I plan to use scikit-learn
> as well as other libraries.
>
> After looking at the scikit code and the mailing list, I didn't found
> any methods related to consensus clustering or cluster ensembles. I
> think the main paper about it is the one from Strehl and Ghosh (2002,
> JMLR, link
> <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I don't
> know if you discussed about it before, but I think it could be a good
> idea to have these consensus functions implemented in scikit-learn
> (the paper proposes three, graph-based).
>
> I was thinking on how to implement them. These three consensus
> functions (CSPA, HGPA and MCLA) use METIS for graph partitioning. That
> could be an obstacle for scikit-learn interests, as a new dependency
> would be needed (I found python bindings for it). It would be also
> necessary to implement some methods for ensemble generation with
> varying levels of diversity (generating different clustering
> partitions by varying algorithms, changing their parameters or
> manipulating data with projections, subsampling or feature selection),
> but that's easier than implementing the consensus functions.
>
> Well, it's just an idea. I would be glad to help with coding if this
> is interesting for the community.
>
> Regards,
>
> 2015-02-12 13:38 GMT-03:00 Sebastian Raschka <***@gmail.com
> <mailto:***@gmail.com>>:
>
> What about adding multiclass support for the SVC "roc_auc" for
> grid search CV to the to do list?
>
> Best,
> Sebastian
>
> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <***@gmail.com
> <mailto:***@gmail.com>> wrote:
>
>> +1 to partial fit -1 to gam and more probabilistic things in sklean
>>
>>
>> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <***@gmail.com
>> <mailto:***@gmail.com>> wrote:
>>
>> Hi,
>>
>> Is there a good deal of interest in having GAMs implemented?
>>
>> The timeline for such a project would go something like :
>>
>> Before GSoC:
>> * Implement SpAM
>>
>> Before Midterm :
>> * Help merge pyearth into scikit learn
>> * Implement Additive Model -> `AdditiveClassifier` /
>> `AdditiveRegressor` ( Not sure if my wording here is correct )
>>
>> After Midterm :
>> * Implement GAMLSS
>> * Implement LISO
>>
>> Kindly also see
>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>> references with citation counts.
>>
>> The package mgcv by Simon Woods / GAM / BAM in CRAN is mature and
>> could be used as reference material too...
>>
>> On a scale of 0 to 100 could I know how much importance /
>> interest
>> would there be in such a project for GSoC 2015?
>>
>>
>
>
>
>
>
> --
> Milton Pividori
> Blog: www.miltonpividori.com.ar <http://www.miltonpividori.com.ar>
>
>
Alexandre Gramfort
2015-02-12 22:18:08 UTC
Permalink
my short list is:

GMM
GP
PLS/CCA

so consolidate what we have.

Alex
Milton Pividori
2015-02-13 15:02:07 UTC
Permalink
Hi, Andy. Thank you for the interest.

Consensus clustering is usually used in the same context as traditional
clustering techniques. Many papers have reported significant accuracy
improvements when using these methods, as they can combine partitions from
several different algorithms, finding interesting structures usually not
discovered by traditional methods. They are similar to ensemble methods in
the supervised world, although they have their own particularities, of
course.

One motivation for these methods is to spare the inexperienced user the
choice of a single clustering algorithm; such users usually face many
alternatives for their problem, and the choice is generally not easy.
Consensus clustering mitigates this by running several clustering methods
with different parameters (such as the number of clusters). The resulting
set of partitions is called the ensemble, and it is the input to the
consensus function, which derives from it a single consensus partition
that usually outperforms every individual member of the input set. The
JMLR paper
<http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf> I mentioned
before proposes a framework for this, called Robust Centralized Clustering
(RCC).

Other interesting applications of these methods, as mentioned in the
previous paper, are Feature-Distributed Clustering (FDC) and
Object-Distributed Clustering (ODC). The first, FDC, lets the user
combine partitions generated from partial views of the data. A common
scenario is distributed databases, which usually cannot be integrated at
a centralized location for various reasons (proprietary data,
privacy concerns, performance issues, etc.). In such scenarios, it is more
realistic to have different "clusterers" at those different places, and
then combine only the clustering results at a central location. This is
possible because the consensus function only needs access to the cluster
labels produced by those clusterers (traditional methods), not to the
whole data. The other application, ODC, is similar but with distributed
objects instead of distributed features, and it has its own challenges.
An example is a distributed customer database of a company located in
different cities. One of the issues here, for instance, is that the
consensus function needs some overlap.

Well, this is a short description of these methods. Let me know if you need
more details.
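A consensus function in this spirit can be sketched without METIS using the co-association (evidence-accumulation) approach: count how often each pair of points is clustered together across the ensemble, then cluster that matrix. This is a simpler alternative to the graph-based CSPA/HGPA/MCLA; the data, ensemble size, and linkage choice below are illustrative assumptions:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

rng = np.random.RandomState(0)
# Three well-separated Gaussian blobs, 50 points each.
X = np.vstack([rng.randn(50, 2) + c for c in ([0, 0], [5, 5], [0, 5])])

# Ensemble: k-means runs with varied k and seeds.
n = len(X)
coassoc = np.zeros((n, n))
n_runs = 10
for seed in range(n_runs):
    labels = KMeans(n_clusters=rng.randint(2, 6), n_init=5,
                    random_state=seed).fit_predict(X)
    coassoc += (labels[:, None] == labels[None, :])
coassoc /= n_runs

# Consensus partition: average-linkage clustering on 1 - co-association.
dist = 1.0 - coassoc
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
consensus = fcluster(Z, t=3, criterion="maxclust") - 1
```

Since the consensus step sees only the label matrices, the same code covers the FDC/ODC setting where each "clusterer" runs remotely and ships back labels.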

Regards,

Milton

2015-02-12 18:47 GMT-03:00 Andy <***@gmail.com>:

> Hi Milton.
>
> In which context is consensus clustering usually used, and what are the
> main applications?
> We will not add an external dependency, sorry.
>
> Cheers,
> Andy
>
>
>
> On 02/12/2015 01:55 PM, Milton Pividori wrote:
>
> Hi, guys. My name is Milton Pividori and this is the first time I write to
> this list. I'm a PhD student, working on clustering, particularly on
> consensus clustering. I'm relatively new to Python, and I am migrating
> legacy code from MATLAB. I plan to use scikit-learn as well as other
> libraries.
>
> After looking at the scikit code and the mailing list, I didn't found
> any methods related to consensus clustering or cluster ensembles. I think
> the main paper about it is the one from Strehl and Ghosh (2002, JMLR, link
> <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I don't
> know if you discussed about it before, but I think it could be a good idea
> to have these consensus functions implemented in scikit-learn (the paper
> proposes three, graph-based).
>
> I was thinking on how to implement them. These three consensus functions
> (CSPA, HGPA and MCLA) use METIS for graph partitioning. That could be an
> obstacle for scikit-learn interests, as a new dependency would be needed (I
> found python bindings for it). It would be also necessary to implement some
> methods for ensemble generation with varying levels of diversity
> (generating different clustering partitions by varying algorithms, changing
> their parameters or manipulating data with projections, subsampling or
> feature selection), but that's easier than implementing the consensus
> functions.
>
> Well, it's just an idea. I would be glad to help with coding if this is
> interesting for the community.
>
> Regards,
>


--
Milton Pividori
Blog: www.miltonpividori.com.ar
Ronnie Ghose
2015-02-13 15:08:45 UTC
Permalink
-1: we would have to build in support for more clustering methods; sounds
like a not-very-standalone project

Michael Bommarito
2015-02-13 16:05:01 UTC
Permalink
Milton, my opinion is that the best work available in Python for
clustering and community detection has been done in the igraph project (
http://igraph.org/). While I would personally love to see better support
for these un- and semi-supervised tasks in sklearn, it is a substantial
investment of time and LOC. If I were you, I would reach out to Gabor or
Tamas to see if they would accept such a PR there in igraph; I would be
happy to introduce you if you'd like.

Thanks,
Michael J. Bommarito II, CEO
Bommarito Consulting, LLC
*Web:* http://www.bommaritollc.com
*Mobile:* +1 (646) 450-3387

Milton Pividori
2015-02-13 19:15:14 UTC
Permalink
Hi, Michael, thank you for your comments and ideas.

I was probably not clear enough about the complexity of implementing these
algorithms: I don't think it is a substantial amount of work. Excuse
me if this was already understood, but let me add that consensus clustering
is not limited to graphs. It is about combining multiple clustering
solutions into a single consolidated partition; using graphs and graph
partitioning is just one way to do it. I was describing some points of a
paper (a very popular one, though) that proposes three graph-based
consensus functions, but there are many algorithms for combining partitions.
For example, I could implement an "evidence accumulation" approach
(proposed in this paper
<http://ieeexplore.ieee.org/xpl/articleDetails.jsp?tp=&arnumber=1432715>)
just with a hierarchical clustering algorithm (already provided by
scikit-learn) and a set of diverse partitions generated with k-means,
meanshift and dbscan (all provided by sklearn as well), by varying their
parameters (the number of clusters for k-means, the eps or min_samples for
dbscan, etc).
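
A rough sketch of that pipeline, using only NumPy, SciPy and estimators that
sklearn already ships (the variable names are mine, and for brevity the
ensemble below varies only the k of k-means):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

X, _ = make_blobs(n_samples=200, centers=3, random_state=0)

# 1) Ensemble generation: diverse partitions, here by varying k in k-means
#    (meanshift or dbscan with varied parameters would work the same way).
partitions = [KMeans(n_clusters=k, n_init=10, random_state=k).fit_predict(X)
              for k in range(2, 8)]

# 2) Evidence accumulation: the co-association matrix counts how often each
#    pair of points falls into the same cluster across the ensemble.
coassoc = np.zeros((len(X), len(X)))
for labels in partitions:
    coassoc += labels[:, None] == labels[None, :]
coassoc /= len(partitions)

# 3) Consensus function: hierarchical clustering on the co-association
#    similarity, turned into a distance for average linkage.
dist = 1.0 - coassoc
np.fill_diagonal(dist, 0.0)
Z = linkage(squareform(dist, checks=False), method="average")
consensus = fcluster(Z, t=3, criterion="maxclust")
```

The existing estimators do all the heavy lifting here; a cluster-ensemble
meta-estimator would mostly be glue like this.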

Although I have to take a deeper look at igraph, I think sklearn is better
suited to this kind of algorithm. It provides all the necessary components
to implement consensus clustering: methods for ensemble generation (it
provides many clustering algorithms, as well as ways to sample data
points, select features and project data) and methods for ensemble
combination (one type of consensus function can be implemented with
hierarchical clustering; it works by building a similarity matrix of the
data from the input partitions and then clustering over it).

In many aspects, an API for these methods would be analogous to that of the
ensemble classifiers currently implemented in sklearn. In fact, cluster
ensembles were largely inspired by classifier ensembles.

Regards,

Milton.

2015-02-13 13:05 GMT-03:00 Michael Bommarito <***@bommaritollc.com>:

> Milton, my opinion is that the best work available in Python for
> clustering and community detection has been done in the igraph project (
> http://igraph.org/). While I would personally love to see better support
> for these un- and semi-supervised taks in sklearn, it is a substantial
> investment of time and LOC. If I were you, I would reach out to Gabor or
> Tamas to see if they would accept such a PR there in igraph; I would be
> happy to introduce you if you'd like.
>
> Thanks,
> Michael J. Bommarito II, CEO
> Bommarito Consulting, LLC
> *Web:* http://www.bommaritollc.com
> *Mobile:* +1 (646) 450-3387
>
> On Fri, Feb 13, 2015 at 10:08 AM, Ronnie Ghose <***@gmail.com>
> wrote:
>
>> -1 we would have to build in support for more clustering methods ,sounds
>> like a not-very-standalone proj
>>
>> On Fri, Feb 13, 2015 at 10:02 AM, Milton Pividori <***@gmail.com>
>> wrote:
>>
>>> Hi, Andy. Thank you for the interest.
>>>
>>> Consensus clustering is usually used in the same context as traditional
>>> clustering techniques. Many papers have reported significant accuracy
>>> improvements when using these methods, as they can combine partitions from
>>> several different algorithms, finding interesting structures, usually not
>>> discovered by traditional methods. They are similar to ensemble methods in
>>> the supervised world, although they have their own particularities, of
>>> course.
>>>
>>> One of the motivations of these methods is to avoid the choice of a
>>> single clustering algorithm by the inexperienced user, who usually finds a
>>> lot of different alternatives for his problem, and this choice is generally
>>> not easy for them. Consensus clustering tries to mitigate this by running
>>> several clustering methods with different parameters (like the number of
>>> clusters). This set of partitions is called ensemble, and it is the input
>>> of the consensus function, which derives from it a single consensus
>>> partition, which usually outperforms all the individual members of the
>>> input set. The JMLR paper
>>> <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf> I
>>> mentioned before proposes a framework for this, called Robust Centralized
>>> Clustering (RCC).
>>>
>>> Other interesting applications of these methods, as mentioned in the
>>> previous paper, are the Feature-Distributed Clustering (FDC) and
>>> Object-Distributed Clustering (ODC). The first one, FDC, allows the user to
>>> combine partitions generated from partial views of the data. A common
>>> scenario are distributed data bases, which usually can not be integrated at
>>> a centralized location because of different aspects (proprietary data,
>>> privacy concerns, performance issues, etc). In such scenarios, it is more
>>> realistic to have different "clusterers" at those different places, and
>>> then combine only the clustering results at a central location. This is
>>> possible because the consensus function only needs access to cluster labels
>>> produced by those clusterers (traditional methods), not to the whole data.
>>> The other application, ODC, is similar but with distributed objects instead
>>> of distributed features, and it has their own challenges. An example is a
>>> distributed customer data base of a company located at different cities.
>>> One of the issues here, for instance, is that the consensus function needs
>>> some overlap between the distributed sets of objects.
>>>
>>> Well, this is a short description of these methods. Let me know if you
>>> need more details.
>>>
>>> Regards,
>>>
>>> Milton
>>>
>>> 2015-02-12 18:47 GMT-03:00 Andy <***@gmail.com>:
>>>
>>> Hi Milton.
>>>>
>>>> In which context is consensus clustering usually used, and what are the
>>>> main applications?
>>>> We will not add an external dependency, sorry.
>>>>
>>>> Cheers,
>>>> Andy
>>>>
>>>>
>>>>
>>>> On 02/12/2015 01:55 PM, Milton Pividori wrote:
>>>>
>>>> Hi, guys. My name is Milton Pividori and this is the first time I write
>>>> to this list. I'm a PhD student, working on clustering, particularly on
>>>> consensus clustering. I'm relatively new to Python, and I am migrating
>>>> legacy code from MATLAB. I plan to use scikit-learn as well as other
>>>> libraries.
>>>>
>>>> After looking at the scikit-learn code and the mailing list, I didn't find
>>>> any methods related to consensus clustering or cluster ensembles. I think
>>>> the main paper about it is the one from Strehl and Ghosh (2002, JMLR,
>>>> link <http://www.jmlr.org/papers/volume3/strehl02a/strehl02a.pdf>). I
>>>> don't know if you have discussed it before, but I think it could be a good
>>>> idea to have these consensus functions implemented in scikit-learn (the
>>>> paper proposes three, graph-based).
>>>>
>>>> I was thinking on how to implement them. These three consensus
>>>> functions (CSPA, HGPA and MCLA) use METIS for graph partitioning. That
>>>> could be an obstacle for scikit-learn interests, as a new dependency would
>>>> be needed (I found python bindings for it). It would be also necessary to
>>>> implement some methods for ensemble generation with varying levels of
>>>> diversity (generating different clustering partitions by varying
>>>> algorithms, changing their parameters or manipulating data with
>>>> projections, subsampling or feature selection), but that's easier than
>>>> implementing the consensus functions.
>>>>
>>>> Well, it's just an idea. I would be glad to help with coding if this
>>>> is interesting for the community.
>>>>
>>>> Regards,
>>>>
>>>> 2015-02-12 13:38 GMT-03:00 Sebastian Raschka <***@gmail.com>:
>>>>
>>>>> What about adding multiclass support for the SVC "roc_auc" for grid
>>>>> search CV to the to do list?
>>>>>
>>>>> Best,
>>>>> Sebastian
>>>>>
>>>>> On Feb 12, 2015, at 10:12 AM, Ronnie Ghose <***@gmail.com>
>>>>> wrote:
>>>>>
>>>>> +1 to partial fit, -1 to GAM and more probabilistic things in sklearn
>>>>>
>>>>> On Thu, Feb 12, 2015, 9:22 AM ragv ragv <***@gmail.com> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Is there a good deal of interest in having GAMs implemented?
>>>>>>
>>>>>> The timeline for such a project would go something like :
>>>>>>
>>>>>> Before GSoC:
>>>>>> * Implement SpAM
>>>>>>
>>>>>> Before Midterm :
>>>>>> * Help merge pyearth into scikit learn
>>>>>> * Implement Additive Model -> `AdditiveClassifier` /
>>>>>> `AdditiveRegressor` ( Not sure if my wording here is correct )
>>>>>>
>>>>>> After Midterm :
>>>>>> * Implement GAMLSS
>>>>>> * Implement LISO
>>>>>>
>>>>>> Kindly also see
>>>>>> https://github.com/scikit-learn/scikit-learn/issues/3482 for
>>>>>> references with citation counts.
>>>>>>
>>>>>> The package mgcv by Simon Wood (GAM / BAM) on CRAN is mature and
>>>>>> could be used as reference material too...
>>>>>>
>>>>>> On a scale of 0 to 100 could I know how much importance / interest
>>>>>> would there be in such a project for GSoC 2015?
>>>>>>


--
Milton Pividori
Blog: www.miltonpividori.com.ar
Andy
2015-02-18 00:42:11 UTC
Permalink
On 02/13/2015 07:08 AM, Ronnie Ghose wrote:
> -1 we would have to build in support for more clustering methods
> ,sounds like a not-very-standalone proj
Why? We already have a bunch, right?
Gael Varoquaux
2015-02-18 10:22:39 UTC
Permalink
On Tue, Feb 17, 2015 at 04:42:11PM -0800, Andy wrote:
> On 02/13/2015 07:08 AM, Ronnie Ghose wrote:
> > -1 we would have to build in support for more clustering methods
> > ,sounds like a not-very-standalone proj
> Why? We already have a bunch, right?

I agree with Andreas that any addition should be motivated: a new
clustering method should bring something beyond the existing ones. It
should be different in some way and have a clear benefit (pointing to a
paper isn't enough to demonstrate a benefit; the benefit should be easy to
explain and should have been demonstrated many times).

Gaël
Ronnie Ghose
2015-02-18 12:06:59 UTC
Permalink
is there a clear use for this clustering method, a sizable number of
citations, and demonstrable performance/accuracy benefits that
warrant the time & maintenance cost, then?

Also, @andreas, I'm not seeing a list of clustering methods to be
added; right now it seems unbounded - I don't like unbounded scopes; they
don't make for good project topics



On Wed, Feb 18, 2015 at 5:22 AM, Gael Varoquaux <
***@normalesup.org> wrote:

> On Tue, Feb 17, 2015 at 04:42:11PM -0800, Andy wrote:
> > On 02/13/2015 07:08 AM, Ronnie Ghose wrote:
> > > -1 we would have to build in support for more clustering methods
> > > ,sounds like a not-very-standalone proj
> > Why? We already have a bunch, right?
>
> I agree with Andreas that any addition should be motivated: the new
> clustering method should bring something to the existing ones. It should
> be different in some way, and have a clear benefit (pointing to a paper
> isn't enough to demonstrate a benefit, the benefit should be easy to
> explain and demonstrated many times).
>
> Gaël
>
>
> ------------------------------------------------------------------------------
> Download BIRT iHub F-Type - The Free Enterprise-Grade BIRT Server
> from Actuate! Instantly Supercharge Your Business Reports and Dashboards
> with Interactivity, Sharing, Native Excel Exports, App Integration & more
> Get technology previously reserved for billion-dollar corporations, FREE
>
> http://pubads.g.doubleclick.net/gampad/clk?id=190641631&iu=/4140/ostg.clktrk
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-***@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
Andy
2015-02-22 22:02:21 UTC
Permalink
the paper is quite well cited (about 500 citations):
http://scholar.google.com/scholar?q=Combining%20multiple%20clusterings%20using%20evidence%20accumulation&btnG=Search&as_sdt=800000000001&as_sdtp=on

I thought the idea was to add (some of) the ensemble methods described
in the paper, which are meta-algorithms that could build on any of the
methods we already have,
as far as I understand.


On 02/18/2015 04:06 AM, Ronnie Ghose wrote:
> is there clear use for this clustering method and a sizable number of
> citations and obviously performance/accuracy/something benefits that
> warrant the time & maintenance cost then?
>
> Also by that @andreas, I'm not seeing a list of clustering methods to
> be added, right now it seems unbounded - i don't like unbounded
> scopes, they don't make for good project topics
>
>
>
> On Wed, Feb 18, 2015 at 5:22 AM, Gael Varoquaux
> <***@normalesup.org <mailto:***@normalesup.org>>
> wrote:
>
> On Tue, Feb 17, 2015 at 04:42:11PM -0800, Andy wrote:
> > On 02/13/2015 07:08 AM, Ronnie Ghose wrote:
> > > -1 we would have to build in support for more clustering methods
> > > ,sounds like a not-very-standalone proj
> > Why? We already have a bunch, right?
>
> I agree with Andreas that any addition should be motivated: the new
> clustering method should bring something to the existing ones. It
> should
> be different in some way, and have a clear benefit (pointing to a
> paper
> isn't enough to demonstrate a benefit, the benefit should be easy to
> explain and demonstrated many times).
>
> Gaël
Milton Pividori
2015-02-24 13:56:33 UTC
Permalink
Yes, the evidence accumulation approach is quite simple yet popular, and
can be implemented on top of the methods currently in sklearn. However, I
have a question about the current implementation of agglomerative
hierarchical clustering: although it could be done differently, I would
need to get the final partition by inspecting the tree and specifying the
height of the dendrogram, not the number of clusters (I see there is an open
issue <https://github.com/scikit-learn/scikit-learn/issues/3796> about
this). This way, this approach does not require specifying the number of
clusters in advance.
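
Until then, a small sketch of the kind of cut I mean, going through SciPy
directly (the data and the threshold of 0.5 are purely illustrative):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0, 0.1, (20, 2)),   # tight group around 0
               rng.normal(5, 0.1, (20, 2))])  # tight group around 5

Z = linkage(X, method="average")  # build the full tree once
# Cut the dendrogram at a height: merges above 0.5 are undone, so the
# number of clusters follows from the data instead of being fixed upfront.
labels = fcluster(Z, t=0.5, criterion="distance")
```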

Ronnie, there are many applications of these methods in a wide range of
areas, like bioinformatics, business and software engineering, among
others. You can search for "consensus clustering", "cluster ensemble" or
"clustering aggregation". There are good accuracy/performance reports (in
general, an ensemble method outperforms its individual members), as well as
articles
<http://www.nature.com/srep/2014/140827/srep06207/full/srep06207.html> that
cast doubt on the benefits (although this one does not seem to take into
account the diversity of the input ensemble, which is essential for good
results).

Regarding GSoC, I have never participated before. I read that it is a
full-time job (40 hours a week). Unfortunately, I can't commit to that now,
so I think it would be better to contribute outside the program. Sorry, I
had the wrong idea about it and should have read up on it beforehand.

Regards,
Milton.



--
Milton Pividori
Blog: www.miltonpividori.com.ar
Andy
2015-02-24 14:48:55 UTC
Permalink
Hey Everybody.

Here is my somewhat consolidated list of ideas with minor comments.
If anything is missing, please let me know. Also, I don't think the people
who want to mentor have spoken up yet.
I'll remove all people listed on the wiki as they were copy and pasted
from last year, and I'd rather have actual confirmation.

Topics:
DPGMM / VBGMM: need to be reimplemented using more standard variational
updates. The GMM is actually fine atm (after a couple of pending PRs)

spearmint : Using random forest (they actually use ours) for
hyperparameter optimization. I need to mull this over but I think this
should be easy enough and pretty helpful.

Online low-rank matrix completion : this is from last year and I'm not
sure if it is still desirable / don't know the state of the PR

Multiple metric support : This is somewhat API heavy but I think

PLS/CCA : They need love so very much, but I'm not sure we have a mentor
(if there is one, please speak up!)

Ensemble Clusters : Proposed by a possible student (Milton) but I think
it is interesting.

Semi-Supervised Learning : Meta-estimator for self-taught learning. Not
sure if there is actually much demand for it, but would be nice.

Additive models: Proposed by ragv, but I'm actually not that sold. We
could include pyearth, but I'm not sure how valuable the other methods
are. Including a significant amount of algorithms just for completeness
is not something I feel great about.


That being said, ragv has put in a tremendous amount of great work and I
feel we should definitely find a project for him (as he seems interested).


Things that I think shouldn't be GSOC projects:

GPs : Jan Hendrik is doing an awesome job there.
MLP : Will be finished soon, either by me or possibly by ragv
data-independent cross-validation : already a bunch of people working on
that, I don't think we should make it GSOC.

Feedback welcome.

Andy
Artem
2015-02-25 20:14:41 UTC
Permalink
> Online low-rank matrix completion : this is from last year and I'm not
> sure if it is still desirable / don't know the state of the PR

You mean this one <https://github.com/scikit-learn/scikit-learn/pull/2387>?
I picked it up <https://github.com/scikit-learn/scikit-learn/pull/4237>,
and as of now it passes all the tests. Yet it's not quite online now
(though making it so doesn't look like a big deal).

Artem
2015-03-03 16:31:20 UTC
Permalink
There was a discussion
<http://www.mail-archive.com/scikit-learn-***@lists.sourceforge.net/msg06931.html>
on metric learning a while ago, and several people expressed interest in
seeing (and contributing to) it in sklearn. But it looks like that
attempt didn't get anywhere.

What about a project to add several metric learning algorithms (NCA,
ITML, LMNN, etc.) to be used with KNN? Another application is data
transformation: most of these methods learn a PSD matrix A, so we can
transform the data by multiplying it by A^{-1/2}.

The starting point could be an ICML tutorial
<http://www.slideshare.net/zukun/metric-learning-icml2010-tutorial> by
Brian Kulis with a field overview.
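As a quick illustration of the transformation idea: once some method has produced a PSD matrix A, Euclidean distance on the transformed data equals the learned Mahalanobis distance. The matrix A below is just a random PSD stand-in for whatever NCA/ITML/LMNN would actually learn.

```python
import numpy as np

rng = np.random.RandomState(0)

# Stand-in for a learned PSD matrix A (symmetric positive definite).
B = rng.randn(3, 3)
A = B @ B.T + 3 * np.eye(3)

# A^{-1/2} via eigendecomposition: A = V diag(w) V^T.
w, V = np.linalg.eigh(A)
A_inv_sqrt = V @ np.diag(w ** -0.5) @ V.T

X = rng.randn(5, 3)           # toy data, one sample per row
X_t = X @ A_inv_sqrt          # transformed data

# Euclidean distance in the transformed space equals the Mahalanobis
# distance d(x, y) = sqrt((x - y)^T A^{-1} (x - y)) in the original one.
d_euclid = np.linalg.norm(X_t[0] - X_t[1])
diff = X[0] - X[1]
d_mahal = np.sqrt(diff @ np.linalg.inv(A) @ diff)
assert np.isclose(d_euclid, d_mahal)
```

So a plain KNN on the transformed data already uses the learned metric, which is why this composes nicely with the existing estimators.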

On Tue, Feb 24, 2015 at 5:48 PM, Andy <***@gmail.com> wrote:

> Hey Everybody.
>
> Here is my somewhat consolidated list of ideas with minor comments.
> If anything is missing, please let me know. Also, I don't think people
> who want to mentor spoke up yet.
> I'll remove all people listed on the wiki as they were copy and pasted
> from last year, and I'd rather have actual confirmation.
>
> Topics:
> DPGMM / VBGMM: need to be reimplemented using more standard variational
> updates. The GMM is actually fine atm (after a couple of pending PRs)
>
> spearmint : Using random forest (they actually use ours) for
> hyperparameter optimization. I need to mull this over but I think this
> should be easy enough and pretty helpful.
>
> Online low-rank matrix completion : this is from last year and I'm not
> sure if it is still desirable / don't know the state of the PR
>
> Multiple metric support : This is somewhat API heavy but I think
>
> PLS/CCA : They need love so very much, but I'm not sure we have a mentor
> (if there is one, please speak up!)
>
> Ensemble Clusters : Proposed by a possible student (Milton) but I think
> it is interesting.
>
> Semi-Supervised Learning : Meta-estimator for self-taught learning. Not
> sure if there is actually much demand for it, but would be nice.
>
> Additive models: Proposed by ragv, but I'm actually not that sold. We
> could include pyearth, but I'm not sure how valuable the other methods
> are. Including a significant amount of algorithms just for completeness
> is not something I feel great about.
>
>
> That being said, ragv has put in a tremendous amount of great work and I
> feel we should definitely find a project for him (as he seems interested).
>
>
> Things that I think shouldn't be GSOC projects:
>
> GPs : Jan Hendrik is doing an awesome job there.
> MLP : Will be finished soon, either by me or possibly by ragv
> data-independent cross-validation : already a bunch of people working on
> that, I don't think we should make it GSOC.
>
> Feedback welcome.
>
> Andy
>
>
>
> ------------------------------------------------------------------------------
> Dive into the World of Parallel Programming The Go Parallel Website,
> sponsored
> by Intel and developed in partnership with Slashdot Media, is your hub for
> all
> things parallel software development, from weekly thought leadership blogs
> to
> news, videos, case studies, tutorials and more. Take a look and join the
> conversation now. http://goparallel.sourceforge.net/
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-***@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
Andy
2015-03-03 16:34:27 UTC
Permalink
On 03/03/2015 11:31 AM, Artem wrote:
> There was a discussion
> <http://www.mail-archive.com/scikit-learn-***@lists.sourceforge.net/msg06931.html>
> on metric learning a while ago, and several people expressed interest
> to see (and contribute to) it in sklearn. But, it looks like that
> attempt didn't get anywhere.
>
> What about a project to add several metric learning algorithms to be
> used with KNN (NCA, ITML, LMNN, etc). Another application is data
> transformation: most of the methods learn some PSD matrix A, so we can
> transform the data by multiplying it by A^{-1/2}.
>
> The starting point could be an ICML tutorial
> <http://www.slideshare.net/zukun/metric-learning-icml2010-tutorial> by
> Brian Kulis with a field overview.
I agree that this might be interesting :)
Gael Varoquaux
2015-03-04 00:18:31 UTC
Permalink
Why don't you edit the "ideas" page to start hashing out a project there?
Your proposal sounds serious enough to get off to a good start.

Cheers,

Gaël

On Tue, Mar 03, 2015 at 11:34:27AM -0500, Andy wrote:

> On 03/03/2015 11:31 AM, Artem wrote:

> There was a discussion on metric learning a while ago, and several people
> expressed interest to see (and contribute to) it in sklearn. But, it looks
> like that attempt didn't get anywhere. 

> What about a project to add several metric learning algorithms to be used
> with KNN (NCA, ITML, LMNN, etc). Another application is data
> transformation: most of the methods learn some PSD matrix A, so we can
> transform the data by multiplying it by A^{-1/2}.

> The starting point could be an ICML tutorial by Brian Kulis with a field
> overview.

> I agree that this might be interesting :)



--
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Raghav R V
2015-03-04 00:39:50 UTC
Permalink
FYI I've deleted this page -
https://github.com/scikit-learn/scikit-learn/wiki/A-list-of-topics-for-the-GSOC-2015
in favor of the recently updated
https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2015

On Wed, Mar 4, 2015 at 5:48 AM, Gael Varoquaux
<***@normalesup.org> wrote:
> Why don't you edit the "ideas" to start hashing a project there. Your
> proposal sounds serious enough to get a good start.
>
> Cheers,
>
> Gaël
>
> On Tue, Mar 03, 2015 at 11:34:27AM -0500, Andy wrote:
>
>> On 03/03/2015 11:31 AM, Artem wrote:
>
>> There was a discussion on metric learning a while ago, and several people
>> expressed interest to see (and contribute to) it in sklearn. But, it looks
>> like that attempt didn't get anywhere.
>
>> What about a project to add several metric learning algorithms to be used
>> with KNN (NCA, ITML, LMNN, etc). Another application is data
>> transformation: most of the methods learn some PSD matrix A, so we can
>> transform the data by multiplying it by A^{-1/2}.
>
>> The starting point could be an ICML tutorial by Brian Kulis with a field
>> overview.
>
>> I agree that this might be interesting :)
>
>
>
>
Wei Xue
2015-03-05 15:26:19 UTC
Permalink
It seems the results of the organization applications have come out. I didn't
find scikit-learn in the list of accepted organizations on the Google GSoC main
page or on the PSF GSoC page. Was it rejected?




2015-03-03 19:39 GMT-05:00 Raghav R V <***@gmail.com>:

> FYI I've deleted this page -
>
> https://github.com/scikit-learn/scikit-learn/wiki/A-list-of-topics-for-the-GSOC-2015
> in favor of the recently updated
>
> https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2015
>
> On Wed, Mar 4, 2015 at 5:48 AM, Gael Varoquaux
> <***@normalesup.org> wrote:
> > Why don't you edit the "ideas" to start hashing a project there. Your
> > proposal sounds serious enough to get a good start.
> >
> > Cheers,
> >
> > Gaël
> >
> > On Tue, Mar 03, 2015 at 11:34:27AM -0500, Andy wrote:
> >
> >> On 03/03/2015 11:31 AM, Artem wrote:
> >
> >> There was a discussion on metric learning a while ago, and several
> people
> >> expressed interest to see (and contribute to) it in sklearn. But,
> it looks
> >> like that attempt didn't get anywhere.
> >
> >> What about a project to add several metric learning algorithms to
> be used
> >> with KNN (NCA, ITML, LMNN, etc). Another application is data
> >> transformation: most of the methods learn some PSD matrix A, so we
> can
> >> transform the data by multiplying it by A^{-1/2}.
> >
> >> The starting point could be an ICML tutorial by Brian Kulis with a
> field
> >> overview.
> >
> >> I agree that this might be interesting :)
> >
> >>
> >
> >
> >
> >
>
>
>
Gael Varoquaux
2015-03-05 16:11:39 UTC
Permalink
Hi,

We are under the PSF umbrella.

Gaël

On Thu, Mar 05, 2015 at 10:26:19AM -0500, Wei Xue wrote:
> It seems the results of organization application have come out. I didn't find
> scikit-learn in the list of accepted organizations on Google GSOC main page nor
> in the PSF GSOC page. Is it rejected?




> 2015-03-03 19:39 GMT-05:00 Raghav R V <***@gmail.com>:

> FYI I've deleted this page -
> https://github.com/scikit-learn/scikit-learn/wiki/
> A-list-of-topics-for-the-GSOC-2015
> in favor of the recently updated
> https://github.com/scikit-learn/scikit-learn/wiki/
> Google-summer-of-code-%28GSOC%29-2015

> On Wed, Mar 4, 2015 at 5:48 AM, Gael Varoquaux
> <***@normalesup.org> wrote:
> > Why don't you edit the "ideas" to start hashing a project there. Your
> > proposal sounds serious enough to get a good start.

> > Cheers,

> > Gaël

> > On Tue, Mar 03, 2015 at 11:34:27AM -0500, Andy wrote:

> >> On 03/03/2015 11:31 AM, Artem wrote:

> >>     There was a discussion on metric learning a while ago, and several
> people
> >>     expressed interest to see (and contribute to) it in sklearn. But, it
> looks
> >>     like that attempt didn't get anywhere.

> >>     What about a project to add several metric learning algorithms to be
> used
> >>     with KNN (NCA, ITML, LMNN, etc). Another application is data
> >>     transformation: most of the methods learn some PSD matrix A, so we
> can
> >>     transform the data by multiplying it by A^{-1/2}.

> >>     The starting point could be an ICML tutorial by Brian Kulis with a
> field
> >>     overview.

> >> I agree that this might be interesting :)












--
Gael Varoquaux
Researcher, INRIA Parietal
Laboratoire de Neuro-Imagerie Assistee par Ordinateur
NeuroSpin/CEA Saclay , Bat 145, 91191 Gif-sur-Yvette France
Phone: ++ 33-1-69-08-79-68
http://gael-varoquaux.info http://twitter.com/GaelVaroquaux
Andreas Mueller
2015-03-05 18:15:29 UTC
Permalink
Can all would-be mentors please register on Melange?
The list of possible mentors lists Arnaud, probably a C&P from last year.
Arnaud, are you up for mentoring again? Otherwise I'll remove you from
the list.

Then we'd currently have

Gaël Varoquaux (not sure if you have time?), Vlad Niculae, Olivier
Grisel, Alexandre Gramfort, Michael Eickenberg,
and me.
Any other volunteers?




On 02/24/2015 09:48 AM, Andy wrote:
> Hey Everybody.
>
> Here is my somewhat consolidated list of ideas with minor comments.
> If anything is missing, please let me know. Also, I don't think people
> who want to mentor spoke up yet.
> I'll remove all people listed on the wiki as they were copy and pasted
> from last year, and I'd rather have actual confirmation.
>
> Topics:
> DPGMM / VBGMM: need to be reimplemented using more standard
> variational updates. The GMM is actually fine atm (after a couple of
> pending PRs)
>
> spearmint : Using random forest (they actually use ours) for
> hyperparameter optimization. I need to mull this over but I think this
> should be easy enough and pretty helpful.
>
> Online low-rank matrix completion : this is from last year and I'm not
> sure if it is still desirable / don't know the state of the PR
>
> Multiple metric support : This is somewhat API heavy but I think
>
> PLS/CCA : They need love so very much, but I'm not sure we have a
> mentor (if there is one, please speak up!)
>
> Ensemble Clusters : Proposed by a possible student (Milton) but I
> think it is interesting.
>
> Semi-Supervised Learning : Meta-estimator for self-taught learning.
> Not sure if there is actually much demand for it, but would be nice.
>
> Additive models: Proposed by ragv, but I'm actually not that sold. We
> could include pyearth, but I'm not sure how valuable the other methods
> are. Including a significant amount of algorithms just for
> completeness is not something I feel great about.
>
>
> That being said, ragv has put in a tremendous amount of great work and
> I feel we should definitely find a project for him (as he seems
> interested).
>
>
> Things that I think shouldn't be GSOC projects:
>
> GPs : Jan Hendrik is doing an awesome job there.
> MLP : Will be finished soon, either by me or possibly by ragv
> data-independent cross-validation : already a bunch of people working
> on that, I don't think we should make it GSOC.
>
> Feedback welcome.
>
> Andy
>
Michael Eickenberg
2015-03-05 18:21:18 UTC
Permalink
I unfortunately cannot lead any GSoC project this year, but I can help out
with code review and mentoring if somebody else takes the lead. The two
projects I could be of use for are the CCA/PLS rethinking and additive models.

Michael

On Thursday, March 5, 2015, Andreas Mueller <***@gmail.com> wrote:

> Can all would-be mentors please register on Melange?
> The list of possible mentors lists Arnaud, probably a C&P from last year.
> Arnaud, are you up for mentoring again? Otherwise I'll remove you from
> the list.
>
> Then we'd currently have
>
> Gaël Varoquaux (not sure if you have time?), Vlad Niculae, Olivier
> Grisel,Alexandre Gramfort, Michael Eickenberg
> and me.
> Any other volunteers?
>
>
>
>
> On 02/24/2015 09:48 AM, Andy wrote:
> > Hey Everybody.
> >
> > Here is my somewhat consolidated list of ideas with minor comments.
> > If anything is missing, please let me know. Also, I don't think people
> > who want to mentor spoke up yet.
> > I'll remove all people listed on the wiki as they were copy and pasted
> > from last year, and I'd rather have actual confirmation.
> >
> > Topics:
> > DPGMM / VBGMM: need to be reimplemented using more standard
> > variational updates. The GMM is actually fine atm (after a couple of
> > pending PRs)
> >
> > spearmint : Using random forest (they actually use ours) for
> > hyperparameter optimization. I need to mull this over but I think this
> > should be easy enough and pretty helpful.
> >
> > Online low-rank matrix completion : this is from last year and I'm not
> > sure if it is still desirable / don't know the state of the PR
> >
> > Multiple metric support : This is somewhat API heavy but I think
> >
> > PLS/CCA : They need love so very much, but I'm not sure we have a
> > mentor (if there is one, please speak up!)
> >
> > Ensemble Clusters : Proposed by a possible student (Milton) but I
> > think it is interesting.
> >
> > Semi-Supervised Learning : Meta-estimator for self-taught learning.
> > Not sure if there is actually much demand for it, but would be nice.
> >
> > Additive models: Proposed by ragv, but I'm actually not that sold. We
> > could include pyearth, but I'm not sure how valuable the other methods
> > are. Including a significant amount of algorithms just for
> > completeness is not something I feel great about.
> >
> >
> > That being said, ragv has put in a tremendous amount of great work and
> > I feel we should definitely find a project for him (as he seems
> > interested).
> >
> >
> > Things that I think shouldn't be GSOC projects:
> >
> > GPs : Jan Hendrik is doing an awesome job there.
> > MLP : Will be finished soon, either by me or possibly by ragv
> > data-independent cross-validation : already a bunch of people working
> > on that, I don't think we should make it GSOC.
> >
> > Feedback welcome.
> >
> > Andy
> >
>
>
>
>
Andreas Mueller
2015-03-05 19:25:55 UTC
Permalink
Thanks for volunteering to assist, I updated the wiki accordingly :)


On 03/05/2015 01:21 PM, Michael Eickenberg wrote:
> I unfortunately cannot lead any gsoc project this year, but can help
> out with code review and mentoring if sb else takes the lead. The two
> projects I can be of use for are CCA/PLS rethinking and additive models.
>
> Michael
>
> On Thursday, March 5, 2015, Andreas Mueller <***@gmail.com
> <mailto:***@gmail.com>> wrote:
>
> Can all would-be mentors please register on Melange?
> The list of possible mentors lists Arnaud, probably a C&P from
> last year.
> Arnaud, are you up for mentoring again? Otherwise I'll remove you from
> the list.
>
> Then we'd currently have
>
> Gaël Varoquaux (not sure if you have time?), Vlad Niculae, Olivier
> Grisel,Alexandre Gramfort, Michael Eickenberg
> and me.
> Any other volunteers?
>
>
>
>
> On 02/24/2015 09:48 AM, Andy wrote:
> > Hey Everybody.
> >
> > Here is my somewhat consolidated list of ideas with minor comments.
> > If anything is missing, please let me know. Also, I don't think
> people
> > who want to mentor spoke up yet.
> > I'll remove all people listed on the wiki as they were copy and
> pasted
> > from last year, and I'd rather have actual confirmation.
> >
> > Topics:
> > DPGMM / VBGMM: need to be reimplemented using more standard
> > variational updates. The GMM is actually fine atm (after a couple of
> > pending PRs)
> >
> > spearmint : Using random forest (they actually use ours) for
> > hyperparameter optimization. I need to mull this over but I
> think this
> > should be easy enough and pretty helpful.
> >
> > Online low-rank matrix completion : this is from last year and
> I'm not
> > sure if it is still desirable / don't know the state of the PR
> >
> > Multiple metric support : This is somewhat API heavy but I think
> >
> > PLS/CCA : They need love so very much, but I'm not sure we have a
> > mentor (if there is one, please speak up!)
> >
> > Ensemble Clusters : Proposed by a possible student (Milton) but I
> > think it is interesting.
> >
> > Semi-Supervised Learning : Meta-estimator for self-taught learning.
> > Not sure if there is actually much demand for it, but would be nice.
> >
> > Additive models: Proposed by ragv, but I'm actually not that
> sold. We
> > could include pyearth, but I'm not sure how valuable the other
> methods
> > are. Including a significant amount of algorithms just for
> > completeness is not something I feel great about.
> >
> >
> > That being said, ragv has put in a tremendous amount of great
> work and
> > I feel we should definitely find a project for him (as he seems
> > interested).
> >
> >
> > Things that I think shouldn't be GSOC projects:
> >
> > GPs : Jan Hendrik is doing an awesome job there.
> > MLP : Will be finished soon, either by me or possibly by ragv
> > data-independent cross-validation : already a bunch of people
> working
> > on that, I don't think we should make it GSOC.
> >
> > Feedback welcome.
> >
> > Andy
> >
>
>
>
>
>
Wei Xue
2015-03-05 21:32:57 UTC
Permalink
Hi, all

I am a graduate student studying machine learning, and will probably apply
for a GSoC project this year. I just took a look at the wiki and found two
topics that interest me.

- Improve GMM
- Global optimization based Hyper-parameter optimization

For the GMM topic, I studied DP years ago and implemented a toy DPGMM
using Gibbs sampling in Matlab. I am also familiar with VB. My question is:
do GSoC projects require students to fully understand the theory of DP?
For the hyper-parameter optimization topic, since there are already two
Python packages, *spearmint* and *Hyperopt*, is the goal of this topic to
implement our own modules or to build interfaces to those packages?


Thanks,
Wei Xue


2015-03-05 14:25 GMT-05:00 Andreas Mueller <***@gmail.com>:

> Thanks for volunteering to assist, I updated the wiki accordingly :)
>
>
>
> On 03/05/2015 01:21 PM, Michael Eickenberg wrote:
>
> I unfortunately cannot lead any gsoc project this year, but can help out
> with code review and mentoring if sb else takes the lead. The two projects
> I can be of use for are CCA/PLS rethinking and additive models.
>
> Michael
>
> On Thursday, March 5, 2015, Andreas Mueller <***@gmail.com> wrote:
>
>> Can all would-be mentors please register on Melange?
>> The list of possible mentors lists Arnaud, probably a C&P from last year.
>> Arnaud, are you up for mentoring again? Otherwise I'll remove you from
>> the list.
>>
>> Then we'd currently have
>>
>> Gaël Varoquaux (not sure if you have time?), Vlad Niculae, Olivier
>> Grisel,Alexandre Gramfort, Michael Eickenberg
>> and me.
>> Any other volunteers?
>>
>>
>>
>>
>> On 02/24/2015 09:48 AM, Andy wrote:
>> > Hey Everybody.
>> >
>> > Here is my somewhat consolidated list of ideas with minor comments.
>> > If anything is missing, please let me know. Also, I don't think people
>> > who want to mentor spoke up yet.
>> > I'll remove all people listed on the wiki as they were copy and pasted
>> > from last year, and I'd rather have actual confirmation.
>> >
>> > Topics:
>> > DPGMM / VBGMM: need to be reimplemented using more standard
>> > variational updates. The GMM is actually fine atm (after a couple of
>> > pending PRs)
>> >
>> > spearmint : Using random forest (they actually use ours) for
>> > hyperparameter optimization. I need to mull this over but I think this
>> > should be easy enough and pretty helpful.
>> >
>> > Online low-rank matrix completion : this is from last year and I'm not
>> > sure if it is still desirable / don't know the state of the PR
>> >
>> > Multiple metric support : This is somewhat API heavy but I think
>> >
>> > PLS/CCA : They need love so very much, but I'm not sure we have a
>> > mentor (if there is one, please speak up!)
>> >
>> > Ensemble Clusters : Proposed by a possible student (Milton) but I
>> > think it is interesting.
>> >
>> > Semi-Supervised Learning : Meta-estimator for self-taught learning.
>> > Not sure if there is actually much demand for it, but would be nice.
>> >
>> > Additive models: Proposed by ragv, but I'm actually not that sold. We
>> > could include pyearth, but I'm not sure how valuable the other methods
>> > are. Including a significant amount of algorithms just for
>> > completeness is not something I feel great about.
>> >
>> >
>> > That being said, ragv has put in a tremendous amount of great work and
>> > I feel we should definitely find a project for him (as he seems
>> > interested).
>> >
>> >
>> > Things that I think shouldn't be GSOC projects:
>> >
>> > GPs : Jan Hendrik is doing an awesome job there.
>> > MLP : Will be finished soon, either by me or possibly by ragv
>> > data-independent cross-validation : already a bunch of people working
>> > on that, I don't think we should make it GSOC.
>> >
>> > Feedback welcome.
>> >
>> > Andy
>> >
>>
>>
>>
>>
Andreas Mueller
2015-03-05 21:43:40 UTC
Hi Wei Xue.
Thanks for your interest.
For the GMM project, being familiar with DPGMM and VB should be enough.
We don't want to use Gibbs sampling in the DP. If you feel comfortable
implementing a given derivation and have some understanding of it, that
should be fine.
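To make concrete what "implementing a given derivation" involves, here is a toy EM loop for a one-dimensional two-component GMM in plain numpy. It uses EM rather than the variational updates the project is about, but the E-step/M-step structure is the skeleton a reimplementation would follow; all names here are illustrative, not scikit-learn's API.

```python
import numpy as np

def normal_pdf(x, mean, var):
    # Density of N(mean, var) evaluated at x.
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def e_step(x, weights, means, variances):
    # Responsibilities r[n, k] proportional to pi_k * N(x_n | mu_k, var_k).
    r = np.stack([w * normal_pdf(x, m, v)
                  for w, m, v in zip(weights, means, variances)], axis=1)
    return r / r.sum(axis=1, keepdims=True)

def m_step(x, resp):
    # Weighted maximum-likelihood updates for weights, means, variances.
    nk = resp.sum(axis=0)
    weights = nk / len(x)
    means = (resp * x[:, None]).sum(axis=0) / nk
    variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances

rng = np.random.RandomState(0)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(3.0, 1.0, 200)])

weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
variances = np.array([1.0, 1.0])
for _ in range(50):
    weights, means, variances = m_step(x, e_step(x, weights, means, variances))
```

A VB version would keep the same alternation but replace the M-step point estimates with updates to the parameters of the variational posteriors over weights, means, and variances.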

For hyper-parameter optimization, the idea would be to implement our own
version, based either on our tree implementation (which is actually also
done in spearmint) or on the new GP.
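As a rough illustration of that idea, here is a minimal sequential model-based optimization loop with a GP surrogate and an upper-confidence-bound acquisition, in plain numpy. It is only a sketch under assumed choices (RBF kernel, fixed length scale, UCB acquisition), not spearmint's or scikit-learn's implementation; a tree-based variant would swap the GP posterior for the mean and spread of per-tree predictions from a forest.

```python
import numpy as np

def rbf_kernel(a, b, length_scale=0.3):
    # Squared-exponential kernel between two 1-D point sets.
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / length_scale) ** 2)

def gp_posterior(X, y, X_query, noise=1e-4):
    # Standard GP regression posterior with a zero prior mean.
    K = rbf_kernel(X, X) + noise * np.eye(len(X))
    K_s = rbf_kernel(X, X_query)
    mu = K_s.T @ np.linalg.solve(K, y)
    cov = rbf_kernel(X_query, X_query) - K_s.T @ np.linalg.solve(K, K_s)
    return mu, np.maximum(np.diag(cov), 0.0)

def objective(x):
    # Stand-in for a cross-validation score as a function of one
    # hyper-parameter; the true optimum is at x = 0.65.
    return -(x - 0.65) ** 2

grid = np.linspace(0.0, 1.0, 201)   # candidate hyper-parameter values
X = np.array([0.1, 0.5, 0.9])       # initial evaluations
y = objective(X)

for _ in range(10):
    mu, var = gp_posterior(X, y, grid)
    ucb = mu + 2.0 * np.sqrt(var)   # explore where uncertain, exploit where good
    x_next = grid[np.argmax(ucb)]
    X = np.append(X, x_next)
    y = np.append(y, objective(x_next))

best = X[np.argmax(y)]
```

The loop spends its budget where the surrogate is either promising or uncertain, which is the whole point over grid or random search.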

HTH,
Andreas

On 03/05/2015 04:32 PM, Wei Xue wrote:
> Hi, all
>
> I am a graduate student studying machine learning, and will probably
> apply for a GSoC project this year. I just took a look at the wiki and
> found two topics that interest me.
>
> * Improve GMM
> * Global optimization based Hyper-parameter optimization
>
> For the GMM topic, I studied DPs years ago and implemented a toy DPGMM
> using Gibbs sampling in Matlab. I am also familiar with VB. My
> question is: do GSoC projects require students to fully understand
> the theory of DPs?
>
> For the hyper-parameter optimization topic, since there are already
> two Python packages, spearmint and Hyperopt, is the goal to implement
> our own modules or to build interfaces to those packages?
>
>
> Thanks,
> Wei Xue
>
>
> 2015-03-05 14:25 GMT-05:00 Andreas Mueller <***@gmail.com
> <mailto:***@gmail.com>>:
>
> Thanks for volunteering to assist, I updated the wiki accordingly :)
>
>
>
> On 03/05/2015 01:21 PM, Michael Eickenberg wrote:
>> I unfortunately cannot lead any GSoC project this year, but I can
>> help out with code review and mentoring if somebody else takes the
>> lead. The two projects I can be of use for are CCA/PLS rethinking
>> and additive models.
>>
>> Michael
>>
>> On Thursday, March 5, 2015, Andreas Mueller <***@gmail.com
>> <mailto:***@gmail.com>> wrote:
>>
>> Can all would-be mentors please register on Melange?
>> The list of possible mentors lists Arnaud, probably a C&P
>> from last year.
>> Arnaud, are you up for mentoring again? Otherwise I'll remove you
>> from the list.
>>
>> Then we'd currently have
>>
>> Gaël Varoquaux (not sure if you have time?), Vlad Niculae, Olivier
>> Grisel, Alexandre Gramfort, Michael Eickenberg, and me.
>> Any other volunteers?
>>
>>
>>
>>
Arnaud Joly
2015-03-06 08:42:50 UTC
Hi,

Sadly this year, I won’t have time for mentoring.
However, I will try to find some spare time for reviewing!

Best regards,
Arnaud



Andreas Mueller
2015-03-06 17:05:40 UTC
Thanks for trying to make some time :)


On 03/06/2015 03:42 AM, Arnaud Joly wrote:
> Hi,
>
> Sadly this year, I won’t have time for mentoring.
> However, I will try to find some spare time for reviewing!
>
> Best regards,
> Arnaud
>