Discussion:
Question regarding the list of topics for GSoC 2015
Vinayak Mehta
2015-03-10 02:48:29 UTC
Hello everyone!

I'm Vinayak Mehta, an undergraduate student of computer science at Bharati
Vidyapeeth's College of Engineering, Delhi.

Since the list is not definitive, I would like to ask whether the topic
"Online Low Rank Matrix Completion", which appeared in previous revisions
of the list, might be added again.

I ask because it needs a scalable recommender system example, and I am
somewhat familiar with building recommender systems: I'm implementing one
as my college mini project, and I've built a small one using the MovieLens
dataset.

If it will not be added, I'll start working to understand the other two
ideas I'm interested in, "Global optimization based Hyperparameter
optimization" and "Multiple metric support for cross-validation and
gridsearches".

Cheers!
Vinayak Mehta (vortex_ape on freenode)
Andreas Mueller
2015-03-23 21:11:59 UTC
Hi Vinayak.
Have you decided on your application topic?
I am trying to get a bit of an overview, and I think you haven't
submitted anything yet.
There are two other applications for the hyperparameter topic and one
for the cross-validation and gridsearch improvements.
Since Ragv is already working on cross-validation, we might prefer to
give him the topic.

I have not looked at the hyperparameter proposals in detail, and it is
certainly fair game to put in another one.
You did a fair amount of work in the last couple of days, so I'd be happy
to see a good proposal from you ;)

I updated
https://github.com/scikit-learn/scikit-learn/wiki/Google-summer-of-code-%28GSOC%29-2015
to reflect the current proposal status.

Cheers,
Andy
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now. http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Vlad Niculae
2015-03-24 00:23:56 UTC
Hi Vinayak,

The wiki page just lists a subset of possible topics for which candidates already showed concrete interest. I think an application for low-rank matrix completion would be more than welcome. It’s very important to work on a topic that you are interested in directly, versus just picking something from a list.

As Andy said, you should submit a proposal soon, so we can discuss and give you feedback. Some first important (but by no means complete) notes:

* Recommender systems (RecSys) and matrix completion have some overlap, but they are different (and neither includes the other). I would welcome a matrix completion proposal, but recommender systems are a specific end-to-end application that I believe is out of scope for scikit-learn.
* I would emphasize the review of the established state-of-the-art algorithms (including links to the papers and citation counts).
* A batch matrix completion method (such as what the R softImpute package uses [1]) could have desirable advantages for scikit-learn inclusion (namely, using it for imputation, given the likely use case that the data fits in memory).
* Such a proposal has complications from an API, metrics and cross-validation point of view; these should be discussed.

Looking forward to your proposal!

Yours,
Vlad

[1] http://web.stanford.edu/~hastie/swData/softImpute/vignette.html
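
A minimal sketch of the batch idea in the third bullet, not the softImpute package's actual code: missing entries are repeatedly refilled from an SVD whose singular values are soft-thresholded. The function name `soft_impute` and the parameters `lam`, `max_iter` and `tol` are assumptions made for illustration, and the sketch presumes the matrix fits in memory as a dense NumPy array with NaN marking the missing cells.

```python
import numpy as np


def soft_impute(X, lam=1.0, max_iter=100, tol=1e-5):
    """Fill NaN entries of X with a low-rank estimate (illustrative sketch)."""
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)
    # Start with zeros in the missing cells.
    Z = np.where(missing, 0.0, X)
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(Z, full_matrices=False)
        # Soft-threshold the singular values to shrink toward low rank.
        s = np.maximum(s - lam, 0.0)
        low_rank = (U * s) @ Vt
        # Keep observed entries; refill only the missing ones.
        Z_new = np.where(missing, low_rank, X)
        change = np.linalg.norm(Z_new - Z) / max(np.linalg.norm(Z), 1e-12)
        Z = Z_new
        if change < tol:
            break
    return Z


# Hypothetical toy data: a rank-1 matrix with two entries hidden.
M = np.outer([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0])
X_obs = M.copy()
X_obs[0, 2] = np.nan
X_obs[3, 0] = np.nan
X_filled = soft_impute(X_obs, lam=0.1)
```

The result can serve as an imputed copy of the data. How such a step would fit a fit/transform style API, and how it should be scored under cross-validation, are the open questions raised in the last bullet.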
Artem
2015-03-24 00:33:56 UTC
It's worth noting that there was a similar project
<https://github.com/scikit-learn/scikit-learn/pull/2387> 2 years ago, but
unfortunately it wasn't completed. I did some work on top of it, but I
didn't get any feedback.
Vlad Niculae
2015-03-24 00:43:43 UTC
Very good points, Artem! The PR you link to contains important discussion on API issues. I’m sorry I missed your PR.
Vinayak Mehta
2015-03-24 06:08:53 UTC
@Andy
I was working on the Multiple metric support idea, but since ragv has opted
for it, I am now studying the hyperparameter optimization idea. I am also
quite interested in the self-taught learning one. I will submit a proposal
by the end of the day at GMT+5.5 :)

@Vlad
Thanks for the motivation and the info on the low rank matrix completion
idea :) But I think I'm more interested in the above two ideas; I will
update soon.

Cheers!
Vinayak
Olivier Grisel
2015-03-24 20:31:11 UTC
Please send a link to your proposal as a reply to this thread as soon
as it's online.
--
Olivier
Vinayak Mehta
2015-03-25 08:16:39 UTC
Hi everyone!

I've added my proposal to the wiki page. Please suggest improvements. Here
is a link to the Google doc:
https://docs.google.com/document/d/1JCbeakBtPTpfis2grw00I8Y1VVivssAdiHlm1ejS3E8/edit?usp=sharing

I would also like to discuss whether this paper could be added to my
proposal: http://www.machinelearning.org/archive/icml2008/papers/432.pdf

Thanks,
Vinayak
Andreas Mueller
2015-03-25 19:35:39 UTC
Hi Vinayak.
That looks more like a transfer-learning task, and I'm not sure how it
would a) tie into the project or b) work with the sklearn API.
So I'd be -1 on that.

Cheers,
Andy
Vinayak Mehta
2015-03-25 20:01:04 UTC
Hi Andy

The idea wiki showed issue #1243 as a reference link, which specifically
mentions self-taught learning as a solution for turning an estimator into a
semi-supervised one, so I tried to base my proposal on that. Could you
guide me on how to focus more on semi-supervised learning than on transfer
learning by commenting on specific places in the doc? :) And could you
point out where I can improve it? I think it is somewhat abstract right
now.

Thanks,
Vinayak
Andreas Mueller
2015-03-25 20:09:09 UTC
Hi Vinayak.
I was specifically commenting about the self-taught clustering paper
that you mentioned in your email.
Sorry about not being specific.

Best,
Andy
Vinayak Mehta
2015-03-25 20:12:22 UTC
What do you think about the proposal though?

Vinayak
Andreas Mueller
2015-03-25 21:11:33 UTC
Sorry for the confusion, but that was actually not the meta-estimator I
was thinking of.
I was thinking about the iterative self-learning method, which is a
classical way to make a supervised algorithm semi-supervised.
Either way, these would be quite simple meta-estimators, and wouldn't
require any new algorithms.
[What you explained in your proposal is basically

LinearSVC().fit(DictionaryLearning().fit(X_unlabeled).transform(X_train), y_train)]

Therefore I don't think they provide enough meat for a whole GSoC.
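
For reference, the bracketed expression can be spelled out as a runnable sketch. The synthetic arrays and the settings `n_components=6` and `max_iter=20` are assumptions chosen only to keep the example small; they are not taken from the proposal.

```python
import numpy as np
from sklearn.decomposition import DictionaryLearning
from sklearn.svm import LinearSVC

rng = np.random.RandomState(0)
# Synthetic stand-ins for the unlabeled pool and the small labeled set.
X_unlabeled = rng.normal(size=(60, 8))
X_train = rng.normal(size=(20, 8))
# Balanced toy labels: above or below the median of the first feature.
y_train = (X_train[:, 0] > np.median(X_train[:, 0])).astype(int)

# Step 1: learn a dictionary from the unlabeled data only.
dico = DictionaryLearning(n_components=6, max_iter=20, random_state=0)
dico.fit(X_unlabeled)

# Step 2: encode the labeled data with that dictionary and train on the codes.
codes = dico.transform(X_train)
clf = LinearSVC().fit(codes, y_train)
```

Nothing here goes beyond two existing estimators chained together, which is why it amounts to a thin meta-estimator and not a new algorithm.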
Are there other infrastructure things that need to change for
semi-supervised learning to become a first-class citizen in sklearn?
If not, maybe it would be worth adding another algorithm, such as
transductive SVMs?

Best,
Andy
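
The iterative self-learning method mentioned above can be sketched as a small wrapper, assuming a classifier that exposes `fit` and `predict_proba`. The function name `self_train`, the confidence `threshold` and the `max_rounds` cap are hypothetical choices for illustration, not an existing scikit-learn API.

```python
import numpy as np
from sklearn.base import clone
from sklearn.linear_model import LogisticRegression


def self_train(estimator, X_labeled, y_labeled, X_unlabeled,
               threshold=0.9, max_rounds=10):
    """Pseudo-label confident unlabeled samples and refit (illustrative)."""
    X_l, y_l = np.asarray(X_labeled), np.asarray(y_labeled)
    X_u = np.asarray(X_unlabeled)
    model = clone(estimator).fit(X_l, y_l)
    for _ in range(max_rounds):
        if len(X_u) == 0:
            break
        proba = model.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break  # no prediction is confident enough to add
        # Move confident samples, with their predicted labels, to the labeled set.
        pseudo = model.classes_[proba.argmax(axis=1)[confident]]
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, pseudo])
        X_u = X_u[~confident]
        model = clone(estimator).fit(X_l, y_l)
    return model


# Hypothetical toy data: two well-separated blobs, five labels per class.
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(-2, 1, (100, 2)), rng.normal(2, 1, (100, 2))])
y = np.array([0] * 100 + [1] * 100)
labeled = np.r_[0:5, 100:105]
unlabeled = np.setdiff1d(np.arange(200), labeled)
model = self_train(LogisticRegression(), X[labeled], y[labeled], X[unlabeled])
```

The wrapper only calls the base estimator's public methods, which illustrates the point that such a meta-estimator requires no new algorithm.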
Post by Vinayak Mehta
What do you think about the proposal though?
Vinayak
Hi Vinayak.
I was specifically commenting about the self-taught clustering
paper that you mentioned in your email.
Sorry about not being specific.
Best,
Andy
Post by Vinayak Mehta
Hi Andy
The idea wiki showed issue #1243 as a reference link which
specifically mentions self-taught learning as a solution for
turning an estimator into a semi-supervised one. So, I tried to
base my proposal on that. Could you guide me on how to focus more
on semi-supervised learning than transfer learning by commenting
on specific places in the doc. :) And maybe provide some points
on where I can improve it as it is somewhat abstract right now I
think.
Thanks,
Vinayak
On Thu, Mar 26, 2015 at 1:05 AM, Andreas Mueller
Hi Vinayak.
That looks more like a transfer-learning task and I'm not
sure how that a) tie into the project b) work with the
sklearn API.
So I'd be -1 on that.
Cheers,
Andy
Post by Vinayak Mehta
Hi everyone!
I've added my proposal to the wiki page. Please suggest
https://docs.google.com/document/d/1JCbeakBtPTpfis2grw00I8Y1VVivssAdiHlm1ejS3E8/edit?usp=sharing
Further, I want to discuss on if this ->
http://www.machinelearning.org/archive/icml2008/papers/432.pdf
could be added to my proposal.
Thanks,
Vinayak
------------------------------------------------------------------------------
Dive into the World of Parallel Programming The Go Parallel Website, sponsored
by Intel and developed in partnership with Slashdot Media, is your hub for all
things parallel software development, from weekly thought leadership blogs to
news, videos, case studies, tutorials and more. Take a look and join the
conversation now.http://goparallel.sourceforge.net/
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Vinayak Mehta
2015-03-26 07:03:02 UTC
Permalink
Sorry for the late reply; my internet connection failed me. On the issue
tracker, I've only seen cross-validation come up as a problem for
semi-supervised learning. Would anyone else like to discuss this?

So, should I scrap the self-taught learning algorithm from the proposal?
Also, I'm looking into transductive SVMs.

Vinayak
Post by Andreas Mueller
Sorry for the confusion, but that was actually not the meta-estimator I
was thinking of.
I was thinking about the iterative self-learning method, which is a
classical way to make a supervised algorithm semi-supervised.
Either way, these would be quite simple meta-estimators, and wouldn't
require any new algorithms.
[What you explained in your proposal is basically
LinearSVC().fit(DictionaryLearning().fit(X_unlabeled).transform(X_train), y_train)]
Therefore I think they are not enough meat for a whole GSoC.
Are there other infrastructure things that need to change for
semi-supervised learning to become a first-class citizen in sklearn?
If not, maybe it would be worth adding another algorithm, such as
transductive SVMs?
Best,
Andy
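
For reference, the iterative self-learning method mentioned above can be
sketched as a small helper that wraps any probabilistic classifier. This is
only a minimal sketch, not an existing sklearn API: the function name
`self_train`, the `threshold` and `max_iter` parameters, and the convention
of marking unlabeled samples with -1 are all assumptions made for
illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression


def self_train(base_estimator, X, y, threshold=0.9, max_iter=10):
    """Hypothetical self-training loop; y == -1 marks unlabeled samples."""
    y = np.asarray(y).copy()
    est = clone(base_estimator)
    # Start from the samples that already carry a label.
    est.fit(X[y != -1], y[y != -1])
    for _ in range(max_iter):
        unlabeled = np.flatnonzero(y == -1)
        if unlabeled.size == 0:
            break
        proba = est.predict_proba(X[unlabeled])
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Adopt confident predictions as pseudo-labels, then refit.
        y[unlabeled[confident]] = est.classes_[proba[confident].argmax(axis=1)]
        est.fit(X[y != -1], y[y != -1])
    return est, y


# Toy data: two blobs, with only 10 labeled samples kept per class.
X, y_true = make_blobs(n_samples=200, centers=2, random_state=0)
keep = np.concatenate([np.flatnonzero(y_true == c)[:10] for c in (0, 1)])
y = np.full_like(y_true, -1)
y[keep] = y_true[keep]

model, y_filled = self_train(LogisticRegression(), X, y)
```

The base estimator only needs `fit` and `predict_proba`, which is why this
kind of wrapper needs no new learning algorithm of its own.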
What do you think about the proposal though?
Vinayak
Post by Andreas Mueller
Hi Vinayak.
I was specifically commenting about the self-taught clustering paper that
you mentioned in your email.
Sorry about not being specific.
Best,
Andy
Hi Andy
The idea wiki showed issue #1243 as a reference link which specifically
mentions self-taught learning as a solution for turning an estimator into a
semi-supervised one. So, I tried to base my proposal on that. Could you
guide me on how to focus more on semi-supervised learning than on transfer
learning by commenting on specific places in the doc? :) And maybe suggest
some points where I can improve it, as I think it is somewhat abstract
right now.
Thanks,
Vinayak
Post by Andreas Mueller
Hi Vinayak.
That looks more like a transfer-learning task and I'm not sure how that
would a) tie into the project or b) work with the sklearn API.
So I'd be -1 on that.
Cheers,
Andy
Hi everyone!
I've added my proposal to the wiki page. Please suggest improvements.
https://docs.google.com/document/d/1JCbeakBtPTpfis2grw00I8Y1VVivssAdiHlm1ejS3E8/edit?usp=sharing
Further, I want to discuss whether this ->
http://www.machinelearning.org/archive/icml2008/papers/432.pdf could be
added to my proposal.
Thanks,
Vinayak