Discussion:
[Scikit-learn-general] GSOC idea
Şükrü Bezen
2013-04-15 12:19:01 UTC
Hello,

My name is Şükrü BEZEN and I am doing my MSc degree at METU Computer
Engineering on the topic of recommendation systems.

I would like to implement the core recommendation system algorithms in
scikit-learn. That would include collaborative filtering, content-based
filtering and some hybrid models. It would involve creating classes,
APIs, clean documentation, etc.

What do you think about this?
--
--------------------------------------------------
Şükrü BEZEN
Andreas Mueller
2013-04-15 12:53:21 UTC
Hi Şükrü.
I think this is an awesome idea.
Finding a good mentor might be a problem, though. Any takers?
Also, I wouldn't set the goals too high. Having a good API that works for
many applications and a solid and efficient implementation of one or two
core techniques would go a long way, and IMHO would benefit the project
more than adding many but doing them in a rush.

Cheers,
Andy
------------------------------------------------------------------------------
Precog is a next-generation analytics platform capable of advanced
analytics on semi-structured data. The platform includes APIs for building
apps and a phenomenal toolset for data science. Developers can use
our toolset for easy data analysis & visualization. Get a free account!
http://www2.precog.com/precogplatform/slashdotnewsletter
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Olivier Grisel
2013-04-15 13:45:00 UTC
Also, I would rather avoid adding a fancy new application-specific public
API just for the recsys use case, especially before the 1.0 release.
If we can stick to the existing public fit / transform / predict API
(using scipy.sparse matrices), then fine. Otherwise that might cause
trouble.


--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Şükrü Bezen
2013-04-15 20:38:01 UTC
Thanks for your valuable feedback.
I am not planning to implement any fancy public APIs, at least not before
finishing the core part with the existing APIs.

Any suggestions for a mentor for this idea?
Mathieu Blondel
2013-04-16 23:43:10 UTC
On Mon, Apr 15, 2013 at 10:45 PM, Olivier Grisel
Post by Olivier Grisel
Also I would rather avoid adding fancy new application specific public
API just for the recsys use case. Especially before the 1.0 release.
If we can stick to the existing public fit / transform / predict API
(using scipy.sparse matrices), then fine. Otherwise that might cause
trouble.
I mentioned it in another thread but inverse_transform is exactly the
method that we need to impute missing values.
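
As an illustration of that idea (this is not code from the thread), here is a
minimal sketch of an estimator that exposes fit / transform /
inverse_transform, where inverse_transform maps the latent user factors back
to a dense ratings matrix and thereby fills in the unobserved cells. The class
name `RankOneImputer`, the rank-1 model, the use of `None` for missing
ratings, and plain Python lists instead of scipy.sparse matrices are all
assumptions made to keep the example dependency-free.

```python
class RankOneImputer:
    """Hypothetical rank-1 factor model: rating[u][j] ~ user[u] * item[j]."""

    def __init__(self, n_iter=50):
        self.n_iter = n_iter

    def fit(self, X):
        # X is a list of rows (users); None marks a missing rating.
        n_items = len(X[0])
        self.item_factors_ = [1.0] * n_items
        for _ in range(self.n_iter):
            # Alternate: solve user factors, then item factors,
            # each by least squares over the observed cells only.
            user_factors = self.transform(X)
            for j in range(n_items):
                num = sum(u * row[j] for u, row in zip(user_factors, X)
                          if row[j] is not None)
                den = sum(u * u for u, row in zip(user_factors, X)
                          if row[j] is not None)
                if den > 0:
                    self.item_factors_[j] = num / den
        return self

    def transform(self, X):
        # One latent factor per user, given the fitted item factors.
        q = self.item_factors_
        factors = []
        for row in X:
            num = sum(q[j] * v for j, v in enumerate(row) if v is not None)
            den = sum(q[j] ** 2 for j, v in enumerate(row) if v is not None)
            factors.append(num / den if den > 0 else 0.0)
        return factors

    def inverse_transform(self, Z):
        # Dense reconstruction: every cell gets a value, observed or not.
        return [[z * q for q in self.item_factors_] for z in Z]


# Toy ratings that are exactly rank 1, with one missing cell per user.
X = [
    [1.0, 2.0, None],
    [None, 4.0, 8.0],
    [3.0, None, 12.0],
]
model = RankOneImputer().fit(X)
filled = model.inverse_transform(model.transform(X))
```

In this sketch the imputed ratings are simply the cells of `filled` where `X`
holds `None`; a real implementation would more likely accept a scipy.sparse
matrix and use a higher-rank model.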

Mathieu
Şükrü Bezen
2013-04-20 08:46:34 UTC
I am still looking for a mentor to back this idea of mine. Is anyone
interested?
Vlad Niculae
2013-04-24 03:23:56 UTC
Hi Şükrü

We can focus on the proposal now and decide later who is better to
mentor it. I could do it but it is not the thing I would be the best
at mentoring, so to solve the chicken-and-egg problem we can optimize
the decisions jointly when the time comes.

Did you start working on your proposal and on a tentative schedule?
Did you think of what algorithms you will implement?

Also, regarding all of the other points made in the thread: even if
merging into master is a good way to finish a GSoC, there is nothing
wrong with leaving a project in a mergeable state but freezing it
until 1.0 (which hopefully will not be very late!).

Yours,
Vlad
Şükrü Bezen
2013-04-24 07:16:14 UTC
Hi Vlad,

Focusing on the proposal now and looking for a mentor later sounds good to
me.

I am considering collaborative filtering with *user similarity* and *item
similarity*, and also *association rule learning* for finding out the
general behaviour of a user-item group.
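
To make the item-similarity variant concrete, here is a minimal sketch, under
stated assumptions, of item-based collaborative filtering: items are compared
by the cosine of their rating columns, and a missing rating is predicted as a
similarity-weighted average of the same user's other ratings. The function
names, the toy matrix, and the convention that 0 means "not rated" are
illustrative choices, not part of the proposal. User similarity works the same
way with rows in place of columns.

```python
from math import sqrt


def item_cosine_similarity(ratings, i, j):
    """Cosine similarity between item columns i and j (0 = not rated)."""
    dot = sum(row[i] * row[j] for row in ratings)
    norm_i = sqrt(sum(row[i] ** 2 for row in ratings))
    norm_j = sqrt(sum(row[j] ** 2 for row in ratings))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def predict_rating(ratings, user, item):
    """Similarity-weighted average of the user's ratings on other items."""
    num = 0.0
    den = 0.0
    for other, value in enumerate(ratings[user]):
        if other == item or value == 0:
            continue  # skip the target item and items the user has not rated
        sim = item_cosine_similarity(ratings, item, other)
        num += sim * value
        den += abs(sim)
    return num / den if den else 0.0


# Rows are users, columns are items; user 3 has not rated item 1.
ratings = [
    [5, 4, 0],
    [4, 5, 1],
    [1, 2, 5],
    [5, 0, 1],
]
predicted = predict_rating(ratings, user=3, item=1)
```

Item 1 is rated much like item 0 by the other users, so the prediction for
user 3 is pulled towards that user's high rating of item 0 rather than the
low rating of item 2.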

I think those two would be enough algorithms for a three-month period.
What do you think?

I have started my proposal but it is not finished yet. When it is finished
(let's say version 1) I will send it to you for feedback.
As for the schedule, I am working on that.
Vlad Niculae
2013-04-24 09:58:03 UTC
Thank you,

Do you have some references prepared? It would be useful.

I am not sure if what is in my head is correct but I think association
rule learning is interesting and a kind of method that I would like to
see in scikit-learn, as well as finding frequent itemsets. I hope I'm
thinking of the right thing, though. I will use Google, but it would
be great if you could also provide us with the references that you are
reading, so that we are all working from the same material.
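
For anyone wanting to check they have the same thing in mind, here is a
minimal sketch of what frequent itemset mining and association rules usually
mean: a level-wise, Apriori-style search that keeps only itemsets appearing
in at least `min_support` transactions, followed by the confidence of a rule
A -> B. The function names and the toy shopping baskets are hypothetical and
only serve the illustration.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {frozenset: number of transactions}."""
    transactions = [frozenset(t) for t in transactions]
    items = {item for t in transactions for item in t}
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    size = 1
    while candidates:
        # Count how many transactions contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        size += 1
        # Join step: unions of frequent sets that have exactly `size` items.
        joined = {a | b for a in current for b in current
                  if len(a | b) == size}
        # Prune step: every (size - 1)-subset must itself be frequent.
        candidates = [
            c for c in joined
            if all(frozenset(s) in current
                   for s in combinations(c, size - 1))
        ]
    return frequent


def confidence(frequent, antecedent, consequent):
    """confidence(A -> B) = support(A and B together) / support(A)."""
    a = frozenset(antecedent)
    return frequent[a | frozenset(consequent)] / frequent[a]


transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
freq = frequent_itemsets(transactions, min_support=3)
beer_to_diapers = confidence(freq, {"beer"}, {"diapers"})
diapers_to_beer = confidence(freq, {"diapers"}, {"beer"})
```

Here every basket containing beer also contains diapers, so the rule
beer -> diapers has higher confidence than the reverse rule.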

Yours,
Vlad
Şükrü Bezen
2013-04-28 22:28:09 UTC
Hi again,

For collaborative filtering: www.stat.osu.edu/~dmsl/Sarwar_2001.pdf
For association rule learning:
http://rakesh.agrawal-family.com/papers/vldb94apriori.pdf

As for the schedule:


- Getting familiar with scikit-learn, API structure, etc. (1 week)
- Generating and finding datasets for future use (1-3 days)
- Implementing association rule learning (1 week)
- Testing and documenting (1 week)
- Implementing collaborative filtering (2 weeks)
- Testing and documenting (1 week)
- Evaluating the whole process, benchmarks, etc. (1 week)

So in total my plan for now is 7 weeks plus a couple of days.
I think the timetable is reasonable given my knowledge of recommendation
systems.
The only thing that I am lacking right now is scikit-learn know-how, but I
have already started learning and diving into it.

Any feedback is welcome!

PS: I am now working on the step of committing to scikit-learn.
Gael Varoquaux
2013-04-29 06:15:37 UTC
Looking at that timetable, what comes to mind is that the first
hard results (association rule learning implemented and mergeable) come
in a bit late. This can be a problem, as ideally there should be a
significant contribution merged before the mid-term review.

Cheers,

Gaël
Şükrü Bezen
2013-04-29 09:23:41 UTC
Thanks for the feedback.

Actually, implementing and testing association rule learning can be
finished sooner than I thought (two weeks in total) because of its
simplicity, so the updated schedule would look like this:

• Getting familiar with scikit-learn, API structure, etc. (1 week)
• Generating and finding datasets for future use (1-3 days)
• Implementing association rule learning (1-3 days)
• Testing and documenting (1-3 days)
• Implementing collaborative filtering (2 weeks)
• Testing and documenting (1 week)
• Evaluating the whole process, benchmarks, etc. (2 weeks)

I reduced the time for implementing and testing association rule
learning and added that time to the overall evaluation at the end, which
I think suits the deadlines better.



