Discussion:
congratulations to Peter and to scikit-learn!
Emanuele Olivetti
2012-07-04 22:48:17 UTC
Permalink
Dear All,

As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com,
beating 365 teams:
http://www.kaggle.com/c/online-sales/leaderboard
The competition was about predicting the monthly online sales of
a product. In my opinion it was a remarkably difficult competition,
so... congratulations!

As far as I understand, Peter used scikit-learn and specifically (his)
GradientBoostingRegressor(), in a clever way.

This is an excellent result for him and - surely - a nice one for
scikit-learn.


Emanuele
Olivier Grisel
2012-07-04 23:03:33 UTC
Permalink
Post by Emanuele Olivetti
Dear All,
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
...
Indeed, great work Peter! This is an amazing feat.

For those interested in the details, see the "Congrats to the Winners"
thread on the forum where Peter and other top competitors give info on
the winning strategies:

http://www.kaggle.com/c/online-sales/forums/t/2135/congrats-to-the-winners

In short: all of the top performers used gradient boosted machines +
feature expansion for dates + refinements and a fine-grained model
selection procedure. Apparently it is very important to do rigorous
cross-validation on Kaggle to avoid selecting over-fitted models, as
the top competitors often have very close scores and the smallest bit
of score variance matters. Along with Random Forests, these methods
are pretty hard to beat in Kaggle competitions these days.
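A minimal sketch of that recipe (toy data and a made-up parameter grid, not Peter's actual pipeline, which wasn't public at the time of this thread) could look like this with GradientBoostingRegressor and plain K-fold cross-validation:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Toy data standing in for the competition features (made up for the sketch).
rng = np.random.RandomState(0)
X = rng.rand(200, 5)
y = 3.0 * X[:, 0] + rng.randn(200) * 0.1

def cv_rmse(params, X, y, n_folds=3):
    """Mean RMSE over K folds for one parameter setting."""
    fold_ids = np.arange(len(y)) % n_folds
    rmses = []
    for k in range(n_folds):
        train, test = fold_ids != k, fold_ids == k
        model = GradientBoostingRegressor(random_state=0, **params)
        model.fit(X[train], y[train])
        err = model.predict(X[test]) - y[test]
        rmses.append(np.sqrt(np.mean(err ** 2)))
    return np.mean(rmses)

# A deliberately tiny grid; a serious attempt would search far more values.
grid = [{"n_estimators": n, "max_depth": d}
        for n in (50, 100) for d in (2, 3)]
best = min(grid, key=lambda p: cv_rmse(p, X, y))
```

The point is simply that every candidate setting is scored by full cross-validation before being selected, which is what keeps the small score differences from rewarding over-fitted models.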
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2012-07-05 01:21:05 UTC
Permalink
Post by Emanuele Olivetti
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
http://www.kaggle.com/c/online-sales/leaderboard
This is indeed great. Certainly a win for the scikit-learn community.
Thanks Peter for your investment in scikit-learn. You are really helping
it to rock, and to be useful at solving real problems.

Gael
xinfan meng
2012-07-05 02:57:09 UTC
Permalink
Congratulations to Peter. This is a great demonstration and advertisement
for scikit-learn. I really want to know how Peter developed his winning
strategy. Has he written something about this?

On Thu, Jul 5, 2012 at 9:21 AM, Gael Varoquaux <
Post by Gael Varoquaux
Post by Emanuele Olivetti
...
This is indeed great. Certainly a win for the scikit-learn community.
Thanks Peter for your investment in scikit-learn. You are really helping
it to rock, and to be useful at solving real problems.
Gael
------------------------------------------------------------------------------
Live Security Virtual Conference
Exclusive live event will cover all the ways today's security and
threat landscape has changed and how IT managers can respond. Discussions
will include endpoint security, mobile security and the latest in malware
threats. http://www.accelacomm.com/jaw/sfrnl04242012/114/50122263/
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
--
Best Wishes
--------------------------------------------
Meng Xinfan (蒙新泛)
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Gilles Louppe
2012-07-05 05:24:26 UTC
Permalink
Congratulations Peter :-)
Post by xinfan meng
Congratulations to Perter. This is a great demonstration and advertisement
for scikit-learn. I really want to know how Peter develop his wining
strategy. Have he written something about this?
...
LI Wei
2012-07-05 05:59:11 UTC
Permalink
Congrats!

I participated as well but did not have time to look into the
problem before it finished. It is rather impressive, as many regular
winners participated and Peter stood out.

Best,
LI, Wei
Post by Gilles Louppe
Congratulations Peter :-)
...
Peter Prettenhofer
2012-07-05 06:54:39 UTC
Permalink
Hi everybody,

thanks a lot for your congratulations. It has been a tight race indeed,
and I have to consider myself lucky that I ended up in first place
- as Olivier already said, the score differences among the top teams
are really small.

Anyway, it's a great opportunity to "spread the word" - sklearn
turned out to be really handy in this competition, especially because
it is so flexible - in particular, I'm referring to the grid
search and cross-validation components - I haven't checked who
authored those modules, but be warned: I owe you a beer!

I have to check with the competition organizers whether it's OK to put
the source code on GitHub - I'll keep you posted.

best,
Peter

PS: I owe Wes McKinney of pandas fame more than one beer...
Post by LI Wei
Congrats!
...
--
Peter Prettenhofer
Olivier Grisel
2012-07-05 07:49:55 UTC
Permalink
Post by Peter Prettenhofer
...
I've to check with the competition organizers whether its ok to put
the source code on github - I'll keep you posted.
If so that would be a great blog post topic. Looking forward to it.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Jaques Grobler
2012-07-05 08:42:58 UTC
Permalink
Congratulations!!! Very well done
Post by Olivier Grisel
...
Emanuele Olivetti
2012-07-05 13:54:52 UTC
Permalink
Post by Olivier Grisel
Post by Peter Prettenhofer
...
I've to check with the competition organizers whether its ok to put
the source code on github - I'll keep you posted.
If so that would be a great blog post topic. Looking forward to it.
Hi,

For what it's worth, I've put the code of my best submission on
github:
https://github.com/emanuele/kaggle_ops
http://www.kaggle.com/c/online-sales/forums/t/2136/the-code-of-my-best-submission

You can download and run it to get an actual file to submit to the
competition.

Of course I only ranked 21st in that competition, so it is *far* less
interesting than Peter's code :-D, and I only spent a few hours on it over
recent weekends. It was more a proof of concept about using blending,
gradient boosting and joblib than a serious attempt.

The resulting code is pretty short: 150 lines to process the dataset
and 80 lines to compute predictions. No real model selection :P
Anyway, the code is general and you can plug in Random Forests or
anything else.

Best,

Emanuele
Olivier Grisel
2012-07-05 14:37:11 UTC
Permalink
Post by Emanuele Olivetti
...
Thank you very much Emanuele, the blending code is very useful.

You should blog it IMHO by explaining the various code snippets:

- feature extraction / expansions (e.g. how to handle dates & times as features)
- your visual exploration of which feature to convert to the log scale
- dealing with missing values
- blending the outcome of randomized models
- cross validation and performance evaluation in general (did you do
any error analysis, e.g. bias and variance using learning curves?)

It would be great to turn the blending procedure into either an
example for scikit-learn (using one of the default toy datasets) or a
new meta-estimator in a new package (more work required, but it would
improve re-usability).

The feature extraction module would also deserve some utility helpers
to deal with dates.
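For instance, a minimal date-expansion helper (a hypothetical sketch - the function name, field names and signature are invented here and are not part of any scikit-learn API) could look like:

```python
from datetime import datetime

def expand_date(value, fmt="%Y-%m-%d"):
    """Expand one date string into simple numeric features.

    Hypothetical helper for illustration only; not an existing
    scikit-learn utility.
    """
    d = datetime.strptime(value, fmt)
    return {
        "year": d.year,
        "month": d.month,
        "day": d.day,
        "weekday": d.weekday(),               # Monday == 0
        "quarter": (d.month - 1) // 3 + 1,
        "day_of_year": d.timetuple().tm_yday,
    }

features = expand_date("2012-07-05")
```

Each derived column can then be fed to a tree ensemble directly, which is essentially what "feature expansion for dates" amounts to.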
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Peter Prettenhofer
2012-07-05 15:08:13 UTC
Permalink
Post by Olivier Grisel
Post by Emanuele Olivetti
...
For what it's worth, I've put the code of my best submission on
https://github.com/emanuele/kaggle_ops
http://www.kaggle.com/c/online-sales/forums/t/2136/the-code-of-my-best-submission
Emanuele,

thanks a lot for sharing - that's great!
Post by Olivier Grisel
Thank you very much Emanuele, the blending code is very useful.
- feature extraction / expansions (e.g. how to handle dates & times as features)
- your visual exploration of which feature to convert to the log scale
- dealing with missing values
- blending the outcome of randomized models
- cross validation and performance evaluation in general (did you do
any error analysis, e.g. bias and variance using learning curves?)
Indeed, it would be great to have a component to generate learning
curves in sklearn - I have some custom code lying around but it's
rather ugly...
Post by Olivier Grisel
It would be great to turn it the blending procedure as either an
example for scikit-learn (using one of the default toy datasets) or a
new meta-estimator in a new package (more work required but would
improve re-usability).
I totally agree - a stacking estimator would be great too - in my
experience, getting stacking right is far more difficult than one
would expect at first.
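The subtle part is usually the training scheme for the combiner. A rough sketch of the standard out-of-fold approach (toy data and placeholder estimators, not a proposed scikit-learn API):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression

# Toy regression data; estimators and sizes are placeholders.
rng = np.random.RandomState(0)
X = rng.rand(120, 4)
y = X[:, 0] - 2.0 * X[:, 1] + rng.randn(120) * 0.05

base_models = [GradientBoostingRegressor(random_state=0),
               RandomForestRegressor(n_estimators=30, random_state=0)]

# The level-1 features must be *out-of-fold* predictions: predicting on
# a base model's own training data leaks the targets and makes the
# combiner over-trust whichever base model overfits most.
n_folds = 3
fold_ids = np.arange(len(y)) % n_folds
Z = np.zeros((len(y), len(base_models)))
for j, est in enumerate(base_models):
    for k in range(n_folds):
        train, test = fold_ids != k, fold_ids == k
        est.fit(X[train], y[train])
        Z[test, j] = est.predict(X[test])

# Level-1 combiner trained on the out-of-fold predictions.
combiner = LinearRegression().fit(Z, y)
```

Getting exactly this leakage issue right is, presumably, what makes stacking harder than it first looks.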

best,
Peter
Olivier Grisel
2012-07-05 15:12:41 UTC
Permalink
Post by Peter Prettenhofer
...
Indeed would be great to have a component to generate learning curves
in sklearn - I have some custom code lying around but it's rather
ugly...
I have this gist:

https://gist.github.com/2972039
Post by Peter Prettenhofer
I totally agree - a stacking estimator would be great too - in my
experience getting stacking right is far more difficult than one would
expect in the first place.
+1
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2012-07-05 15:31:00 UTC
Permalink
Post by Peter Prettenhofer
Indeed would be great to have a component to generate learning curves
in sklearn - I have some custom code lying around but it's rather
ugly...
I must admit that so far I have shied away from adding any plotting
code to scikit-learn. I tend to be worried about the maintenance cost
of such code. However, maybe having code that simply returns the
corresponding vectors would be a good compromise, as it does indeed
seem a useful tool.
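Such a helper could simply return the arrays and leave plotting to the caller. A rough sketch (the function name and signature are invented for illustration, with toy data and a placeholder estimator):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def learning_curve_vectors(estimator, X, y, train_sizes,
                           test_fraction=0.25, seed=0):
    """Return (train_sizes, train_scores, test_scores) as arrays.

    No plotting: the caller decides how to display the vectors.
    """
    rng = np.random.RandomState(seed)
    idx = rng.permutation(len(y))
    n_test = int(len(y) * test_fraction)
    test, pool = idx[:n_test], idx[n_test:]
    train_scores, test_scores = [], []
    for n in train_sizes:
        subset = pool[:n]                 # growing training subsets
        estimator.fit(X[subset], y[subset])
        train_scores.append(estimator.score(X[subset], y[subset]))
        test_scores.append(estimator.score(X[test], y[test]))
    return (np.asarray(train_sizes),
            np.asarray(train_scores),
            np.asarray(test_scores))

rng = np.random.RandomState(0)
X = rng.rand(300, 3)
y = X[:, 0] + rng.randn(300) * 0.1
sizes, train_s, test_s = learning_curve_vectors(
    DecisionTreeRegressor(max_depth=3, random_state=0), X, y, [20, 80, 200])
```

The returned vectors can then be plotted with matplotlib (or anything else) outside the library, which keeps the plotting maintenance burden out of scikit-learn itself.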

G
Andreas Mueller
2012-07-05 15:38:17 UTC
Permalink
Post by Gael Varoquaux
...
I must admit that so far I have been frowning away from adding code that
does any plotting to scikit-learn. I tend to be worried about the
maintenance code for such code. However, maybe having code that simply
returns the corresponding vectors would be a good compromise, as it
indeed seems a useful tool.
+1
Btw I also want that for cross-validation results ;)
Vlad Niculae
2012-07-05 15:41:51 UTC
Permalink
Post by Gael Varoquaux
...
I must admit that so far I have been frowning away from adding code that
does any plotting to scikit-learn. I tend to be worried about the
maintenance code for such code. However, maybe having code that simply
returns the corresponding vectors would be a good compromise, as it
indeed seems a useful tool.
+1, in my opinion that's the way it should be done, like scipy's probplot
(http://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.probplot.html)

but I'd vote for not having a plot=True option at all, for the maintenance reasons above.
------------------
Vlad N.
http://vene.ro
Mathieu Blondel
2012-07-05 16:44:43 UTC
Permalink
On Fri, Jul 6, 2012 at 12:31 AM, Gael Varoquaux <
Post by Gael Varoquaux
I must admit that so far I have been frowning away from adding code that
does any plotting to scikit-learn. I tend to be worried about the
maintenance code for such code. However, maybe having code that simply
returns the corresponding vectors would be a good compromise, as it
indeed seems a useful tool.
This is how I did it in my calibration plot PR too:
https://github.com/scikit-learn/scikit-learn/pull/882

Mathieu
Gael Varoquaux
2012-07-05 16:51:14 UTC
Permalink
Post by Mathieu Blondel
https://github.com/scikit-learn/scikit-learn/pull/882
I am so far behind on PR reviewing :(. I hadn't looked at that yet. I
am only starting to catch up with email.

G
Emanuele Olivetti
2012-07-07 17:46:24 UTC
Permalink
Post by Olivier Grisel
...
Thank you very much Emanuele, the blending code is very useful.
- feature extraction / expansions (e.g. how to handle dates & times as features)
- your visual exploration of which feature to convert to the log scale
- dealing with missing values
- blending the outcome of randomized models
- cross validation and performance evaluation in general (did you do
any error analysis, e.g. bias and variance using learning curves?)
It would be great to turn it the blending procedure as either an
example for scikit-learn (using one of the default toy datasets) or a
new meta-estimator in a new package (more work required but would
improve re-usability).
The feature extraction module would also deserve some utility helpers
to deal with dates.
I agree with you that I should write more information about the snippets.
Time is always shorter than necessary though :-/

I'll try to do my best for the near future, but I'm not promising now ;-)
Of course if someone wants to do that I'll be happy to provide some
support :-)

To answer your question: I did only minimal model selection initially,
and mainly in an incorrect way. Doing things the way Peter did requires
time I don't have at the moment (and maybe skills too!). And I did not
play much with learning curves either. Next time, maybe! :-)

For now I hope the code is enough. At least it can be run, read and
of course modified.

Best,

Emanuele
federico vaggi
2012-07-09 22:38:07 UTC
Permalink
Peter - did you get any updates from Kaggle? If not, is there anything
that we as a community can do to sway them?

On Sat, Jul 7, 2012 at 7:46 PM, Emanuele Olivetti
Post by Emanuele Olivetti
...
Peter Prettenhofer
2012-07-10 16:02:15 UTC
Permalink
Hi Federico,

No, not yet - I only approached them recently about this - I'll
let you know as soon as I hear from them.

best,
Peter
Peter - did you get any updates from Kaggle? If not, is there anything that
we as a community can do to sway them?
Post by Emanuele Olivetti
Post by Olivier Grisel
Post by Emanuele Olivetti
Post by Olivier Grisel
Post by Peter Prettenhofer
...
I've to check with the competition organizers whether its ok to put
the source code on github - I'll keep you posted.
If so that would be a great blog post topic. Looking forward to it.
Hi,
For what it's worth, I've put the code of my best submission on
https://github.com/emanuele/kaggle_ops
http://www.kaggle.com/c/online-sales/forums/t/2136/the-code-of-my-best-submission
You can download and run it to get an actual file to submit to the
competition.
Of course I just ranked 21st on that competition so it is *far* less
interesting than
Peter's code :-D, and I've spent only a few hours in recent weekends. It was
more a proof of concept about using blending, gradient boosting and joblib than a
serious attempt.
The resulting code is pretty short: 150 lines to process the dataset
and 80 lines to compute predictions. No real model selection :P
Anyway the code is general and you can put RF or else inside.
Thank you very much Emanuele, the blending code is very useful.
- feature extraction / expansions (e.g. how to handle dates & times as features)
- your visual exploration of which feature to convert to the log scale
- dealing with missing values
- blending the outcome of randomized models
- cross validation and performance evaluation in general (did you do
any error analysis, e.g. bias and variance using learning curves?)
It would be great to turn it the blending procedure as either an
example for scikit-learn (using one of the default toy datasets) or a
new meta-estimator in a new package (more work required but would
improve re-usability).
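A minimal sketch of such a blending meta-estimator (my own illustration, not Emanuele's or Peter's code, written against the modern scikit-learn API and a synthetic dataset):

```python
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin, clone
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor

class BlendingRegressor(BaseEstimator, RegressorMixin):
    """Uniformly average the predictions of several base regressors."""

    def __init__(self, estimators):
        self.estimators = estimators

    def fit(self, X, y):
        # Fit a fresh clone of every base estimator on the full training set.
        self.fitted_ = [clone(est).fit(X, y) for est in self.estimators]
        return self

    def predict(self, X):
        return np.mean([est.predict(X) for est in self.fitted_], axis=0)

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)
blend = BlendingRegressor([
    RandomForestRegressor(n_estimators=50, random_state=0),
    GradientBoostingRegressor(random_state=0),
]).fit(X, y)
pred = blend.predict(X)
```

A weighted blend (or stacking with a meta-model on held-out predictions) is the natural refinement of this uniform average.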
The feature extraction module would also deserve some utility helpers
to deal with dates.
I agree with you that I should write more information about the snippets.
Time is always shorter than necessary though :-/
I'll try to do my best for the near future, but I'm not promising now ;-)
Of course if someone wants to do that I'll be happy to provide some
support :-)
To answer your question: I did just minimal model selection initially,
but mainly in an incorrect way. Doing things as Peter did requires
time I don't have at the moment (and maybe skills too!). And I did not
play much with learning curves as well. Next time maybe! :-)
For now I hope the code is enough. At least it can be run, read and
of course modified.
Best,
Emanuele
--
Peter Prettenhofer
Andreas Mueller
2012-07-05 08:45:24 UTC
Permalink
Hey Peter.
Pretty awesome feat! Thanks for all the work you put into the ensemble
module!

A blog post about this competition would really be great :)

I was wondering, was there much difference in performance between GBRT
and RF?

We should do a "hall of fame" on the website listing citations and real
world applications!

Cheers,
Andy
Post by Peter Prettenhofer
Hi everybody,
thanks a lot for your congratulations. It has been a tight race indeed
and I have to consider myself lucky that I ended up on the first place
- as Olivier already said score differences among the top teams are
really small.
Anyways, it's a great opportunity to "spread the word" - sklearn
turned out to be really handy in this competition, especially the fact
that it is so flexible - in particular, I'm referring to the grid
search and cross validation components - I haven't checked who
authored those modules but be wary: I owe you a beer!
I've to check with the competition organizers whether its ok to put
the source code on github - I'll keep you posted.
best,
Peter
PS: I owe Wes McKinney of pandas fame more than one beer...
Post by LI Wei
Congrats!
I participated as well but did not have time to look into the
problem before it finished. It is rather impressive, with many regular winners
participating, and Peter stands out.
Best,
LI, Wei
Post by Gilles Louppe
Congratulations Peter :-)
Post by xinfan meng
Congratulations to Peter. This is a great demonstration and advertisement
for scikit-learn. I really want to know how Peter developed his winning
strategy. Has he written something about this?
On Thu, Jul 5, 2012 at 9:21 AM, Gael Varoquaux
Post by Gael Varoquaux
Post by Emanuele Olivetti
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
http://www.kaggle.com/c/online-sales/leaderboard
This is indeed great. Certainly a win for the scikit-learn community.
Thanks Peter for your investment in scikit-learn. You are really helping
it to rock, and to be useful at solving real problems.
Gael
--
Best Wishes
--------------------------------------------
Meng Xinfan(蒙新泛)
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
xinfan meng
2012-07-05 08:47:33 UTC
Permalink
Post by Andreas Mueller
Hey Peter.
Pretty awesome feat! Thanks for all the work you put into the ensemble
module!
A blog post about this competition would really be great :)
I was wondering, was there much difference in performance between GBRT
and RF?
+1
We should do a "hall of fame" on the website listing citations and real
world applications!
+1
Cheers,
Andy
Post by Peter Prettenhofer
Hi everybody,
thanks a lot for your congratulations. It has been a tight race indeed
and I have to consider myself lucky that I ended up on the first place
- as Olivier already said score differences among the top teams are
really small.
Anyways, it's a great opportunity to "spread the word" - sklearn
turned out to be really handy in this competition, especially the fact
that it is so flexible - in particular, I'm referring to the grid
search and cross validation components - I haven't checked who
authored those modules but be wary: I owe you a beer!
I've to check with the competition organizers whether its ok to put
the source code on github - I'll keep you posted.
best,
Peter
PS: I owe Wes McKinney of pandas fame more than one beer...
Post by LI Wei
Congrats!
I have participated as well and do not have the time to look into the
problem before it finishes. It is rather impressive as many regular
winners
Post by Peter Prettenhofer
Post by LI Wei
participating and Peter stands out.
Best,
LI, Wei
Post by Gilles Louppe
Congratulations Peter :-)
Post by xinfan meng
Congratulations to Peter. This is a great demonstration and advertisement
for scikit-learn. I really want to know how Peter developed his winning
strategy. Has he written something about this?
On Thu, Jul 5, 2012 at 9:21 AM, Gael Varoquaux
Post by Gael Varoquaux
Post by Emanuele Olivetti
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
http://www.kaggle.com/c/online-sales/leaderboard
This is indeed great. Certainly a win for the scikit-learn community.
Thanks Peter for your investment in scikit-learn. You are really helping
it to rock, and to be useful at solving real problems.
Gael
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
Emanuele Olivetti
2012-07-05 13:42:55 UTC
Permalink
Post by Andreas Mueller
Hey Peter.
Pretty awesome feat! Thanks for all the work you put into the ensemble
module!
A blog post about this competition would really be great :)
I was wondering, was there much difference in performance between GBRT
and RF?
Hi,

I participated in the competition (with not-so-good results ;-)) and
there was indeed a significant difference between RF and GB. I tried RF,
ExtraTrees and GradientBoosting and got an RMSLE of ~0.60 with the first
two (with blending and a preprocessing procedure similar to Peter's) and
~0.578 with GB. You might say the difference is small, but it is worth
tens of positions in the final ranking ;-)
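For reference, the RMSLE metric quoted above can be computed as follows; this is a generic sketch, not the exact evaluation code Kaggle used:

```python
import numpy as np

def rmsle(y_true, y_pred):
    """Root mean squared logarithmic error."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # log1p keeps the metric defined for zero-valued targets.
    return np.sqrt(np.mean((np.log1p(y_pred) - np.log1p(y_true)) ** 2))

print(rmsle([10.0, 20.0, 30.0], [12.0, 18.0, 33.0]))
```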

Best,

Emanuele
Olivier Grisel
2012-07-05 14:30:38 UTC
Permalink
Post by Emanuele Olivetti
Post by Andreas Mueller
Hey Peter.
Pretty awesome feat! Thanks for all the work you put into the ensemble
module!
A blog post about this competition would really be great :)
I was wondering, was there much difference in performance between GBRT
and RF?
Hi,
I participated to the competition (with not so good results ;-)) but
there was indeed a significant difference between RF and GB. I tried RF, ExtraTrees
and GradientBoosting and got a RMSLE of ~0.60 using the first two (with
blending and a preprocessing procedure similar to the one of Peter) and
~0.578 with GB. OK, you might say that the difference is little but there
are tens of positions in the final rank between them ;-)
It is amazing that the intrinsic variance of those estimators is so
small that one can say a GBRT with mean loss around 0.578
beats an RF with mean loss around 0.60 in a significant manner.

What is the order of magnitude of the std error of 10-fold cross-validation
on this problem? 0.1? Less?

Did the ordering between the validation set and the final test set change
much? Did you observe wide differences between your internal CV scores
and the final test scores?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Peter Prettenhofer
2012-07-05 14:57:25 UTC
Permalink
Post by Olivier Grisel
Post by Emanuele Olivetti
Post by Andreas Mueller
Hey Peter.
Pretty awesome feat! Thanks for all the work you put into the ensemble
module!
A blog post about this competition would really be great :)
I was wondering, was there much difference in performance between GBRT
and RF?
Hi,
I participated to the competition (with not so good results ;-)) but
there was indeed a significant difference between RF and GB. I tried RF, ExtraTrees
and GradientBoosting and got a RMSLE of ~0.60 using the first two (with
blending and a preprocessing procedure similar to the one of Peter) and
~0.578 with GB. OK, you might say that the difference is little but there
are tens of positions in the final rank between them ;-)
This is amazing that the intrinsic variance of those is estimator is
so small as to be able to say that a GBRT with mean loss around 0.578
beats RF with mean loss around 0.60 in a significant manner.
What do you mean by intrinsic variance exactly? Do you mean how
performance varies w.r.t. differences in the training set? If so,
the question is whether they differ consistently.
Post by Olivier Grisel
What is the order of magnitude std error of 10 folds cross validation
on this problem? 0.1? less?
Is the ordering of the validation set and the final test set changed
much? Did you observe wide differences between you internal CV score
and the final test scores ?
Given the small differences on the leaderboard, the final rankings were
rather stable - compared to the recently finished "Biological
Response" challenge, where the second rank on the leaderboard ended up
in 29th place in the final ranking.

Some contestants reported wide differences between internal CV scores
and leaderboard scores; my internal CV scores tracked the leaderboard
scores quite well (see [1]).
Model selection is my nemesis - little can be gained, everything lost :-)

In the end I did 5x 5-fold CV - the error std between the repetitions
was around 0.005. I need to rerun my best model to get the error std
within a single repetition.

[1] http://www.kaggle.com/c/online-sales/forums/t/1958/cross-validation-errors
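Peter's 5x 5-fold scheme can be reproduced with today's scikit-learn API (RepeatedKFold postdates this thread, so this is an anachronistic sketch on a synthetic dataset, not his actual setup):

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import RepeatedKFold, cross_val_score

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

# 5 repetitions of 5-fold CV with different shufflings -> 25 scores.
cv = RepeatedKFold(n_splits=5, n_repeats=5, random_state=0)
scores = cross_val_score(GradientBoostingRegressor(random_state=0), X, y, cv=cv)

# Scores come out repetition-major: the first 5 entries belong to repetition 0.
rep_means = scores.reshape(5, 5).mean(axis=1)
print("std between repetitions:", rep_means.std())
```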
Post by Olivier Grisel
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Peter Prettenhofer
Olivier Grisel
2012-07-05 15:17:40 UTC
Permalink
Post by Peter Prettenhofer
Post by Olivier Grisel
Post by Emanuele Olivetti
Post by Andreas Mueller
Hey Peter.
Pretty awesome feat! Thanks for all the work you put into the ensemble
module!
A blog post about this competition would really be great :)
I was wondering, was there much difference in performance between GBRT
and RF?
Hi,
I participated to the competition (with not so good results ;-)) but
there was indeed a significant difference between RF and GB. I tried RF, ExtraTrees
and GradientBoosting and got a RMSLE of ~0.60 using the first two (with
blending and a preprocessing procedure similar to the one of Peter) and
~0.578 with GB. OK, you might say that the difference is little but there
are tens of positions in the final rank between them ;-)
This is amazing that the intrinsic variance of those is estimator is
so small as to be able to say that a GBRT with mean loss around 0.578
beats RF with mean loss around 0.60 in a significant manner.
What do you mean by intrinsic variance exactly? Do you mean how
performance varies w.r.t. to difference in the training set? If so,
the question is whether they differ consistently.
Post by Olivier Grisel
What is the order of magnitude std error of 10 folds cross validation
on this problem? 0.1? less?
Is the ordering of the validation set and the final test set changed
much? Did you observe wide differences between you internal CV score
and the final test scores ?
Given the small differences on the leaderboard, final rankings were
rather stable - compared to the recently finished "Biological
Response" challenge where the second rank on the leaderboard ended up
on final rank 29.
Some contestants reported wide differences between internal CV scores
and leaderboard scores; My internal CV scores tracked the leaderboard
scores quite well (see [1]);
Model selection is my nemesis - little can be gained, everything lost :-)
In the end I did 5x 5-fold CV - the error std between the repetitions
was around 0.005. I need to rerun my best model to get the error std
within a single repetition.
Alright, thanks - the 5x 5-fold CV scheme and the 0.005 std err are the
kind of details I was looking for.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2012-07-05 15:33:41 UTC
Permalink
Post by Peter Prettenhofer
Model selection is my nemesis - little can be gained, everything lost :-)
Agreed!
Post by Peter Prettenhofer
In the end I did 5x 5-fold CV - the error std between the repetitions
was around 0.005.
I am a big fan of using ShuffleSplit to reduce the variance by using many
folds while keeping a 10 to 20% test fraction. On our data
it seems to be a good option, at the cost of computational time, of
course.
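In today's scikit-learn API (the thread predates the model_selection module), Gael's suggestion looks roughly like this on a synthetic dataset:

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import ShuffleSplit, cross_val_score

X, y = make_regression(n_samples=300, n_features=20, noise=5.0, random_state=0)

# Many independent random splits with a 20% test fraction: more folds reduce
# the variance of the CV estimate, at the cost of extra fits.
cv = ShuffleSplit(n_splits=50, test_size=0.2, random_state=0)
scores = cross_val_score(Ridge(), X, y, cv=cv)
print(scores.mean(), scores.std())
```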

G
federico vaggi
2012-07-05 09:48:12 UTC
Permalink
If you get an OK, that would be absolutely amazing, especially if you
broke it down and explained the different tweaks.

Speaking for myself, I am super interested in seeing how you set up
the grid search on the EC2 instances :)
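The grid search itself is straightforward in scikit-learn; distributing it across cloud instances is the interesting part and is not shown here. A purely local sketch with a hypothetical parameter grid (not Peter's actual one), using the modern model_selection API:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import GridSearchCV

X, y = make_regression(n_samples=200, n_features=10, noise=5.0, random_state=0)

param_grid = {                 # hypothetical grid for illustration only
    "n_estimators": [50, 100],
    "max_depth": [2, 3],
    "learning_rate": [0.05, 0.1],
}
search = GridSearchCV(GradientBoostingRegressor(random_state=0),
                      param_grid, cv=3, n_jobs=-1)  # n_jobs=-1: all local cores
search.fit(X, y)
print(search.best_params_)
```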

Congratulations on a fantastic job, and this will draw a lot more attention
to sklearn!

Federico

On Thu, Jul 5, 2012 at 8:54 AM, Peter Prettenhofer <
Post by Peter Prettenhofer
Hi everybody,
thanks a lot for your congratulations. It has been a tight race indeed
and I have to consider myself lucky that I ended up on the first place
- as Olivier already said score differences among the top teams are
really small.
Anyways, it's a great opportunity to "spread the word" - sklearn
turned out to be really handy in this competition, especially the fact
that it is so flexible - in particular, I'm referring to the grid
search and cross validation components - I haven't checked who
authored those modules but be wary: I owe you a beer!
I've to check with the competition organizers whether its ok to put
the source code on github - I'll keep you posted.
best,
Peter
PS: I owe Wes McKinney of pandas fame more than one beer...
Post by LI Wei
Congrats!
I have participated as well and do not have the time to look into the
problem before it finishes. It is rather impressive as many regular
winners
Post by LI Wei
participating and Peter stands out.
Best,
LI, Wei
Post by Gilles Louppe
Congratulations Peter :-)
Post by xinfan meng
Congratulations to Peter. This is a great demonstration and advertisement
for scikit-learn. I really want to know how Peter developed his winning
strategy. Has he written something about this?
On Thu, Jul 5, 2012 at 9:21 AM, Gael Varoquaux
Post by Gael Varoquaux
Post by Emanuele Olivetti
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
http://www.kaggle.com/c/online-sales/leaderboard
This is indeed great. Certainly a win for the scikit-learn community.
Thanks Peter for your investment in scikit-learn. You are really helping
it to rock, and to be useful at solving real problems.
Gael
--
Best Wishes
--------------------------------------------
Meng Xinfan蒙新泛
Institute of Computational Linguistics
Department of Computer Science & Technology
School of Electronic Engineering & Computer Science
Peking University
Beijing, 100871
China
--
Peter Prettenhofer
Satrajit Ghosh
2012-07-05 02:47:52 UTC
Permalink
Post by Emanuele Olivetti
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
congratulations peter. this is wonderful news.
Vlad Niculae
2012-07-05 12:28:43 UTC
Permalink
Congratulations Peter! Excellent work as always!

Vlad
Post by Emanuele Olivetti
Dear All,
As some of you may have already noticed, Peter (Prettenhofer) has
just won the "Online Product Sales" competition on kaggle.com
http://www.kaggle.com/c/online-sales/leaderboard
The competition was about predicting the monthly online sales of
a product. In my opinion it was a remarkably difficult competition,
so... congratulations!
As far as I understand Peter used scikit-learn and specifically (his)
GradientBoostingRegressor(), in a clever way.
This is an excellent result for him and - surely - a nice one for
scikit-learn.
Emanuele
------------------
Vlad N.
http://vene.ro