Discussion:
[Scikit-learn-general] Decision tree pruning
Charanpal Dhanjal
2012-03-13 10:20:45 UTC
I noticed that decision trees are currently unpruned, and wondered if
anyone was working on this (or has been)? If not, I might implement
pruning myself.
Charanpal
Andreas
2012-03-13 10:18:52 UTC
Hi Charanpal.
In the recent GSoC-Thread, Vikram Kamath has proposed this and other improvements
as a GSoC project. This idea is still in a pretty early stage, though.

In general, this is definitely a useful feature.

Cheers,
Andy
Charanpal Dhanjal
2012-03-13 11:00:07 UTC
Hi Andy,
Thanks for the information. I read the thread by Vikram, and would
gladly share my work with him. My particular interest is in model
selection for decision trees and at this stage I would like to test how
different prunings can improve generalisation.
Best,
Charanpal
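
As an illustration of the kind of pruning experiment mentioned above, here is a minimal sketch of reduced-error pruning, one well-known pruning strategy. It is not code from scikit-learn or from anyone in this thread: the Node class, the helper functions and the toy data are all hypothetical, and the tree is built by hand rather than learned. Working bottom-up, a subtree is collapsed into a leaf whenever the leaf makes no more errors on held-out validation samples than the subtree does.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Sample = Tuple[Tuple[float, ...], int]  # (feature vector, class label)


@dataclass
class Node:
    """Hypothetical binary tree node; feature=None marks a leaf."""
    prediction: int                  # majority training class at this node
    feature: Optional[int] = None
    threshold: float = 0.0
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def predict(node: Node, x: Tuple[float, ...]) -> int:
    # Walk from the given node down to a leaf.
    while node.feature is not None:
        node = node.left if x[node.feature] <= node.threshold else node.right
    return node.prediction


def errors(node: Node, data: List[Sample]) -> int:
    # Number of misclassified samples.
    return sum(predict(node, x) != y for x, y in data)


def prune(node: Node, val: List[Sample]) -> Node:
    # Leaves cannot be pruned further.
    if node.feature is None:
        return node
    # Prune the children first, each on the validation samples that reach it.
    go_left = [(x, y) for x, y in val if x[node.feature] <= node.threshold]
    go_right = [(x, y) for x, y in val if x[node.feature] > node.threshold]
    node.left = prune(node.left, go_left)
    node.right = prune(node.right, go_right)
    # Collapse this subtree if a single leaf does no worse on validation data.
    as_leaf = Node(prediction=node.prediction)
    if errors(as_leaf, val) <= errors(node, val):
        return as_leaf
    return node


# Hand-built tree: the right-hand subtree has an extra split that fits noise.
tree = Node(prediction=1, feature=0, threshold=0.5,
            left=Node(prediction=0),
            right=Node(prediction=1, feature=0, threshold=0.8,
                       left=Node(prediction=1), right=Node(prediction=0)))
validation = [((0.2,), 0), ((0.6,), 1), ((0.9,), 1)]

errors_before = errors(tree, validation)
pruned = prune(tree, validation)   # note: modifies the tree in place
errors_after = errors(pruned, validation)
print(errors_before, errors_after)
```

On this toy data the noisy split under the right branch is collapsed while the root split is kept. Other strategies, such as cost-complexity pruning, trade tree size against training error instead of using a separate validation set.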
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Brian Holt
2012-03-13 10:25:38 UTC
Decision trees tend to overfit, so they are most often used (unpruned) in a forest. That said, I think it would be a useful contribution to our offering.

Brian

Paolo Losi
2012-03-13 11:11:53 UTC
Since ensemble methods consistently outperform "traditional" tree building
(where variance is controlled by pruning), what are the advantages of
implementing pruning in sklearn?

Paolo

N.B. The question is not directed specifically to Brian
but to GSoC applicants and sklearn contributors.
Andreas
2012-03-13 11:09:25 UTC
Post by Paolo Losi
Since ensemble methods consistently outperform "traditional" tree building
(where variance is controlled by pruning), what are the advantages of
implementing
pruning in sklearn?
I think the idea would be to have an easy to interpret model.
There might be applications where this is beneficial.

Also, I think having well-known models in sklearn is a good
idea, even if they are not top-performing.

Cheers,
Andy
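
To make the interpretability point concrete, here is a small sketch using features that were added to scikit-learn well after this thread was written: the ccp_alpha parameter (minimal cost-complexity pruning, available from release 0.22 as far as I know) and sklearn.tree.export_text. The iris dataset and the value ccp_alpha=0.02 are arbitrary choices for illustration, not recommendations.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X, y = iris.data, iris.target

# Fully grown tree: splits until every leaf is pure.
full = DecisionTreeClassifier(random_state=0).fit(X, y)

# Same learner with cost-complexity pruning enabled.
# ccp_alpha=0.02 is an assumed, illustrative value.
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)

print("nodes, unpruned:", full.tree_.node_count)
print("nodes, pruned:  ", pruned.tree_.node_count)

# A pruned tree is small enough to read as a handful of if/else rules.
rules = export_text(pruned, feature_names=list(iris.feature_names))
print(rules)
```

In practice the pruning strength would be chosen by cross-validation rather than fixed by hand; cost_complexity_pruning_path on the estimator lists the candidate alpha values for a given training set.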
Peter Prettenhofer
2012-03-13 11:25:45 UTC
Post by Andreas
Post by Paolo Losi
Since ensemble methods consistently outperform "traditional" tree building
(where variance is controlled by pruning), what are the advantages of
implementing pruning in sklearn?
I think the idea would be to have an easy to interpret model.
There might be applications where this is beneficial.
I agree, interpretability is the first thing that comes to my mind.

best,
Peter
Post by Andreas
Also, I think having well-known models in sklearn is a good
idea, even if they are not top-performing.
I totally agree.
--
Peter Prettenhofer
Paolo Losi
2012-03-13 11:34:43 UTC
Post by Andreas
Post by Paolo Losi
Since ensemble methods consistently outperform "traditional" tree building
(where variance is controlled by pruning), what are the advantages of
implementing
pruning in sklearn?
I think the idea would be to have an easy to interpret model.
There might be applications where this is beneficial.
Also, I think having well-known models in sklearn is a good
idea, even if they are not top-performing.
That's fair enough. Thanks

Paolo
Frédéric Bastien
2012-03-13 13:10:46 UTC
I would also add that ensembles are probably slower to train than a
pruned tree. In academia this is not too big a problem, but in
industry it can be important in some cases.

Fred
Mathieu Blondel
2012-03-13 14:42:52 UTC
Post by Frédéric Bastien
I would also add that ensembles are probably slower to train than a
pruned tree. In academia this is not too big a problem, but in
industry it can be important in some cases.
And slower to make predictions too, I guess.

Mathieu
