Discussion:
[Scikit-learn-general] "In-bag" for RandomForest*
Ariel Rokem
2016-03-05 00:04:33 UTC
Hi everyone,

Is there some way to identify the samples that were used in constructing
each tree in a RandomForest* object?

I am looking for the equivalent of "keep.inbag" in this R implementation:
http://math.furman.edu/~dcs/courses/math47/R/library/randomForest/html/randomForest.html

Thanks!

Ariel
Andreas Mueller
2016-03-07 16:24:13 UTC
Hi Ariel.
We are not storing them any more because of memory issues, but you can
recover them using the random state of the tree:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/ensemble/forest.py#L76
indices = _generate_sample_indices(tree.random_state, n_samples)
Hth,
Andy
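For readers who want to see the idea end to end, here is a minimal sketch of rebuilding an in-bag matrix from per-tree seeds. It uses only NumPy and assumes each tree's bootstrap sample was drawn as n_samples integers in [0, n_samples) from a RandomState seeded with that tree's random_state, which is how the linked helper worked at the time of this thread; newer scikit-learn versions may differ. The names bootstrap_indices and inbag_counts and the seed values are hypothetical and not part of scikit-learn.

```python
import numpy as np


def bootstrap_indices(seed, n_samples):
    """Redraw one tree's bootstrap sample from its integer seed.

    Assumption: the sample was drawn with replacement as n_samples
    integers in [0, n_samples) from a RandomState seeded with `seed`.
    """
    rng = np.random.RandomState(seed)
    return rng.randint(0, n_samples, n_samples)


def inbag_counts(seeds, n_samples):
    """Build an (n_samples, n_trees) matrix of in-bag counts.

    Entry [i, j] is how many times sample i was drawn for tree j;
    a zero means sample i is out-of-bag for that tree.
    """
    counts = np.zeros((n_samples, len(seeds)), dtype=int)
    for j, seed in enumerate(seeds):
        idx = bootstrap_indices(seed, n_samples)
        counts[:, j] = np.bincount(idx, minlength=n_samples)
    return counts


# Hypothetical seeds standing in for the per-tree random_state values.
seeds = [1608637542, 1273642419, 1935803228]
inbag = inbag_counts(seeds, n_samples=10)
```

With a fitted forest, the seeds would presumably come from [tree.random_state for tree in forest.estimators_] and n_samples from the size of the training set. Only the seeds need to be kept, so the memory cost Andy mentions is avoided.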
------------------------------------------------------------------------------
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Ariel Rokem
2016-03-08 16:29:38 UTC
Yes - very helpful - thanks! I have recorded our full solution for
posterity (and for google-ability) here:
http://stackoverflow.com/questions/35832786/in-bag-for-randomforest-objects/35872711
Mathieu Blondel
2016-03-09 02:45:48 UTC
If this function is generally useful, it might be a good idea to make it
public.

Mathieu
Ariel Rokem
2016-03-09 19:03:11 UTC
Hi Mathieu,
Agreed!

We're incorporating this into work on a method for calculating
confidence intervals for RandomForest predictions (based on this
previous work in R: https://github.com/swager/randomForestCI). We've
started some preliminary work in a GitHub repo of our own
(https://github.com/arokem/erlking), and we would be happy to contribute
any of this to the sklearn ecosystem, if that would be useful.

Cheers,

Ariel