Discussion:
[Scikit-learn-general] Random Forest Custom Label
Mamun Rashid
2016-03-01 23:11:47 UTC
Hi All,

This is my understanding of the Random Forest algorithm:
The Random Forest algorithm builds a number of trees, each from a randomly selected subset of the samples and features. At each node of a tree it uses the decrease in Gini impurity to find the best feature-threshold pair (several thresholds are tested for each feature), i.e. the split that best separates the positive and the negative class.
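
To make the criterion concrete, here is a small sketch of the weighted Gini impurity a candidate split is scored with (my own toy code, not scikit-learn's actual implementation):

import numpy as np

def gini(y):
    """Gini impurity of a set of 0/1 labels: 1 - sum_k p_k^2."""
    if len(y) == 0:
        return 0.0
    p = np.mean(y)  # fraction of positive labels
    return 1.0 - (p ** 2 + (1.0 - p) ** 2)

def split_impurity(y_left, y_right):
    """Size-weighted Gini impurity of a split; lower means a cleaner separation."""
    n = len(y_left) + len(y_right)
    return (len(y_left) / n) * gini(y_left) + (len(y_right) / n) * gini(y_right)

# Example: testing the threshold 2.5 on a single feature.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
mask = x <= 2.5
print(split_impurity(y[mask], y[~mask]))  # 0.0 -> a perfect separation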

Question 1 :
I have a two-class classification problem where the positive labels reside in clusters. A traditional cross-validation approach is not aware of this and splits data points from the same cluster between the training and test sets, giving rise to inflated classification performance. I wrote a custom cross-validation loop to address this issue. However, the bootstrapping inside the Random Forest algorithm randomly selects samples and features to control overfitting.

When it applies the fit method to the randomly selected samples, does it do an internal cross-validation to prevent overfitting? I did not find this in the GitHub code.
If yes, can I specify my groupings to the Random Forest?
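
For reference, my custom loop is roughly equivalent to the following sketch using GroupKFold (available in recent scikit-learn versions; the group ids below are made up, one id per cluster, so no cluster is split across folds):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GroupKFold, cross_val_score

X = np.random.RandomState(0).rand(12, 4)                 # toy features
y = np.array([1, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0])       # labels
groups = np.array([0, 0, 0, 1, 1, 1, 2, 2, 3, 3, 4, 4])  # cluster ids

clf = RandomForestClassifier(n_estimators=100, random_state=0)
# GroupKFold keeps all samples of a group in the same fold.
scores = cross_val_score(clf, X, y, groups=groups, cv=GroupKFold(n_splits=3))
print(scores)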

Question 2 :
The Gini impurity at each node tries to find the best separation between the two classes. I care more about obtaining a cleaner separation for my positive class. Is there any way to give more importance to one class during the partitioning?

Thanks in advance.

Mamun
Jacob Schreiber
2016-03-02 00:39:40 UTC
Question 1: It does not do an internal cross-validation to prevent
overfitting.
Question 2: Yes, you can put a higher weight on your positive class. Look
at the class_weight parameter in the documentation here:
http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestClassifier.html
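
For example, a minimal sketch (the 5:1 weight on the positive class is arbitrary, purely for illustration):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, weights=[0.8, 0.2], random_state=0)

# class_weight scales the sample weights that enter the Gini computation,
# so impurity at a node is reduced more by purifying the up-weighted class.
clf = RandomForestClassifier(n_estimators=100,
                             class_weight={0: 1, 1: 5},
                             random_state=0)
clf.fit(X, y)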