Discussion:
[Scikit-learn-general] Parameter estimation by Customised Cross Validation
Mamun Rashid
2016-02-05 15:05:45 UTC
Hi Folks,
I have a two-class classification problem where the positive labels reside in clusters.
A traditional cross-validation approach is not aware of this and splits data points from the same cluster across the training and test sets, giving rise to misleadingly strong classification performance.
I have written a custom cross-validation routine that holds all data points from each cluster either in the training set or in the test set (never allowing a cluster to be split). Finally, I retrain a random forest classifier using all of the positive set.

My question is:
- Can I somehow tune the parameters of an RFC under this scheme and then train the final classifier using these tuned parameters?

I understand that GridSearchCV or randomised parameter optimisation can do this, but they follow a traditional CV and split the clusters I mentioned earlier.


Thanks in advance.

Mamun
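
For readers following along, the kind of routine described above can be sketched roughly as follows. This is not Mamun's actual code: the function name `cluster_folds`, its parameters, and the toy `cluster_ids` list are hypothetical, and the sketch assumes every sample carries a cluster id (for example, each unclustered negative could get its own unique id).

```python
import random
from collections import defaultdict


def cluster_folds(cluster_ids, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs in which no cluster is split.

    Hypothetical helper: cluster_ids[i] is the cluster id of sample i.
    """
    # Collect the sample indices belonging to each cluster.
    members = defaultdict(list)
    for idx, cid in enumerate(cluster_ids):
        members[cid].append(idx)

    # Shuffle whole clusters (not samples) reproducibly.
    clusters = sorted(members)
    random.Random(seed).shuffle(clusters)

    for fold in range(n_folds):
        # Every n_folds-th cluster goes to the test side of this fold.
        test_clusters = set(clusters[fold::n_folds])
        test_idx = [i for c in clusters if c in test_clusters for i in members[c]]
        train_idx = [i for c in clusters if c not in test_clusters for i in members[c]]
        yield train_idx, test_idx


# Toy data: 6 clusters of 2 samples each (assumed for illustration).
cluster_ids = [0, 0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5]
folds = list(cluster_folds(cluster_ids, n_folds=3))
```

scikit-learn's `cv` argument accepts an iterable of (train, test) index pairs, so a list such as `folds` could in principle be handed to GridSearchCV directly instead of an integer.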
Vlad Niculae
2016-02-05 16:42:08 UTC
Hi Mamun,

If your cluster labels are known, you can use the LabelShuffleSplit
or LeavePLabelOut cross-validation generators.

HTH,
Vlad
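
A minimal sketch of this suggestion, with several assumptions. It uses the scikit-learn 0.18+ API, where the generators named above were renamed GroupShuffleSplit and LeavePGroupsOut and moved to sklearn.model_selection; in the releases current when this thread was written they lived in sklearn.cross_validation and took the labels in the constructor. The synthetic data, the `groups` array, and the parameter grid are made up for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit

# Synthetic stand-in data; `groups` holds hypothetical cluster ids
# (24 clusters of 5 samples each).
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
groups = np.repeat(np.arange(24), 5)

# Each split keeps every cluster entirely in train or entirely in test.
cv = GroupShuffleSplit(n_splits=4, test_size=0.25, random_state=0)

# Illustrative grid only; the values are not recommendations.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=cv)
search.fit(X, y, groups=groups)  # groups are forwarded to the splitter

# With the default refit=True, the best setting is retrained on all of X, y.
final_model = search.best_estimator_
```

Here `search.best_params_` holds the tuned parameters, and `final_model` is the classifier retrained with them on the full data set, which is the two-step workflow asked about in the original post.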
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Jamie Bull
2016-02-06 07:18:01 UTC
I've been working on a GridSearchTransductive which might do what you need,
or could be adapted to do it. You can take a look at #6160, "Refactor
model_selection._search to include transductive estimators".

Jamie