[Scikit-learn-general] Parameter estimation by Customised Cross Validation

Discussion:

Mamun Rashid

2016-02-05 15:05:45 UTC

Hi Folks,
I have a two-class classification problem where the positive examples reside in clusters.
A traditional cross-validation approach is not aware of this and splits the data points from a cluster between the training and test sets, giving rise to misleadingly strong classification performance.
I have written a custom cross-validation routine where I hold all data points from each cluster either in the training set or in the test set (never allowing a cluster to be split). Finally, I retrain a random forest classifier using the whole positive set.

My question is:
- Can I somehow tune the parameters of a random forest classifier (RFC) under this scheme, and then train the final classifier using these tuned parameters?

I do understand that GridSearchCV or randomised parameter optimisation allows me to do this, but it follows a traditional CV and splits the clusters I mentioned earlier.

Thanks in advance.

Mamun
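
For readers following the thread, a cluster-aware routine of the kind described above could be sketched roughly as follows. This is not Mamun's actual code: the function name `cluster_folds`, the `cluster_ids` array and the fold count are hypothetical, and the sketch assumes every sample carries a cluster id (unclustered samples could be given an id of their own).

```python
import numpy as np


def cluster_folds(cluster_ids, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs in which no cluster is split.

    Hypothetical helper: whole clusters are dealt into folds, so every
    sample of a cluster lands on the same side of each split.
    """
    cluster_ids = np.asarray(cluster_ids)
    rng = np.random.default_rng(seed)
    # Shuffle the distinct cluster ids, then cut them into n_folds chunks.
    shuffled = rng.permutation(np.unique(cluster_ids))
    for held_out in np.array_split(shuffled, n_folds):
        test_mask = np.isin(cluster_ids, held_out)
        yield np.flatnonzero(~test_mask), np.flatnonzero(test_mask)


# Toy data: ten samples belonging to five clusters (made-up ids).
cluster_ids = np.array([0, 0, 1, 1, 1, 2, 3, 3, 4, 4])
folds = list(cluster_folds(cluster_ids, n_folds=3))

for train_idx, test_idx in folds:
    # The clusters seen in training and in testing never overlap.
    assert not set(cluster_ids[train_idx]) & set(cluster_ids[test_idx])
```

Each sample appears in exactly one test fold, and the folds differ in size when clusters do, which is the price of never splitting a cluster.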

Vlad Niculae

2016-02-05 16:42:08 UTC

Hi Mamun,

If your cluster labels are known, you can use the LabelShuffleSplit
or LeavePLabelOut cross-validation generators.

HTH,
Vlad
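
A minimal sketch of how this suggestion might be combined with a grid search. It is written against scikit-learn 0.18 or later, where these generators live in `sklearn.model_selection` under the names GroupShuffleSplit and LeavePGroupsOut; the synthetic data, the `groups` array and the parameter grid are made up for illustration, not taken from the thread.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupShuffleSplit

# Synthetic stand-in data: 120 samples in 12 hypothetical clusters of 10.
X, y = make_classification(n_samples=120, n_features=8, random_state=0)
groups = np.repeat(np.arange(12), 10)

# Each split holds out whole clusters; no cluster is ever divided.
cv = GroupShuffleSplit(n_splits=5, test_size=0.25, random_state=0)

# Illustrative grid only; the values are assumptions, not recommendations.
param_grid = {"n_estimators": [25, 50], "max_depth": [3, None]}

search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    cv=cv,
    refit=True,  # retrain on all the data with the best parameters
)
# The cluster ids are forwarded to the splitter through `groups`.
search.fit(X, y, groups=groups)

final_clf = search.best_estimator_
print(search.best_params_)
```

With `refit=True`, `search.best_estimator_` is already the final classifier retrained on the full data set with the selected parameters. The `cv` argument also accepts an iterable of (train, test) index arrays, so a hand-written splitting routine could be plugged in the same way.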

_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general

Jamie Bull

2016-02-06 07:18:01 UTC

I've been working on a GridSearchTransductive which might do what you need,
or could be adjusted to do it. You can take a look at #6160, "Refactor
model_selection._search to include transductive estimators".

Jamie
