Mamun Rashid
2016-02-05 15:05:45 UTC
Hi Folks,
I have a two-class classification problem where the positively labelled data points reside in clusters.
A traditional cross-validation approach is not aware of this structure and splits data points from the same cluster across the training and test sets, which gives rise to artificially strong classification performance.
I have written a custom cross-validation routine that keeps all data points from a given cluster entirely in the training set or entirely in the test set (never splitting a cluster). Finally, I retrain a
random forest classifier using the full positive set.
My question is:
- Can I tune the parameters of the RFC under this cluster-aware scheme, and then train the final classifier using those tuned parameters?
I understand that GridSearchCV or randomised parameter optimisation allows parameter tuning, but it follows a traditional CV scheme and splits the clusters I mentioned earlier.
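For concreteness, here is a minimal sketch of the kind of cluster-respecting tuning I am after, using scikit-learn's GroupKFold as a stand-in for my custom routine (X, y, cluster_ids and the parameter grid are just placeholders):

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, GroupKFold

# Placeholder data: X, y are the features/labels, cluster_ids gives the
# cluster membership of each sample.
rng = np.random.RandomState(0)
X = rng.rand(100, 5)
y = rng.randint(0, 2, 100)
cluster_ids = rng.randint(0, 10, 100)

# Pre-compute cluster-respecting (train, test) index splits: every sample
# from a given cluster ends up entirely in train or entirely in test.
cv_splits = list(GroupKFold(n_splits=5).split(X, y, groups=cluster_ids))

param_grid = {"n_estimators": [100, 300], "max_depth": [None, 5, 10]}

# GridSearchCV accepts an iterable of (train, test) index arrays as cv,
# so the search itself never splits a cluster across train and test.
search = GridSearchCV(RandomForestClassifier(random_state=0),
                      param_grid, cv=cv_splits)
search.fit(X, y)
print(search.best_params_)

Is something along these lines the right way to combine my custom splits with the parameter search?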
Thanks in advance.
Mamun