--_f675bcec-9ea1-45b8-b27c-f89eff55463f_
Content-Type: text/plain; charset="iso-8859-1"
Content-Transfer-Encoding: quoted-printable
Found the error; I post the fix below. The problem seems to be that metrics.confusion_matrix accepts lists and not numpy.array, so I converted everything to lists:
    # Compute the confusion matrix
    y_testlist_tmp = y_test.transpose().tolist()
    y_testlist = y_testlist_tmp[0]
    resultlist = result.tolist()
    cfmat = metrics.confusion_matrix(y_testlist, resultlist)
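A note on the likely cause, judging from the traceback further down: the failing line looks up each element of y_true in a dict, and each row of a two-dimensional column array is itself an array, which cannot be hashed. The earlier y_pred line ran without error on the NumPy array result, so arrays as such appear to be accepted; flattening y_test to one dimension (for a NumPy array, y_test.ravel()) should be enough. The sketch below reproduces the mechanism with plain Python lists standing in for the arrays; label_to_ind and the sample labels are made up for illustration.

```python
# Plain-Python stand-in for the lookup seen in the traceback.
# label_to_ind and the labels below are illustrative, not real data.
label_to_ind = {0: 0, 1: 1}
n_labels = len(label_to_ind)

y_column = [[0], [1], [1], [0]]          # like shape (4, 1): each element is a row
y_flat = [row[0] for row in y_column]    # like shape (4,): each element is a label


def lookup(y_true):
    """Map every label to its index, as the failing line does."""
    return [label_to_ind.get(x, n_labels + 1) for x in y_true]


try:
    lookup(y_column)
    column_ok = True
except TypeError:
    # A row (a list here, an ndarray in the thread) is unhashable.
    column_ok = False

flat_indices = lookup(y_flat)
print(column_ok, flat_indices)
```

The list conversion in the fix above works for the same reason: taking element [0] of the transposed list yields a flat sequence of labels.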
Thanks all!! If you have any suggestions, I'm happy to listen!
From: ***@msn.com
To: scikit-learn-***@lists.sourceforge.net
Date: Fri, 21 Jun 2013 18:23:49 +0200
Subject: Re: [Scikit-learn-general] SVM: select the training set randomly
Ah, OK! Yeah, I was thinking that a 50/50 (or even 40/60) split between the two classes in my dataset would not be a problem, but since the ratio is 1/3 I would prefer to keep the same distribution for the two, hence my choice to use the train_test_split method. I don't know if there is something better, but this seems to work :)
Now I am trying to get the confusion matrix, but I ran into a problem. Below is the code (continuing from the code posted in my previous message) and the resulting error:
    ...
    # Define an SVM using the best parameters C and gamma
    clf = svm.SVC(gamma = clfopt.best_estimator_.gamma, C = clfopt.best_estimator_.C)
    clf.fit(X_train, y_train)

    result = clf.predict(X_test)

    metrics.confusion_matrix(y_test,result)
    ...

Traceback (most recent call last):
  File "<pyshell#101>", line 1, in <module>
    metrics.confusion_matrix(y_test,result)
  File "C:\Python27_32\lib\site-packages\sklearn\metrics\metrics.py", line 610, in confusion_matrix
    y_true = np.array([label_to_ind.get(x, n_labels + 1) for x in y_true])
TypeError: unhashable type: 'numpy.ndarray'
Thanks for your Precious Support!
From: ***@gmail.com
Date: Fri, 21 Jun 2013 12:11:00 -0400
To: scikit-learn-***@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] SVM: select the training set randomly
Oh sorry, I was thinking of balanced sets for cross validation, rather than a training and testing split. I don't know of a convenience routine specifically for producing stratified training and testing sets. If both your classes have decent support and the training and testing set sizes aren't too small then you should end up with pretty representative samples anyway. You could check the class balance to make sure they're not too far off. Arguably a slightly different class balance is reasonable anyway if you are trying to check out-of-sample performance.
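Checking the class balance can be as simple as counting the labels in each split. A minimal sketch, assuming the labels are available as flat sequences; y_train and y_test here are made-up stand-ins, and class_fractions is a hypothetical helper:

```python
from collections import Counter


def class_fractions(labels):
    """Return {label: fraction of samples carrying that label}."""
    counts = Counter(labels)
    total = float(len(labels))
    return {label: n / total for label, n in counts.items()}


# Made-up labels for illustration only.
y_train = [0] * 60 + [1] * 20
y_test = [0] * 14 + [1] * 6

train_frac = class_fractions(y_train)
test_frac = class_fractions(y_test)
print(train_frac)
print(test_frac)

# Largest difference in class share between the two splits.
max_gap = max(abs(train_frac[c] - test_frac.get(c, 0.0)) for c in train_frac)
print(max_gap)
```

If the gap is larger than you are comfortable with, re-drawing the split or splitting each class separately are the obvious options.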
-Roban
On Fri, Jun 21, 2013 at 11:47 AM, Gianni Iannelli <***@msn.com> wrote:
StratifiedKFold will keep the class distribution the same for you:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html#sklearn.cross_validation.StratifiedKFold
I was looking at this; the documentation says:
This cross-validation object is a variation of KFold, which returns stratified folds. The folds are made by preserving the percentage of samples for each class.
But I don't see how it can manage that, since I pass it just the training set, and I also don't know how to set this percentage for each class. Am I missing something?
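On how it can manage that: the class percentages are not set by hand; they are read from the label vector y that is passed in along with the data. A rough sketch of the idea (not scikit-learn's actual implementation): group the sample indices by class, then deal each class's indices out to the folds in turn, so that every fold receives about the same share of each class. stratified_folds and the toy labels are hypothetical.

```python
from collections import defaultdict


def stratified_folds(y, n_folds):
    """Assign sample indices to n_folds test folds, class by class."""
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    folds = [[] for _ in range(n_folds)]
    for label in sorted(by_class):
        # Deal this class's indices round-robin across the folds.
        for pos, idx in enumerate(by_class[label]):
            folds[pos % n_folds].append(idx)
    return folds


# Toy labels: 6 samples of class 0 and 3 of class 1 (a 2:1 ratio).
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]
folds = stratified_folds(y, n_folds=3)
for test_idx in folds:
    train_idx = [i for i in range(len(y)) if i not in test_idx]
    print(len(train_idx), test_idx, [y[i] for i in test_idx])
```

Each test fold here keeps the 2:1 ratio of the full label vector, which is all that "preserving the percentage of samples for each class" means.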
I have written a simple test script (see below) where I have my two datasets (class A and class B). I added a for loop in which I select 20% of each class as the test set and the other 80% as the training set. I concatenate the training parts and the test parts, then scale my training and test sets. I find the best C and gamma for my RBF SVM, train the SVM, and apply it to my test set. The results are stored in a list of score values. I think I'm doing something wrong, because my score is always 0.5 (in this case I always tried with range(3)).
I will take a look at the metrics you pointed me to, thanks for that!! Do you think StratifiedKFold is better than train_test_split? Can you see any conceptual mistake in the code below?
#TEST
# Imports assumed by this snippet (scikit-learn module layout at the time of the thread)
import numpy
from sklearn import svm, preprocessing, grid_search
from sklearn.cross_validation import train_test_split

X_noscaled_A = X_noscaled_A[0:100,:]
y_A = y_A[0:100,:]
X_noscaled_B = X_noscaled_B[0:100,:]
y_B = y_B[0:100,:]

#Define a list for the results
scores = list()

for i in range(3):
    #Split keeping the ratio
    X_train_noscal_A, X_test_noscal_A, y_train_A, y_test_A = train_test_split(X_noscaled_A, y_A, test_size = 0.20)
    X_train_noscal_B, X_test_noscal_B, y_train_B, y_test_B = train_test_split(X_noscaled_B, y_B, test_size = 0.20)

    #Concatenate in order to have just one vector for train and one vector for test
    X_train_noscal = numpy.concatenate((X_train_noscal_A, X_train_noscal_B))
    y_train = numpy.concatenate((y_train_A,y_train_B))
    X_test_noscal = numpy.concatenate((X_test_noscal_A,X_test_noscal_B))
    y_test = numpy.concatenate((y_test_A,y_test_B))

    #Scale the training set
    scaler = preprocessing.StandardScaler().fit(X_train_noscal)
    X_train = scaler.transform(X_train_noscal)

    #Scale the test set using the values obtained from the training set
    X_test = scaler.transform(X_test_noscal)

    #Optimization of C and gamma
    C_range = 10.0 ** numpy.arange(-3, 7)
    gamma_range = 10.0 ** numpy.arange(-5, 3)
    param_grid = dict(gamma=gamma_range, C=C_range)
    svr = svm.SVC()
    clfopt = grid_search.GridSearchCV(svr,param_grid)
    clfopt.fit(X_train, y_train)

    print(clfopt.best_estimator_.C)
    print(clfopt.best_estimator_.gamma)

    #Define a SVM using the best parameters C and gamma
    clf = svm.SVC(gamma = clfopt.best_estimator_.gamma, C = clfopt.best_estimator_.C)
    clf.fit(X_train, y_train)

    #Write the result in the list
    scores.append(clf.score(X_test,y_test))

#See the results
print(scores)
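The per-class split and concatenation used in the script can also be written against index lists, which avoids carrying separate A and B arrays around. A minimal sketch using only the standard library; stratified_split, its parameters and the toy labels are hypothetical. Later scikit-learn releases document a stratify argument to train_test_split that covers the same need.

```python
import random
from collections import defaultdict


def stratified_split(y, test_size=0.20, seed=None):
    """Return (train_idx, test_idx), taking test_size of every class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)

    train_idx, test_idx = [], []
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)  # random choice within the class
        n_test = int(round(test_size * len(indices)))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return train_idx, test_idx


# Toy labels: 100 samples of class 0 followed by 100 of class 1.
y = [0] * 100 + [1] * 100
train_idx, test_idx = stratified_split(y, test_size=0.20, seed=0)
n_test_class1 = sum(y[i] for i in test_idx)  # class-1 samples in the test set
print(len(train_idx), len(test_idx), n_test_class1)
```

The returned indices can then be used to slice both the feature matrix and the label vector, so the two always stay aligned.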
From: ***@gmail.com
Date: Fri, 21 Jun 2013 10:57:58 -0400
To: scikit-learn-***@lists.sourceforge.net
Subject: Re: [Scikit-learn-general] SVM: select the training set randomly
StratifiedKFold will keep the class distribution the same for you:
http://scikit-learn.org/stable/modules/generated/sklearn.cross_validation.StratifiedKFold.html#sklearn.cross_validation.StratifiedKFold
There are lots of metrics (score functions, etc.) available:
http://scikit-learn.org/stable/modules/classes.html#module-sklearn.metrics
http://scikit-learn.org/stable/modules/model_evaluation.html#model-evaluation
See the docs for a particular estimator to find out what the score method returns (which is generally the score function used in optimizing the model). For instance
http://jaquesgrobler.github.io/Online-Scikit-Learn-stat-tut/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC.score
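For SVC, score returns the mean accuracy on the given test data and labels. Overall accuracy and Cohen's kappa can both be derived from a confusion matrix, as asked in the quoted message. A minimal sketch in plain Python; the matrix values are invented for illustration, rows are taken as true classes and columns as predicted classes, and accuracy_and_kappa is a hypothetical helper:

```python
def accuracy_and_kappa(cm):
    """Overall accuracy and Cohen's kappa from a square confusion matrix."""
    n = float(sum(sum(row) for row in cm))
    k = len(cm)
    # Observed agreement: fraction of samples on the diagonal.
    observed = sum(cm[i][i] for i in range(k)) / n
    row_tot = [sum(cm[i]) for i in range(k)]
    col_tot = [sum(cm[i][j] for i in range(k)) for j in range(k)]
    # Agreement expected by chance, from the marginal totals.
    expected = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)
    kappa = (observed - expected) / (1.0 - expected)
    return observed, kappa


# Invented 2x2 confusion matrix: rows = true class, columns = predicted class.
cm = [[15, 5],
      [5, 15]]
accuracy, kappa = accuracy_and_kappa(cm)
print(accuracy, kappa)
```

Kappa discounts the agreement that would be expected by chance, so it is lower than the raw accuracy whenever a classifier could score well just by following the class proportions.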
-Roban
On Fri, Jun 21, 2013 at 10:20 AM, Gianni Iannelli <***@msn.com> wrote:
>
> Thank you very much for the link!! It does almost exactly what I want to do!
>
> In my case I have two classes, for example 0 and 1. I want to keep the distribution between them similar in the training set (and so also in the test set). I also need the samples to be chosen randomly; I don't care if in one case I get the same index for the training and test set. Well, to select randomly I think that sklearn.cross_validation.ShuffleSplit() does what I want, and I will investigate that. To keep the distribution equal between the two classes I was thinking of:
>
> split the two classes
> apply for each of them the separation in training and test using the ShuffleSplit()
> concatenate now the two classes again (they will have the same size before the split)
> add to one of the two index vectors the size of one of the two classes (depending on how I concatenate the two)
> apply my SVM classification
>
>
> What do you think? Do you think it is OK?
>
> I have another question. How does score work? What does it compute? I searched around and found this:
>
> sklearn.metrics.classification_report(y_true, y_pred, labels=None, target_names=None)
>
> Maybe that could give me back a confusion matrix from which I could compute (I'm just guessing) an Overall Accuracy and a Kappa Coefficient.
>
> Is it correct?
>
> Thank You Very Much!!!
>
> ________________________________
> Date: Fri, 21 Jun 2013 10:59:13 +1000
> From: ***@student.usyd.edu.au
> To: scikit-learn-***@lists.sourceforge.net
> Subject: Re: [Scikit-learn-general] SVM: select the training set randomly
>
>
> Please see http://scikit-learn.org/stable/tutorial/statistical_inference/model_selection.html
>
>
> On Fri, Jun 21, 2013 at 10:31 AM, Gianni Iannelli <***@msn.com> wrote:
>
> Dear All,
>
> I have one question. I have a dataset of 100 vectors, each with some features, and I already know the classification of all of them. What I want to do is randomly select a subset of these 100 to use as the training set and use the rest as the test set. Is there something already implemented in scikit-learn that does this automatically, or do I have to use an index method? By index method I mean separating the two classes: for example, if I have 40 vectors (class A) and 60 (class B), I randomly select 10 from each class and use these 20 vectors as the training set. After that I select the other 80 vectors (also using the indices of the main matrix) and classify them.
>
> Do you think this is too crazy and there is something simpler? Is there also a validation of the result that could tell me how good the classification is? I know that this is not a real case because I already know the classification result, but I just want to see what happens when changing the number of features, number of training elements, and so on.
>
> Thanks All!!!
>
> ------------------------------------------------------------------------------
> This SF.net email is sponsored by Windows:
>
> Build for Windows Store.
>
> http://p.sf.net/sfu/windows-dev2dev
> _______________________________________________
> Scikit-learn-general mailing list
> Scikit-learn-***@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
>
--_f675bcec-9ea1-45b8-b27c-f89eff55463f_--