Discussion:
Different values of C on grid searches for sparse Linear SVC
Denzil Correa
2011-05-18 03:49:34 UTC
Permalink
Is it okay to get different values of C on different grid searches?

X = sparse_feats
Y = target_labels

folds = StratifiedKFold(Y, cross_fold, indices=True)
train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()

# Generate grid search values for C, gamma
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
gamma_val = 2. ** np.arange(gamma_start, gamma_end + gamma_step,
gamma_step)

print C_val
print gamma_val

grid_clf = svm.sparse.LinearSVC()

print grid_clf

linear_SVC_params = {'C': C_val}

grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10,
iid = False, score_func = f1_score)

grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))
y_true, y_pred = Y[test], grid_search.predict(X[test])

print grid_search.best_estimator
print "Best score: %0.3f" % grid_search.best_score

print "Best parameters set:"
best_parameters = grid_search.best_estimator._get_params()
for param_name in sorted(linear_SVC_params.keys()):
    print "\t%s: %r" % (param_name, best_parameters[param_name])

clf = svm.sparse.LinearSVC(C = best_parameters['C'])

I get a different C on each grid search. Is this normal?
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-18 04:43:33 UTC
Permalink
Just to be more explicit, I get different *best estimators* on different
runs of the same data set. I vary C from 2^-5, 2^-3, ..., 2^15 as suggested
in the libSVM guide. Though I observe that the best accuracy across all
runs is for C = 2^5, why should the value of C vary across runs?

One immediate possibility that comes to mind is the line that creates the
train/test split with StratifiedKFold. Does it produce a different train/test
split on each run, and is that the source of the difference? I may be wrong,
but I'm just thinking out loud here.
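
A minimal sketch of one way to test this, reusing Y from the snippet above:
build the split twice in the same process and compare the index arrays.

# If both comparisons print True, the split itself is deterministic
# and cannot explain the varying C.
train1, test1 = iter(StratifiedKFold(Y, 2, indices=True)).next()
train2, test2 = iter(StratifiedKFold(Y, 2, indices=True)).next()
print np.array_equal(train1, train2), np.array_equal(test1, test2)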
Post by Denzil Correa
Is it okay to get different values of C on different grid searches?
X = sparse_feats
Y = target_labels
folds = StratifiedKFold(Y, cross_fold, indices=True)
train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()
# Generate grid search values for C, gamma
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
gamma_val = 2. ** np.arange(gamma_start, gamma_end + gamma_step,
gamma_step)
print C_val
print gamma_val
grid_clf = svm.sparse.LinearSVC()
print grid_clf
linear_SVC_params = {'C': C_val}
grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10,
iid = False, score_func = f1_score)
grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))
y_true, y_pred = Y[test], grid_search.predict(X[test])
print grid_search.best_estimator
print "Best score: %0.3f" % grid_search.best_score
print "Best parameters set:"
best_parameters = grid_search.best_estimator._get_params()
print "\t%s: %r" % (param_name, best_parameters[param_name])
clf = svm.sparse.LinearSVC(C = best_parameters['C'])
I get a different C on each grid search. Is this normal?
--
Regards,
Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-18 06:14:47 UTC
Permalink
The below is certainly NOT the problem. I divided the numpy array into two
halves (manually), and I still observe different C values on different runs
of my program, which in turn leads to different accuracies.

grid_search.fit(X[0:168], Y[0:168], cv=StratifiedKFold(Y[0:168], 10))
y_true, y_pred = Y[169:337], grid_search.predict(X[169:337])

Why would a grid search return different values on different runs? I tried
the Parameter Estimation using Nested Cross Validation example
<http://scikit-learn.sourceforge.net/auto_examples/grid_search_digits.html>
on the digits dataset as given in the documentation section. Though the
example is simple, I don't observe different parameter values on different
runs of the program. I am curious as to why this happens in my case.
Post by Denzil Correa
Just to be more explicit, I get different *best estimators* on different
runs of the same data set. I vary C from 2^-5, 2^-3, ..., 2^15 as suggested
in the libSVM guide. Though I observe that the best accuracy across all
runs is for C = 2^5, why should the value of C vary across runs?
One immediate possibility that comes to mind is the line that creates the
train/test split with StratifiedKFold. Does it produce a different train/test
split on each run, and is that the source of the difference? I may be wrong,
but I'm just thinking out loud here.
Post by Denzil Correa
Is it okay to get different values of C on different grid searches?
X = sparse_feats
Y = target_labels
folds = StratifiedKFold(Y, cross_fold, indices=True)
train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()
# Generate grid search values for C, gamma
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
gamma_val = 2. ** np.arange(gamma_start, gamma_end + gamma_step,
gamma_step)
print C_val
print gamma_val
grid_clf = svm.sparse.LinearSVC()
print grid_clf
linear_SVC_params = {'C': C_val}
grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10,
iid = False, score_func = f1_score)
grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))
y_true, y_pred = Y[test], grid_search.predict(X[test])
print grid_search.best_estimator
print "Best score: %0.3f" % grid_search.best_score
print "Best parameters set:"
best_parameters = grid_search.best_estimator._get_params()
print "\t%s: %r" % (param_name, best_parameters[param_name])
clf = svm.sparse.LinearSVC(C = best_parameters['C'])
I get a different C on each grid search. Is this normal?
--
Regards,
Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
--
Regards,
Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Fabian Pedregosa
2011-05-18 07:54:26 UTC
Permalink
The below is certainly NOT the problem. I divided the numpy array into two
halves (manually), and I still observe different C values on different runs
of my program, which in turn leads to different accuracies.
    grid_search.fit(X[0:168], Y[0:168], cv=StratifiedKFold(Y[0:168], 10))
    y_true, y_pred = Y[169:337], grid_search.predict(X[169:337])
Check the scores in grid_search.grid_scores_; if they are really
close, it might be a tolerance issue, since liblinear has a random
component -- results might vary slightly between runs.
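
For concreteness, a sketch of that comparison (run_a and run_b are
hypothetical names for two GridSearchCV objects fitted on the same data in
two separate runs):

# grid_scores_ is a list of (params, score) pairs in this version
scores_a = [score for params, score in run_a.grid_scores_]
scores_b = [score for params, score in run_b.grid_scores_]
print max(abs(a - b) for a, b in zip(scores_a, scores_b))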

Best,

Fabian.
Denzil Correa
2011-05-18 08:55:40 UTC
Permalink
Fabian:


Here's the output of *grid_search.grid_scores_* on two different runs:

[({'C': 0.125}, 6.0369991310494751),
({'C': 0.5}, 6.4282326369282892),
({'C': 2.0}, 5.8368660287081333),
({'C': 8.0}, 6.6337626141973978),
({'C': 32.0}, 6.7475067416243899),
({'C': 128.0}, 6.1230158730158735),
({'C': 512.0}, 5.8600977113253325),
({'C': 2048.0}, 4.9142135159526452),
({'C': 8192.0}, 4.832273040968694),
({'C': 32768.0}, 6.1615997552947341)]


[({'C': 0.125}, 6.0369991310494751),
({'C': 0.5}, 6.4477884434406176),
({'C': 2.0}, 6.5285595761083712),
({'C': 8.0}, 6.5920254085215726),
({'C': 32.0}, 6.3402787068004463),
({'C': 128.0}, 6.0862874082439307),
({'C': 512.0}, 5.8568265068265069),
({'C': 2048.0}, 5.6382127099504951),
({'C': 8192.0}, 5.2125797618124992),
({'C': 32768.0}, 5.8029681912290609)]

I believe this looks more or less like a tolerance issue, as you pointed out.
The unfortunate part, though, is that it affects the final accuracy of my
classifier. Perhaps I should plot accuracy versus C and select the value
with the best results?
Post by Fabian Pedregosa
The below is certainly NOT the problem. I divided the numpy array into two
halves (manually), and I still observe different C values on different runs
of my program, which in turn leads to different accuracies.
grid_search.fit(X[0:168], Y[0:168], cv=StratifiedKFold(Y[0:168], 10))
y_true, y_pred = Y[169:337], grid_search.predict(X[169:337])
Check the scores in grid_search.grid_scores_; if they are really
close, it might be a tolerance issue, since liblinear has a random
component -- results might vary slightly between runs.
Best,
Fabian.
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Olivier Grisel
2011-05-18 09:04:05 UTC
Permalink
Post by Denzil Correa
[({'C': 0.125}, 6.0369991310494751),
 ({'C': 0.5}, 6.4282326369282892),
 ({'C': 2.0}, 5.8368660287081333),
 ({'C': 8.0}, 6.6337626141973978),
 ({'C': 32.0}, 6.7475067416243899),
 ({'C': 128.0}, 6.1230158730158735),
 ({'C': 512.0}, 5.8600977113253325),
 ({'C': 2048.0}, 4.9142135159526452),
 ({'C': 8192.0}, 4.832273040968694),
 ({'C': 32768.0}, 6.1615997552947341)]
[({'C': 0.125}, 6.0369991310494751),
 ({'C': 0.5}, 6.4477884434406176),
 ({'C': 2.0}, 6.5285595761083712),
 ({'C': 8.0}, 6.5920254085215726),
 ({'C': 32.0}, 6.3402787068004463),
 ({'C': 128.0}, 6.0862874082439307),
 ({'C': 512.0}, 5.8568265068265069),
 ({'C': 2048.0}, 5.6382127099504951),
 ({'C': 8192.0}, 5.2125797618124992),
 ({'C': 32768.0}, 5.8029681912290609)]
I believe this looks more or less like a tolerance issue, as you pointed out.
The unfortunate part, though, is that it affects the final accuracy of my
classifier.
Indeed. Maybe using more extensive cross-validation (10 folds or
LeaveOneOut instead of 5 folds) will decrease the variance of the
estimate and give more stable results.
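
A sketch of that change, assuming LeaveOneOut lives in
scikits.learn.cross_val in this release (it takes the number of samples):

from scikits.learn.cross_val import LeaveOneOut

# Swap the CV object passed to fit, keeping the earlier setup unchanged
grid_search.fit(X[train], Y[train], cv=LeaveOneOut(len(train)))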
Post by Denzil Correa
Perhaps I should plot accuracy versus C and select the value with the
best results?
This is exactly what grid search is doing. You can change the score
function if the default one is not the one that you are looking for (I
don't remember which one it is). Have a look at the metrics package:

https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/metrics/metrics.py
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Denzil Correa
2011-05-18 09:13:02 UTC
Permalink
Olivier:

I am performing stratified 10-fold cross-validation. I am not sure what the
default score function is, but I have overridden it to use f1_score from the
metrics package. I still get those variations on different program runs.
Here's a list of the variations for the same feature set:

C, Accuracy
32, 60.68
2048, 58.03
0.125, 63.34
128, 64.8
0.5, 62.1
2, 58.9
512, 57.72

Just in case, I am posting my code snippet once again. Please let me know if
I am going wrong somewhere.

X = sparse_feats
Y = target_labels

folds = StratifiedKFold(Y, cross_fold, indices=True)
train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()

# Generate grid search values for C, gamma
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)
gamma_val = 2. ** np.arange(gamma_start, gamma_end + gamma_step,
gamma_step)

print C_val
print gamma_val

grid_clf = svm.sparse.LinearSVC()
# grid_clf = svm.sparse.SVC(kernel = 'linear', tol = 0.0001)

print grid_clf
sparse_SVC_params = {'C': C_val, 'gamma' : gamma_val}
linear_SVC_params = {'C': C_val}

grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 10,
iid = False, score_func = f1_score)
grid_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))
y_true, y_pred = Y[test], grid_search.predict(X[test])

print "Classification report for the best estimator: "
print grid_search.best_estimator

print "Tuned for with optimal value: %0.3f" % f1_score(y_true, y_pred)
print classification_report(y_true, y_pred)

print "Grid scores:"
pprint(grid_search.grid_scores_)

print "Best score: %0.3f" % grid_search.best_score

best_parameters = grid_search.best_estimator._get_params()


accuracies = []
hate_precisions = []
hate_recalls = []
hate_f1s = []
plate_precisions = []
plate_recalls = []
plate_f1s = []


## clf = svm.sparse.SVC(kernel = 'linear', C = best_parameters['C'],
gamma = best_parameters['gamma'])
clf = svm.sparse.LinearSVC(C = best_parameters['C'])
print clf

for i, (train, test) in enumerate(folds):
    # Train the model
    clf.fit(X[train], Y[train])

    # Predict the values in the test class
    Y_pred = clf.predict(X[test])

    # Confusion Matrix
    conf_mat = confusion_matrix(Y[test], Y_pred)
    # print conf_mat

    prec, rec, f1, sup = precision_recall_fscore_support(Y[test], Y_pred)
    # print prec, rec, f1, sup

    tp = conf_mat[0][0] + conf_mat[1][1]
    total = tp + conf_mat[0][1] + conf_mat[1][0]
    accuracy = (tp / float(total))

    accuracies.append(accuracy)

    hate_precisions.append(prec[target_names.index('hate')])
    plate_precisions.append(prec[target_names.index('plate')])

    hate_recalls.append(rec[target_names.index('hate')])
    plate_recalls.append(rec[target_names.index('plate')])

    hate_f1s.append(f1[target_names.index('hate')])
    plate_f1s.append(f1[target_names.index('plate')])

print 'Accuracy Mean : ', sum(accuracies) / float(cross_fold)
print 'Accuracy Variance: ', float(array(accuracies).var())

print 'Hate Precision Mean:', sum(hate_precisions) / float(cross_fold)
print 'Hate Precision Variance:', float(array(hate_precisions).var())

print 'Hate Recall Mean: ', sum(hate_recalls) / float(cross_fold)
print 'Hate Recall Variance: ', float(array(hate_recalls).var())

print 'Hate F-Measure Mean: ', sum(hate_f1s) / float(cross_fold)
print 'Hate F-Measure Variance: ', float(array(hate_f1s).var())

print 'Plate Precision Mean: ', sum(plate_precisions) / float(cross_fold)
print 'Plate Precision Variance: ', float(array(plate_precisions).var())

print 'Plate Recall Mean:', sum(plate_recalls) / float(cross_fold)
print 'Plate Recall Variance: ', float(array(plate_recalls).var())

print 'Plate F-Measure Mean: ', (sum(plate_f1s)) / float(cross_fold)
print 'Plate F-Measure Variance: ', float(array(plate_f1s).var())
Post by Olivier Grisel
Post by Denzil Correa
[({'C': 0.125}, 6.0369991310494751),
({'C': 0.5}, 6.4282326369282892),
({'C': 2.0}, 5.8368660287081333),
({'C': 8.0}, 6.6337626141973978),
({'C': 32.0}, 6.7475067416243899),
({'C': 128.0}, 6.1230158730158735),
({'C': 512.0}, 5.8600977113253325),
({'C': 2048.0}, 4.9142135159526452),
({'C': 8192.0}, 4.832273040968694),
({'C': 32768.0}, 6.1615997552947341)]
[({'C': 0.125}, 6.0369991310494751),
({'C': 0.5}, 6.4477884434406176),
({'C': 2.0}, 6.5285595761083712),
({'C': 8.0}, 6.5920254085215726),
({'C': 32.0}, 6.3402787068004463),
({'C': 128.0}, 6.0862874082439307),
({'C': 512.0}, 5.8568265068265069),
({'C': 2048.0}, 5.6382127099504951),
({'C': 8192.0}, 5.2125797618124992),
({'C': 32768.0}, 5.8029681912290609)]
I believe this looks more or less like a tolerance issue, as you pointed
out. The unfortunate part, though, is that it affects the final accuracy
of my classifier.
Indeed. Maybe using more extensive cross-validation (10 folds or
LeaveOneOut instead of 5 folds) will decrease the variance of the
estimate and give more stable results.
Post by Denzil Correa
Perhaps I should plot accuracy versus C and select the value with the
best results?
This is exactly what grid search is doing. You can change the score
function if the default one is not the one that you are looking for (I
don't remember which one it is). Have a look at the metrics package:
https://github.com/scikit-learn/scikit-learn/blob/master/scikits/learn/metrics/metrics.py
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Olivier Grisel
2011-05-18 09:45:16 UTC
Permalink
Post by Denzil Correa
I am performing stratified 10-fold cross-validation. I am not sure what the
default score function is, but I have overridden it to use f1_score from the
metrics package. I still get those variations on different program runs.
Here's a list of the variations for the same feature set:
C, Accuracy
32, 60.68
2048, 58.03
0.125, 63.34
128, 64.8
0.5, 62.1
2, 58.9
512, 57.72
This is indeed very weird. Can you try with the (SGDClassifier, alpha)
pair instead of (LinearSVC, C) to check whether you observe the same
phenomenon?
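
A sketch of that check, assuming the sparse SGD variant lives in
scikits.learn.linear_model.sparse in this release (the alpha grid is
illustrative):

from scikits.learn.linear_model.sparse import SGDClassifier

alpha_val = 10. ** np.arange(-7., -1.)  # 10^-7 ... 10^-2
sgd_params = {'alpha': alpha_val}
sgd_search = GridSearchCV(SGDClassifier(), sgd_params, iid=False,
                          score_func=f1_score)
sgd_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))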

Maybe this issue comes from the fact that your dataset is too noisy to
be able to train any good linear model on it. An F1 score of 0.62 is weak
for text classification. Maybe you should invest some time in either
testing more advanced features (bi-grams, collocations...) or collecting
more data. How big is your dataset? How many dimensions do you have in
your feature space?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Denzil Correa
2011-05-18 10:22:16 UTC
Permalink
Post by Olivier Grisel
Post by Denzil Correa
I am performing stratified 10-fold cross-validation. I am not sure what the
default score function is, but I have overridden it to use f1_score from the
metrics package. I still get those variations on different program runs.
Here's a list of the variations for the same feature set:
C, Accuracy
32, 60.68
2048, 58.03
0.125, 63.34
128, 64.8
0.5, 62.1
2, 58.9
512, 57.72
This is indeed very weird. Can you try with the (SGDClassifier, alpha)
pair instead of (LinearSVC, C) to check whether you observe the same
phenomenon?
I did this. It doesn't result in variations. I vary alpha from 10^-5 to
10^-15; I hope this interval makes sense? I get the best alpha parameter as
10^-5 and an accuracy of 53%.
Post by Olivier Grisel
Maybe this issue comes from the fact that your dataset is too noisy to
be able to train any good linear model on it. An F1 score of 0.62 is weak
for text classification.
Yes, my data is noisy (it consists of informal text, spelling errors, etc.)
and I am aware of that. May I correct you: 0.62 is not the F-score but the
"accuracy" of my classifier after tuning the parameters, i.e. after
performing a grid search I obtain the *best* C and use it for my
classification task.
Post by Olivier Grisel
Maybe you should invest some time in either
testing more advanced features (bi-grams, collocations...) or collecting
more data. How big is your dataset? How many dimensions do you have in
your feature space?
Currently, I have encoded lexical and syntactic features. I have
purposefully stayed away from adding content-based features like unigrams,
bigrams, etc. I am incrementally adding feature sets and checking my
classifier's accuracy to see how the addition of features affects
classification performance. My dataset consists of 338 instances and 572
features as of now.
Post by Olivier Grisel
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Olivier Grisel
2011-05-18 12:37:03 UTC
Permalink
Post by Denzil Correa
Post by Olivier Grisel
Post by Denzil Correa
I am performing stratified 10-fold cross-validation. I am not sure what the
default score function is, but I have overridden it to use f1_score from the
metrics package. I still get those variations on different program runs.
Here's a list of the variations for the same feature set:
C, Accuracy
32, 60.68
2048, 58.03
0.125, 63.34
128, 64.8
0.5, 62.1
2, 58.9
512, 57.72
This is indeed very weird. Can you try with the (SGDClassifier, alpha)
pair instead of (LinearSVC, C) to check whether you observe the same
phenomenon?
I did this. It doesn't result in variations. I vary alpha from 10^-5 to
10^-15; I hope this interval makes sense? I get the best alpha parameter as
10^-5 and an accuracy of 53%.
If the best score is on the boundary of your range, you should extend
the range on that side. For instance, 10^-1 to 10^-6.
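
A sketch of the extended grid, so that the previous best (10^-5) is no
longer on the boundary:

alpha_val = 10. ** np.arange(-6., 0.)  # 10^-6, 10^-5, ..., 10^-1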
Post by Denzil Correa
Post by Olivier Grisel
Maybe this issue comes from the fact that your dataset is too noisy to
be able to train any good linear model on it. An F1 score of 0.62 is weak
for text classification.
Maybe you should invest some time in either
testing more advanced features (bi-grams, collocations...) or collecting
more data. How big is your dataset? How many dimensions do you have in
your feature space?
Currently, I have encoded lexical and syntactic features. I have
purposefully stayed away from adding content-based features like unigrams,
bigrams, etc. I am incrementally adding feature sets and checking my
classifier's accuracy to see how the addition of features affects
classification performance. My dataset consists of 338 instances and 572
features as of now.
For such a small problem you can try a Gaussian-kernel SVC, which should
improve the F1 score if the data is not linearly separable (grid search
on both C and gamma; see the sketch below).
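
A sketch of that joint grid, reusing the C_val and gamma_val grids computed
earlier in the thread:

rbf_clf = svm.sparse.SVC(kernel='rbf')
rbf_params = {'C': C_val, 'gamma': gamma_val}
rbf_search = GridSearchCV(rbf_clf, rbf_params, iid=False,
                          score_func=f1_score)
rbf_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))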

And you should definitely try to collect more samples. Let's say 1k per
class (e.g. for binary classification, positive versus negative, try to
collect 2k samples). Also try to extract more features.

I don't think it's worth investing in building smart classifiers for
text classification problems. You will probably get much better
results by throwing more data at your model.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2011-05-18 13:54:28 UTC
Permalink
Hi,

this is not that weird. Try to reduce the convergence tol and the
variance should disappear.
The reason is that liblinear has a random part, and the smaller the
regularization (i.e. the larger C is), the longer it takes for the
algorithm to converge. This would be fixed with a warm restart on the
logistic path ...
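
A sketch of the suggested change: pass a tighter tolerance to the estimator
that the grid search clones, keeping the rest of the earlier setup:

grid_clf = svm.sparse.LinearSVC(tol=1e-6)  # the default in this thread is 1e-4
grid_search = GridSearchCV(grid_clf, linear_SVC_params, iid=False,
                           score_func=f1_score)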

Alex

On Wed, May 18, 2011 at 5:45 AM, Olivier Grisel
Post by Olivier Grisel
Post by Denzil Correa
I am performing stratified 10-fold cross-validation. I am not sure what the
default score function is, but I have overridden it to use f1_score from the
metrics package. I still get those variations on different program runs.
Here's a list of the variations for the same feature set:
C, Accuracy
32, 60.68
2048, 58.03
0.125, 63.34
128, 64.8
0.5, 62.1
2, 58.9
512, 57.72
This is indeed very weird. Can you try with the (SGDClassifier, alpha)
pair instead of (LinearSVC, C) to check whether you observe the same
phenomenon?
Maybe this issue comes from the fact that your dataset is too noisy to
be able to train any good linear model on it. An F1 score of 0.62 is weak
for text classification. Maybe you should invest some time in either
testing more advanced features (bi-grams, collocations...) or collecting
more data. How big is your dataset? How many dimensions do you have in
your feature space?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Denzil Correa
2011-05-18 14:14:52 UTC
Permalink
Alexandre:

I am using a tolerance parameter of 0.0001 for liblinear, which is the
default setting. How much further should I reduce it?

On Wed, May 18, 2011 at 7:24 PM, Alexandre Gramfort <
Post by Alexandre Gramfort
Hi,
this is not that weird. Try to reduce the convergence tol and the
variance should disappear.
The reason is that liblinear has a random part, and the smaller the
regularization (i.e. the larger C is), the longer it takes for the
algorithm to converge. This would be fixed with a warm restart on the
logistic path ...
Alex
On Wed, May 18, 2011 at 5:45 AM, Olivier Grisel
Post by Olivier Grisel
Post by Denzil Correa
I am performing stratified 10-fold cross-validation. I am not sure what the
default score function is, but I have overridden it to use f1_score from the
metrics package. I still get those variations on different program runs.
Here's a list of the variations for the same feature set:
C, Accuracy
32, 60.68
2048, 58.03
0.125, 63.34
128, 64.8
0.5, 62.1
2, 58.9
512, 57.72
This is indeed very weird. Can you try with the (SGDClassifier, alpha)
pair instead of (LinearSVC, C) to check whether you observe the same
phenomenon?
Maybe this issue comes from the fact that your dataset is too noisy to
be able to train any good linear model on it. An F1 score of 0.62 is weak
for text classification. Maybe you should invest some time in either
testing more advanced features (bi-grams, collocations...) or collecting
more data. How big is your dataset? How many dimensions do you have in
your feature space?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Alexandre Gramfort
2011-05-18 14:19:26 UTC
Permalink
Post by Denzil Correa
I am using the tolerance parameter as 0.0001 for libLinear which is the
default setting. How much further should I reduce?
It will depend on how big your C is. A good test is to check that the
performance no longer depends on this parameter.
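
A sketch of that test, reusing names from the snippets earlier in the
thread:

# Refit the best model with successively tighter tolerances and check
# that the test score stops moving.
for tol in (1e-4, 1e-6, 1e-8, 1e-10):
    clf = svm.sparse.LinearSVC(C=best_parameters['C'], tol=tol)
    clf.fit(X[train], Y[train])
    print tol, f1_score(Y[test], clf.predict(X[test]))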

to illustrate this, try to run

http://scikit-learn.sourceforge.net/auto_examples/linear_model/plot_logistic_path.html

(I used tol=1e-6); with the default tol you will see a much less nice path.

we need a warm restart !

Alex
Denzil Correa
2011-05-19 14:29:49 UTC
Permalink
grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 100,
score_func = f1_score)

This variation seems to be due to the n_jobs number. If I keep it at 10 it
varies; if I keep it at 100 it doesn't!

Can anyone throw light on why this is happening? The documentation
suggests that this parameter specifies only the number of jobs to be run in
parallel, so I assume it only affects the efficiency of the program.

I would like to open an issue on GitHub.

On Wed, May 18, 2011 at 7:49 PM, Alexandre Gramfort <
Post by Alexandre Gramfort
Post by Denzil Correa
I am using the tolerance parameter as 0.0001 for libLinear which is the
default setting. How much further should I reduce?
It will depend on how big is your C. A good test is to check that the
performance
does not depend anymore on this parameter.
to illustrate this, try to run
http://scikit-learn.sourceforge.net/auto_examples/linear_model/plot_logistic_path.html
(I used tol=1e-6) with the default tol and you will see a much less nicer path.
we need a warm restart !
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Alexandre Gramfort
2011-05-19 15:37:14 UTC
Permalink
Hi Denzil,

can you provide some code to reproduce the problem?

thanks

Alex
    grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 100,
score_func = f1_score)
This variation seems to be due to the n_jobs number. If I keep it at 10 it
varies; if I keep it at 100 it doesn't!
Can anyone throw light on why this is happening? The documentation
suggests that this parameter specifies only the number of jobs to be run in
parallel, so I assume it only affects the efficiency of the program.
I would like to open an issue on GitHub.
On Wed, May 18, 2011 at 7:49 PM, Alexandre Gramfort
Post by Alexandre Gramfort
Post by Denzil Correa
I am using a tolerance parameter of 0.0001 for liblinear, which is the
default setting. How much further should I reduce it?
It will depend on how big your C is. A good test is to check that the
performance no longer depends on this parameter.
to illustrate this, try to run
http://scikit-learn.sourceforge.net/auto_examples/linear_model/plot_logistic_path.html
(I used tol=1e-6); with the default tol you will see a much less nice path.
we need a warm restart !
Alex
--
Regards,
Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-19 15:51:15 UTC
Permalink
Alexandre:

This has already been posted more than once in this thread, but I'm
providing it once again for convenience. :-)

X = sparse_feats
Y = target_labels
C_start, C_end, C_step = -3, 15, 2

train, test = iter(StratifiedKFold(Y, 2, indices = True)).next()

# Generate grid search values for C, gamma
C_val = 2. ** np.arange(C_start, C_end + C_step, C_step)


grid_clf = svm.sparse.LinearSVC()
print grid_clf

linear_SVC_params = {'C': C_val}

grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 100 ,
score_func = f1_score)
grid_search.fit(X[train], Y[train], cv = StratifiedKFold(Y[train],10,
indices = True))
y_true, y_pred = Y[test], grid_search.predict(X[test])

print "Classification report for the best estimator: "
print grid_search.best_estimator

print "Tuned for with optimal value: %0.3f" % f1_score(y_true, y_pred)
print classification_report(y_true, y_pred)

print "Grid scores:"
pprint(grid_search.grid_scores_)

print "Best score: %0.3f" % grid_search.best_score


best_parameters = grid_search.best_estimator._get_params()
print "Best C: %0.3f " % best_parameters['C']

Just to make sure you can reproduce it: X is a sparse CSR matrix of features.



On Thu, May 19, 2011 at 9:07 PM, Alexandre Gramfort <
Post by Alexandre Gramfort
Hi Denzil,
can you provide some code to reproduce the problem?
thanks
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-19 15:54:17 UTC
Permalink
I have also opened an issue on GitHub:
https://github.com/scikit-learn/scikit-learn/issues/177 and posted the code.
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Alexandre Gramfort
2011-05-19 16:23:48 UTC
Permalink
Post by Denzil Correa
This has already been posted more than once in this thread, but I'm
providing it once again for convenience. :-)
for convenience, can you provide a self-contained example as a gist on
GitHub that I can just run to see the problem :)

Alex
Denzil Correa
2011-05-19 17:34:16 UTC
Permalink
Alexandre :

I have opened an issue on GitHub:
https://github.com/scikit-learn/scikit-learn/issues/177

Let me know if this doesn't suffice and you require more information
from my side.
Post by Alexandre Gramfort
Post by Denzil Correa
This has already been posted more than once in this thread, but I'm
providing it once again for convenience. :-)
for convenience, can you provide a self-contained example as a gist on
GitHub that I can just run to see the problem :)
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Olivier Grisel
2011-05-19 18:21:23 UTC
Permalink
Post by Denzil Correa
https://github.com/scikit-learn/scikit-learn/issues/177
Let me know if this doesn't suffice and you require more information
from my side.
It would be great to have a complete script along with the values for
the data variables. Can you reproduce it only with your data, or with
randomly generated data too? If it is only with your dataset, can you
put it online somewhere and provide the code that loads it into the
sparse matrix?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Denzil Correa
2011-05-21 18:23:25 UTC
Permalink
Olivier:

I am posting a link to the CSV of the NumPy array dump (dense format,
though, since I need to reshape it). The problem persists irrespective of
the sparse/dense version.

You may download the file from here:
<https://docs.google.com/leaf?id=0B0GLJLxdKPLqNWNhZTk2NjAtNjhlOS00OTAyLTk5MzItZTkzYWE0ZjlhY2Rj&hl=en_US&authkey=COuh8JgN>.
The last column maps to the target_label.

On Thu, May 19, 2011 at 11:51 PM, Olivier Grisel
Post by Olivier Grisel
Post by Denzil Correa
https://github.com/scikit-learn/scikit-learn/issues/177
Let me know if this doesn't suffice and you require more information
from my side.
It would be great to have a complete script along with the values for
the data variables. Can you reproduce it only with your data or with
randomly generated data too? If it is only with your dataset, can you
put it online somewhere and provide the code that loads it into the
sparse matrix?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Alexandre Gramfort
2011-05-22 12:54:47 UTC
Permalink
Hi Denzil,

I gave your script a try with your data, and I confirm that there is
something weird. I do observe a large variance in the results when using
LinearSVC, even with tol=1e-10. However, with a logistic regression
everything works fine (same result with dense and sparse versions, and
independent of n_jobs). Unless there is something I do not understand
about LinearSVC, I suspect a bug.

see :

https://gist.github.com/985437

to reproduce the problem.
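
For reference, a sketch of the logistic regression setup that behaved
deterministically, assuming the sparse variant lives in
scikits.learn.linear_model.sparse in this release:

from scikits.learn.linear_model.sparse import LogisticRegression

logreg_params = {'C': C_val}
logreg_search = GridSearchCV(LogisticRegression(), logreg_params,
                             score_func=f1_score)
logreg_search.fit(X[train], Y[train], cv=StratifiedKFold(Y[train], 10))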

Alex
Post by Denzil Correa
I am posting a link to the CSV of the NumPy array dump (dense format,
though, since I need to reshape it). The problem persists irrespective of
the sparse/dense version.
You may download the file from here. The last column maps to the
target_label.
Post by Olivier Grisel
Post by Denzil Correa
https://github.com/scikit-learn/scikit-learn/issues/177
Let me know if this doesn't suffice and you require more information
from my side.
It would be great to have a complete script along with the values for
the data variables. Can you reproduce it only with your data or with
randomly generated data too? If it is only with your dataset, can you
put it online somewhere and provide the code that loads it into the
sparse matrix?
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Regards,
Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-22 13:26:32 UTC
Permalink
Alexandre :

Thanks! I too see that LogisticRegression works fine.

I am surprised, to say the least. I have found two bugs in about 7 days
of using scikits.learn. :-)

Anyway, until this is sorted out I am forced to use the command-line
liblinear utility directly.


On Sun, May 22, 2011 at 6:24 PM, Alexandre Gramfort <
Post by Alexandre Gramfort
Hi Denzil,
I gave your script a try with your data, and I confirm that there is
something weird. I do observe a large variance in the results when using
LinearSVC, even with tol=1e-10. However, with a logistic regression
everything works fine (same result with dense and sparse versions, and
independent of n_jobs). Unless there is something I do not understand
about LinearSVC, I suspect a bug.
https://gist.github.com/985437
to reproduce the problem.
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-22 20:52:37 UTC
Permalink
Alexandre :

I would also like to report that when I use sparse/dense SVC the grid search
never converges, and the program grows out of memory and freezes my computer.
Could you reproduce this by performing a grid search on an SVC over the
parameters C and gamma, but restricted to a linear kernel?

Basically, the changes include:

grid_clf = svm.sparse.SVC(kernel = 'linear' , tol = 0.0001)

sparse_SVC_params = {'C': C_val, 'gamma': gamma_val}



Is this in any way connected to the scaling of the data?


On Sun, May 22, 2011 at 6:24 PM, Alexandre Gramfort <
Post by Alexandre Gramfort
Hi Denzil,
I gave your script a try with your data, and I confirm that there is
something weird. I do observe a large variance in the results when using
LinearSVC, even with tol=1e-10. However, with a logistic regression
everything works fine (same result with dense and sparse versions, and
independent of n_jobs). Unless there is something I do not understand
about LinearSVC, I suspect a bug.
https://gist.github.com/985437
to reproduce the problem.
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Olivier Grisel
2011-05-22 20:58:23 UTC
Permalink
Post by Denzil Correa
I would also like to report that when I use sparse/dense SVC the grid search
never converges, and the program grows out of memory and freezes my computer.
Could you reproduce this by performing a grid search on an SVC over the
parameters C and gamma, but restricted to a linear kernel?
grid_clf = svm.sparse.SVC(kernel = 'linear' , tol = 0.0001)
sparse_SVC_params = {'C': C_val, 'gamma': gamma_val}
Is this in any way connected to the scaling of the data?
Please include the complete program, in particular the values for
C_val and gamma_val, and the training data if you can make them
available.

Also, 'gamma' is not used by the linear kernel, hence it's a waste of
time to grid search over it when kernel = 'linear'; gamma is the
'width' of the RBF kernel.
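
A sketch of keeping separate parameter grids, so gamma is only searched
with the RBF kernel (reusing C_val and gamma_val from earlier):

linear_params = {'C': C_val}                   # kernel='linear': C only
rbf_params = {'C': C_val, 'gamma': gamma_val}  # kernel='rbf': C and gamma
linear_search = GridSearchCV(svm.sparse.SVC(kernel='linear'),
                             linear_params, score_func=f1_score)
rbf_search = GridSearchCV(svm.sparse.SVC(kernel='rbf'),
                          rbf_params, score_func=f1_score)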
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Denzil Correa
2011-05-22 21:01:55 UTC
Permalink
Olivier :

The changes are in line with the program posted by Alexandre on GitHub. I
have already provided the program, data, and values in this thread, as well
as in the issue opened on the scikit-learn GitHub.

Pardon my ignorance about gamma, but the non-convergence remains. Does
this have anything to do with the scaling of the data?
Post by Olivier Grisel
Post by Denzil Correa
I would also like to report that when I use sparse/dense SVC the grid search
never converges, and the program grows out of memory and freezes my computer.
Could you reproduce this by performing a grid search on an SVC over the
parameters C and gamma, but restricted to a linear kernel?
grid_clf = svm.sparse.SVC(kernel = 'linear' , tol = 0.0001)
sparse_SVC_params = {'C': C_val, 'gamma': gamma_val}
Is this in any way connected to the scaling of the data?
Please include the complete program, in particular the values for
C_val and gamma_val and the training data if you can make them
available.
Also, 'gamma' is not used by the linear kernel, hence it's a waste of
time to grid search over it when kernel = 'linear'; gamma is the
'width' of the RBF kernel.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Paolo Losi
2011-05-22 21:08:54 UTC
Permalink
Post by Denzil Correa
Pardon my ignorance on the gamma but the non-convergence still remains. Does
this have anything remotely to do with scaling of data ?
Definitely yes. You should scale your data.

You can find more explanation in:

- the libsvm guide [1]
- the scikits.learn SVC practical tips [2]

[1] http://www.csie.ntu.edu.tw/~cjlin/papers/guide/guide.pdf
[2] http://scikit-learn.sourceforge.net/modules/svm.html#tips-on-practical-use
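
A sketch of simple per-feature standardization with plain numpy on a dense
copy (fine at 338 x 572; for large sparse matrices, scale by the std only
so that zeros stay zero):

Xd = np.asarray(X.todense())
std = Xd.std(axis=0)
std[std == 0.] = 1.  # guard constant features against division by zero
X_scaled = (Xd - Xd.mean(axis=0)) / std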
Olivier Grisel
2011-05-22 21:16:19 UTC
Permalink
Post by Denzil Correa
The changes are in line with the program posted by Alexandre on GitHub. I
have already provided the program, data, and values in this thread, as well
as in the issue opened on the scikit-learn GitHub.
Sorry, I missed that email for some reason. Indeed you should scale
your data as the ranges are wildly varying.
Post by Denzil Correa
Pardon my ignorance on the gamma but the non-convergence still remains. Does
this have anything remotely to do with scaling of data ?
That should not be the case. I think you are experiencing an issue
with many copies of the dataset in memory. Are you under Windows? If
so, it's very possible that joblib.Parallel is suboptimal in that case.
Just use it with n_jobs=1 and the memory should not explode.
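
A sketch of the workaround, reusing the hypothetical grid objects from the
snippet earlier in this thread:

# Serial grid search: a single process, so joblib.Parallel forks no
# workers and no per-worker copies of the dataset are created.
grid_search = GridSearchCV(grid_clf, {'C': C_val}, n_jobs=1)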
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Denzil Correa
2011-05-22 21:23:12 UTC
Permalink
Post by Olivier Grisel
Post by Denzil Correa
The changes are in line with the program posted by Alexandre on git. I have
already provided the program, data, and values on this thread, as well as on
the issue opened on the scikits-learn git.
Sorry, I missed that email for some reason. Indeed you should scale
your data as the ranges are wildly varying.
Would it affect the sparseness of my data, as you mentioned in the other
thread? I sometimes deal with very high-dimensional but sparse matrices.
Post by Olivier Grisel
Post by Denzil Correa
Pardon my ignorance about gamma, but the non-convergence still remains. Does
this have anything remotely to do with scaling of the data?
That should not be the case. I think you are experiencing an issue
with many copies of the dataset in memory. Are you under Windows? If
so, it's very possible that joblib.Parallel is suboptimal in that case.
Just use it with n_jobs=1 and the memory should not explode.
I am a Mac/Linux user. If I keep n_jobs = 1, the memory never explodes,
but the *fit* still doesn't converge. Why should that be?
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Alexandre Gramfort
2011-05-22 21:37:38 UTC
Permalink
Denzil,

every gist is a git repo. Rather than pasting code on the mailing list,
why not clone the gist that I took the time to put up this morning? We're
willing to help, but it's better if we can review the entire script and
run it instantly. We might be able to spot a problem just by reading the
code, but for this we need to see the entire script you're using.

Alex
Gael Varoquaux
2011-05-22 21:49:50 UTC
Permalink
Post by Denzil Correa
Post by Olivier Grisel
That should not be the case. I think you are experiencing an issue
with many copies of the dataset in memory. Are you under Windows? If
so, it's very possible that joblib.Parallel is suboptimal in that case.
Just use it with n_jobs=1 and the memory should not explode.
I am a Mac/Linux user. If I keep n_jobs = 1, the memory never explodes,
but the *fit* still doesn't converge. Why should that be?
The reason the memory explodes is that the job spawner creates all the
folds before spawning them. Thus, as you have many grid points, a lot of
temporary data is created. I know how to fix that. I'll try to have a
look tomorrow.
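
In spirit (a toy sketch, not the actual joblib code), the problem is an
eager list versus a generator:

def jobs_eager(folds, grid):
    # Current behaviour, schematically: every (fold, params) job is
    # materialized up front -- len(folds) * len(grid) fold copies at once.
    return [(train, test, params) for train, test in folds for params in grid]

def jobs_lazy(folds, grid):
    # The fix, schematically: yield jobs one at a time, so only the jobs
    # currently dispatched to workers hold fold data in memory.
    return ((train, test, params) for train, test in folds for params in grid)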

G
Olivier Grisel
2011-05-22 22:09:44 UTC
Permalink
Post by Gael Varoquaux
The reason the memory explodes is that the job spawner creates all the
folds before spawning them. Thus, as you have many grid points, a lot of
temporary data is created. I know how to fix that. I'll try to have a
look tomorrow.
Yes indeed we already had that discussion in the past. Looking forward
to how you will fix it.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-05-23 22:10:58 UTC
Permalink
Post by Gael Varoquaux
The reason the memory explodes is that the job spawner creates all the
folds before spawning them. Thus, as you have many grid points, a lot of
temporary data is created. I know how to fix that. I'll try to have a
look tomorrow.
Sorry it took so long. I fixed this in:
https://github.com/scikit-learn/scikit-learn/commit/384fdb0b7002c7f5d8033c73bd9e21c702e307e0

If you run into large memory usage using grid_search, keep in mind the
new 'pre_dispatch' parameter.
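
A usage sketch, assuming the post-commit API (grid_clf and C_val stand in
for the objects used earlier in this thread):

# Cap how many jobs are materialized ahead of the workers; pre_dispatch
# accepts an int or an expression string such as '2*n_jobs'.
grid_search = GridSearchCV(grid_clf, {'C': C_val}, n_jobs=4,
                           pre_dispatch='2*n_jobs')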

Gael
Denzil Correa
2011-05-23 19:52:56 UTC
Permalink
Hi Alexandre:

The issue still exists. I have performed a grid search using nested
cross-fold validation and scaled my data as asked in the other threads.

I have opened up a gist on GitHub as suggested by you here:
https://gist.github.com/987394/e1b7c0eae14d4c642088e0ca9e73f43efb95ab7d

The classifier accuracy varies from 53% for libSVM to 71% for libLinear,
which is a very large gap. What could be the reason? I am very curious
why the difference would be so large.

On Thu, May 19, 2011 at 9:53 PM, Alexandre Gramfort <
Post by Alexandre Gramfort
Post by Denzil Correa
This is already posted more than once in this thread but providing once
again for convenience. :-)
for convenience can you provide a self contained example with a gist on github
that I can just run to see the pb :)
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Alexandre Gramfort
2011-05-24 01:53:31 UTC
Permalink
hi,

as said earlier, the two classifiers do not optimize exactly the same cost
function (penalization of the intercept, different handling of multiclass,
squared hinge loss for LinearSVC), hence they should produce different
results. However, the gap in % is indeed fairly large. I don't really know
how to explain it, besides a high variance of the estimate.
Is this difference reproducible across different folds?
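
Schematically, the two objectives (a sketch; liblinear's one-vs-rest
multiclass handling adds further differences):

  SVC (libsvm, hinge loss, intercept b not penalized):
    \min_{w,b}\ \tfrac{1}{2}\|w\|^2 + C \sum_i \max(0,\, 1 - y_i(w^\top x_i + b))

  LinearSVC (liblinear, squared hinge by default, intercept penalized with w):
    \min_{w,b}\ \tfrac{1}{2}(\|w\|^2 + b^2) + C \sum_i \max(0,\, 1 - y_i(w^\top x_i + b))^2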

Alex
Denzil Correa
2011-05-24 04:12:47 UTC
Permalink
Alexandre:

I understand the point that the cost function is different, but the gap in
accuracy is, as you said, too large for comfort.

Do you mean different folds for reporting accuracy, or different folds for
the grid search? In any case, I think cross-validation should mitigate this.

I have opened an issue on the git:
https://github.com/scikit-learn/scikit-learn/issues/187

In any case, which classifier accuracy should I now report? Sparse SVC or
LinearSVC?

On Tue, May 24, 2011 at 7:23 AM, Alexandre Gramfort <
Post by Alexandre Gramfort
hi,
as said earlier, the two classifiers do not optimize exactly the same cost
function (penalization of the intercept, different handling of multiclass,
squared hinge loss for LinearSVC), hence they should produce different
results. However, the gap in % is indeed fairly large. I don't really know
how to explain it, besides a high variance of the estimate.
Is this difference reproducible across different folds?
Alex
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Peter Prettenhofer
2011-05-26 09:31:41 UTC
Permalink
Hi all,

I've commented on the issue on GitHub, but I'd like to spread my
findings to a broader audience. As far as I can see, both issues ([1]
and [2]) are merely a problem with the provided dataset: because it is so
ill-conditioned, we get random results (due to random initialization?).
The min standard deviation of a feature in the dataset is about 0.5, the
max is about 70377!!
If you standardize your data (e.g. using Scaler) prior to model
fitting, you'll get the expected results.
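
A minimal sketch of what standardization does, on toy dense data with
feature scales like the reported ones (Scaler does the equivalent):

import numpy as np

rng = np.random.RandomState(0)
# toy data whose feature scales differ as wildly as in the reported dataset
X = rng.randn(20, 3) * np.array([0.5, 10.0, 70000.0])

mu, sigma = X.mean(axis=0), X.std(axis=0)
sigma[sigma == 0.0] = 1.0        # guard against constant features
X_scaled = (X - mu) / sigma      # each column now has mean ~0 and std ~1
print X_scaled.std(axis=0)       # -> approximately [ 1.  1.  1.]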

I've pushed all code to reproduce the issues onto my scikit-learn fork:
https://github.com/pprett/scikit-learn/tree/njobs-bug

The two relevant scripts are:
njobsbug.py [1]
linearsvc_vs_svc.py [2]

[1] https://github.com/scikit-learn/scikit-learn/issues/177
[2] https://github.com/scikit-learn/scikit-learn/issues/187

best,
Peter
Post by Denzil Correa
In any case, which classifier accuracy should I now report? Sparse SVC or
LinearSVC?
Well... this depends on your research ethics. I'd go for the better one.
--
Peter Prettenhofer
Olivier Grisel
2011-05-26 13:45:42 UTC
Permalink
Post by Peter Prettenhofer
As far as I can see, both issues ([1] and [2]) are merely a problem with
the provided dataset: because it is so ill-conditioned, we get random
results (due to random initialization?). The min standard deviation of a
feature in the dataset is about 0.5, the max is about 70377!!
If you standardize your data (e.g. using Scaler) prior to model
fitting, you'll get the expected results.
[1] https://github.com/scikit-learn/scikit-learn/issues/177
[2] https://github.com/scikit-learn/scikit-learn/issues/187
Thanks for the detective work. I have added the following issue:

https://github.com/scikit-learn/scikit-learn/issues/189

to make it trivial to scale sparse matrices to unit variance (without
centering) in scikit-learn.

We could also add a LogScaler transformer that takes np.log1p(X) of
positive data that follows a fat-tailed distribution, with feature values
spanning several orders of magnitude. I know this is trivial to implement,
but adding it to the scikit would make it intuitive to use in a
preprocessing pipeline.
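
A sketch of what such a transformer could look like (hypothetical:
LogScaler does not exist in the scikit at the time of writing; shown for
dense, non-negative arrays):

import numpy as np

class LogScaler(object):
    """Hypothetical transformer: log1p-compress non-negative, fat-tailed
    features spanning several orders of magnitude."""

    def fit(self, X, y=None):
        return self            # stateless: nothing to estimate

    def transform(self, X):
        return np.log1p(X)     # log(1 + x): keeps zeros at zero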
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Gael Varoquaux
2011-05-26 17:02:38 UTC
Permalink
Post by Peter Prettenhofer
As far as I can see, both issues ([1] and [2]) are merely a problem with
the provided dataset: because it is so ill-conditioned, we get random
results (due to random initialization?). The min standard deviation of a
feature in the dataset is about 0.5, the max is about 70377!!
Thanks heaps for investigating this. I was expecting something like this.
Post by Peter Prettenhofer
If you standardize your data (e.g. using Scaler) prior to model
fitting, you'll get the expected results.
Should we add a note (in big red, using the 'warning' directive)
stressing the importance of normalizing?
Post by Peter Prettenhofer
https://github.com/pprett/scikit-learn/tree/njobs-bug
njobsbug.py
linearsvc_vs_svc.py
And I am willing to bet that you spent quite a while on this.

Ideally, in such a situation, the bug reporter should provide a
fully-runnable script that displays the problem. And I mean
_fully-runnable_: with imports, data loading, and explicit display of
the problem.

The reason I say this is that between Mayavi and scikit-learn I could
easily employ myself full time turning users' ill-defined problems into
code that I can dig into.

Now the challenge is to communicate this process well.

Gael

Gael Varoquaux
2011-05-19 19:44:49 UTC
Permalink
Post by Denzil Correa
grid_search = GridSearchCV(grid_clf , linear_SVC_params, n_jobs = 100,
score_func = f1_score)
This variation seems to be due to the n_jobs number. If I keep it 10 it
varies, if kept 100 it doesn't!
I believe that this is not a bug. As already stated, liblinear depends on
a random initialization to converge. Different numbers of jobs will use
different streams of random number generation. As a result, they will
lead to different results. Having a uniform random number generation
would require patching liblinear to do message passing between the
different jobs. I don't think that this is an option.

I believe that you have already seen that your learning problem is very
sensitive to initialization, so I am not terribly surprised.

This seems to indicate that you have either little information in your
data, or a classifier that is not suited to your problem.
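
One way to check this (a sketch on toy data; era import paths, adjust for
your version): refit on identical data and look at the spread of the
coefficients.

import numpy as np
from scipy.sparse import csr_matrix
from scikits.learn import svm

rng = np.random.RandomState(0)
X = csr_matrix(rng.randn(100, 20))   # toy sparse design matrix
y = rng.randint(0, 2, 100)           # toy binary labels

# Refit on identical data; any spread in coef_ reflects sensitivity to
# liblinear's random initialization and tolerance, not to the data.
coefs = [svm.sparse.LinearSVC(C=1.0, tol=1e-6).fit(X, y).coef_.copy()
         for _ in range(5)]
print "max coefficient spread:", max(np.abs(c - coefs[0]).max()
                                     for c in coefs[1:])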

HTH,

Gaël
Olivier Grisel
2011-05-19 19:54:37 UTC
Permalink
Post by Gael Varoquaux
I believe that this is not a bug. As already stated, liblinear depends on
a random initialization to converge. Different numbers of jobs will use
different streams of random number generation. As a result, they will
lead to different results. Having a uniform random number generation
would require patching liblinear to do message passing between the
different jobs. I don't think that this is an option.
I believe that you have already seen that your learning problem is very
sensitive to initialization, so I am not terribly surprised.
That does not explain why n_jobs=100 stabilizes the results while
n_jobs=10 does not, does it? With n_jobs=100 most of the jobs should not
be used, and the same number of multiprocessing tasks should be performed,
hence the behavior should be the same.
--
Olivier
http://twitter.com/ogrisel - http://github.com/ogrisel
Alexandre Gramfort
2011-05-19 20:06:36 UTC
Permalink
Denzil,

you did not change the tol as I suggested.

Re-run your script with:

grid_clf = svm.sparse.LinearSVC(tol=1e-6)

to get a convergence that depends less on the random state.

Alex

Denzil Correa
2011-05-21 18:12:21 UTC
Permalink
Alexandre:

I tried this. It doesn't make any difference; the variations still exist.

--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-21 18:13:45 UTC
Permalink
To add, I also tried with tol = 1e-10.

The variations still exist.

--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/
Denzil Correa
2011-05-21 18:17:23 UTC
Permalink
I am with Olivier on this. The parallelization of jobs is affecting the
results far too much.
--
Regards,

Denzil Correa
Ph.D Scholar
Indraprastha Institute of Information Technology, Delhi
http://www.iiitd.ac.in/