Discussion:
[Scikit-learn-general] Fine-tuning parameters of multi-label classification
Startup Hire
2015-12-28 10:38:48 UTC
Hi all,

Hope you are doing well.

I am working on fine-tuning the following parameters of an SGDClassifier,
which I am using inside a OneVsRestClassifier.

I am using GridSearchCV for this.

I have the following questions:


1. How do I use GridSearchCV to optimize a OneVsRestClassifier?
2. Any reason why the code below does not work? The error is "bad input
shape", although classifier.fit works fine separately.






from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import SGDClassifier

# Finaldata (features) and y (targets) are assumed to be defined earlier.

# Set the parameters by cross-validation
tuned_parameters = [{'alpha': [0.001, 0.01, 0.1, 0.5],
                     'penalty': ['l1', 'l2', 'elasticnet'],
                     'loss': ['log', 'modified_huber']}]

scores = ['precision', 'recall']

for score in scores:
    print("# Tuning hyper-parameters for %s" % score)
    print()

    clf = GridSearchCV(
        SGDClassifier(random_state=0, learning_rate='optimal',
                      class_weight='auto', n_iter=100),
        tuned_parameters, cv=5,
        scoring='%s_weighted' % score)

    clf.fit(Finaldata, y)

    print("Best parameters set found on development set:")
    print()
    print(clf.best_params_)
    print()


Regards,
Sanant
Andy
2015-12-29 18:08:20 UTC
Hi Sanant.
Please provide the full traceback.

Best,
Andy
Startup Hire
2016-01-04 07:15:03 UTC
Providing the full stack trace here (the code is in the previous email):

# Tuning hyper-parameters for precision
()

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-85-7fedbaf85b7d> in <module>()
     18                        scoring='%s_weighted' % score)
     19
---> 20     clf.fit(Finaldata, y)
     21
     22     print("Best parameters set found on development set:")

D:\Anaconda\lib\site-packages\sklearn\grid_search.pyc in fit(self, X, y)
    730
    731         """
--> 732         return self._fit(X, y, ParameterGrid(self.param_grid))
    733
    734

D:\Anaconda\lib\site-packages\sklearn\grid_search.pyc in _fit(self, X, y, parameter_iterable)
    503                 self.fit_params, return_parameters=True,
    504                 error_score=self.error_score)
--> 505             for parameters in parameter_iterable
    506             for train, test in cv)
    507

D:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __call__(self, iterable)
    657             self._iterating = True
    658             for function, args, kwargs in iterable:
--> 659                 self.dispatch(function, args, kwargs)
    660
    661         if pre_dispatch == "all" or n_jobs == 1:

D:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.pyc in dispatch(self, func, args, kwargs)
    404         """
    405         if self._pool is None:
--> 406             job = ImmediateApply(func, args, kwargs)
    407             index = len(self._jobs)
    408             if not _verbosity_filter(index, self.verbose):

D:\Anaconda\lib\site-packages\sklearn\externals\joblib\parallel.pyc in __init__(self, func, args, kwargs)
    138         # Don't delay the application, to avoid keeping the input
    139         # arguments in memory
--> 140         self.results = func(*args, **kwargs)
    141
    142     def get(self):

D:\Anaconda\lib\site-packages\sklearn\cross_validation.pyc in _fit_and_score(estimator, X, y, scorer, train, test, verbose, parameters, fit_params, return_train_score, return_parameters, error_score)
   1457             estimator.fit(X_train, **fit_params)
   1458         else:
-> 1459             estimator.fit(X_train, y_train, **fit_params)
   1460
   1461     except Exception as e:

D:\Anaconda\lib\site-packages\sklearn\linear_model\stochastic_gradient.pyc in fit(self, X, y, coef_init, intercept_init, class_weight, sample_weight)
    562                          loss=self.loss, learning_rate=self.learning_rate,
    563                          coef_init=coef_init, intercept_init=intercept_init,
--> 564                          sample_weight=sample_weight)
    565
    566

D:\Anaconda\lib\site-packages\sklearn\linear_model\stochastic_gradient.pyc in _fit(self, X, y, alpha, C, loss, learning_rate, coef_init, intercept_init, sample_weight)
    401         self.classes_ = None
    402
--> 403         X, y = check_X_y(X, y, 'csr', dtype=np.float64, order="C")
    404         n_samples, n_features = X.shape
    405

D:\Anaconda\lib\site-packages\sklearn\utils\validation.pyc in check_X_y(X, y, accept_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, multi_output, ensure_min_samples, ensure_min_features, y_numeric)
    447                         dtype=None)
    448     else:
--> 449         y = column_or_1d(y, warn=True)
    450     _assert_all_finite(y)
    451     if y_numeric and y.dtype.kind == 'O':

D:\Anaconda\lib\site-packages\sklearn\utils\validation.pyc in column_or_1d(y, warn)
    483         return np.ravel(y)
    484
--> 485     raise ValueError("bad input shape {0}".format(shape))
    486
    487

ValueError: bad input shape (914551, 6)
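
For reference, y here is a (914551, 6) label-indicator matrix (6 labels per
sample). A minimal sketch with made-up toy data (X_toy / y_toy are
hypothetical stand-ins, not my real arrays) that triggers the same shape
check:

import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.RandomState(0)
X_toy = rng.rand(10, 4)                   # 10 samples, 4 features
y_toy = rng.randint(0, 2, size=(10, 6))   # 6 binary labels per sample

# SGDClassifier validates y with column_or_1d, so a 2-D label-indicator
# matrix raises a ValueError about the input shape before any training
# happens (the exact wording depends on the scikit-learn version).
SGDClassifier().fit(X_toy, y_toy)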
Andreas Mueller
2016-01-04 18:50:47 UTC
You didn't use a OneVsRestClassifier. SGDClassifier itself can only do
multi-class, not multi-label.
It needs to be GridSearchCV(OneVsRestClassifier(SGDClassifier()), ...)
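
A minimal sketch of that wrapped setup, kept close to the code from the
original post (same sklearn.grid_search API and SGDClassifier settings).
Note that the grid now addresses the inner SGDClassifier's parameters
through the estimator__ prefix, since OneVsRestClassifier exposes the
wrapped classifier as its estimator parameter:

from sklearn.grid_search import GridSearchCV
from sklearn.linear_model import SGDClassifier
from sklearn.multiclass import OneVsRestClassifier

# One binary SGDClassifier is fit per label; the wrapper accepts a 2-D
# label-indicator y, which SGDClassifier on its own does not.
ovr = OneVsRestClassifier(
    SGDClassifier(random_state=0, learning_rate='optimal',
                  class_weight='auto', n_iter=100))

# Parameters of the inner estimator are addressed as 'estimator__<name>'.
tuned_parameters = [{'estimator__alpha': [0.001, 0.01, 0.1, 0.5],
                     'estimator__penalty': ['l1', 'l2', 'elasticnet'],
                     'estimator__loss': ['log', 'modified_huber']}]

clf = GridSearchCV(ovr, tuned_parameters, cv=5,
                   scoring='precision_weighted')

# clf.fit(Finaldata, y)      # Finaldata / y as in the original post
# print(clf.best_params_)

The loop over the 'precision' and 'recall' weighted scorers from the
original code should carry over unchanged, since both accept multilabel
indicator targets.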