Fabian Böhnlein
2016-02-05 14:27:44 UTC
Hi all,
hopefully a simple question:
Imputer with axis=0 drops four columns whose first element is NaN, even
though not every value in these columns is NaN.
I can't reproduce this with a minimal example. Any ideas what could be going wrong?
Please see below the code I used to debug.
Thanks,
Fabian
X.shape
(385186, 223)
imp = Imputer(axis=0, verbose=5)
imp.fit(X)
Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean',
verbose=5)
X[:,72]
array([nan, 0.0166205 , 0.00619835, ..., 0.00189036, 0.00788955,
0.00378583])
X[:,73]
array([nan, 0.31578947, 0.13636364, ..., 0.08695652, 0.30769231,
0.1627907 ])
X.shape
(385186, 223)
np.isnan(X).all(axis=0).any()
False
X_imputed = imp.transform(X)
/home/user/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:347:
UserWarning: Deleting features without observed values: [ 72 73 131
132]
"observed values: %s" % missing)
X_imputed.shape
(385186, 219)
# There are also NaNs in other columns
np.isnan(X).any(axis=0).sum()
7
np.isnan(X).any(axis=1).sum()
107181
np.isnan(X).all(axis=0).sum()
0
np.isnan(X).all(axis=1).sum()
0
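
A possible next debugging step, sketched under assumptions: since the warning says the four columns have no observed values while `np.isnan(X).all(axis=0)` finds none that are entirely NaN, it may help to look at what a per-column mean actually evaluates to. The helper below, `column_nan_report`, is a hypothetical name and a NumPy-only stand-in, not the Imputer's own code. It counts NaN and infinite entries per column and flags columns whose NaN-ignoring mean is still NaN. Mixed `+inf`/`-inf` values are one way that can happen. Nothing in the session above confirms that this is the cause here.

```python
import warnings

import numpy as np


def column_nan_report(X):
    """Hypothetical helper: per-column diagnostics for a 2-D float array.

    Returns (nan_count, inf_count, col_mean, suspicious), where
    `suspicious` holds the indices of columns whose NaN-ignoring mean
    is NaN even though the column is not entirely NaN.
    """
    X = np.asarray(X, dtype=float)
    nan_count = np.isnan(X).sum(axis=0)
    inf_count = np.isinf(X).sum(axis=0)
    # Silence the warnings NumPy raises for inf - inf and for empty slices.
    with warnings.catch_warnings(), np.errstate(invalid="ignore"):
        warnings.simplefilter("ignore", category=RuntimeWarning)
        col_mean = np.nanmean(X, axis=0)
    suspicious = np.flatnonzero(np.isnan(col_mean) & (nan_count < X.shape[0]))
    return nan_count, inf_count, col_mean, suspicious


# Toy data (made up for illustration, not taken from the real X):
# column 1 has a leading NaN plus +inf and -inf.
X_demo = np.array([
    [np.nan, np.nan, 1.0],
    [1.0, np.inf, 2.0],
    [3.0, -np.inf, 3.0],
])
nan_count, inf_count, col_mean, suspicious = column_nan_report(X_demo)
print("NaNs per column:", nan_count)
print("infs per column:", inf_count)
print("suspicious columns:", suspicious)
```

Running the same helper on the real `X` and comparing `suspicious` against the indices in the warning (72, 73, 131, 132) would show whether the dropped columns are the ones with an undefined mean.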