[Scikit-learn-general] Imputer drops columns with
Fabian Böhnlein
2016-02-05 14:27:44 UTC
Hi all,

hopefully a simple question:

Imputer with axis=0 drops four columns whose first element is NaN, even
though not all of the values in these columns are NaN.

I can't reproduce with a minimal example. Any ideas what could go wrong?

Please see below the code I used to debug.

Thanks,
Fabian

X.shape
(385186, 223)

imp = Imputer(axis=0, verbose=5)

imp.fit(X)
Imputer(axis=0, copy=True, missing_values='NaN', strategy='mean',
verbose=5)

X[:,72]
array([nan, 0.0166205 , 0.00619835, ..., 0.00189036, 0.00788955,
0.00378583])

X[:,73]
array([nan, 0.31578947, 0.13636364, ..., 0.08695652, 0.30769231,
0.1627907 ])

X.shape
(385186, 223)

np.isnan(X).all(axis=0).any()
False

X_imputed = imp.transform(X)
/home/user/anaconda/lib/python2.7/site-packages/sklearn/preprocessing/imputation.py:347: UserWarning: Deleting features without observed values: [ 72 73 131 132]
  "observed values: %s" % missing)

X_imputed.shape
(385186, 219)

# There are also NaNs in other columns
np.isnan(X).any(axis=0).sum()
7

np.isnan(X).any(axis=1).sum()
107181

np.isnan(X).all(axis=0).sum()
0

np.isnan(X).all(axis=1).sum()
0
Alexandre Gramfort
2016-02-05 17:09:43 UTC
hi,

what is the dtype of your input array?

Alex
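
A minimal sketch of how the dtype question could be checked, using NumPy only. It assumes X is the dense array from the session above; the helper name describe_columns is hypothetical and not part of scikit-learn. It reports the array dtype and, for each flagged column, the number of NaN and infinite entries plus the NaN-ignoring mean. Whether any of these explains the dropped columns is not established in this thread; a column whose mean itself evaluates to NaN (for example because it contains both +inf and -inf) is just one possibility worth ruling out.

```python
import numpy as np


def describe_columns(X, cols):
    """Summarise dtype, NaN/inf counts and NaN-ignoring mean for given columns.

    Hypothetical diagnostic helper, not part of scikit-learn.
    """
    X = np.asarray(X)
    report = {"dtype": X.dtype, "columns": {}}
    for c in cols:
        # Cast to float64 so the checks behave the same for any numeric dtype.
        col = X[:, c].astype(np.float64)
        with np.errstate(invalid="ignore"):
            # nanmean skips NaN but not inf, so +inf and -inf together give NaN.
            mean = np.nanmean(col)
        report["columns"][c] = {
            "n_nan": int(np.isnan(col).sum()),
            "n_inf": int(np.isinf(col).sum()),
            "nanmean": float(mean),
        }
    return report


# Small synthetic array (made up for illustration, not the poster's data).
X_demo = np.array([
    [np.nan, 1.0, np.inf],
    [2.0, 3.0, -np.inf],
    [4.0, 5.0, 1.0],
])
demo_report = describe_columns(X_demo, [0, 2])
print(demo_report)
```

On the real data the call would be describe_columns(X, [72, 73, 131, 132]), using the column indices from the warning.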
