Алексей Драль
2016-03-19 18:17:54 UTC
Hi there,
I have a data set that contains string categorical variables (like
"category_A", "category_B"). I would like to generate dummy variables from
them, but I can't use OneHotEncoder because it expects a matrix of integers.
I can't use LabelEncoder either, because it does not let me specify which
columns to process. I wrote a simple class that applies DictVectorizer to
each column and stores the fitted vectorizers. This use case seems so common
that I would expect sklearn to already provide this functionality. Could you
please tell me whether I am missing a standard preprocessor that generates
dummy variables from strings for specified columns?
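
For illustration, a minimal sketch of what such a per-column wrapper might look like. This is not the original class: the name ColumnDummyEncoder, its methods, and the sample rows are all hypothetical, and the input is assumed to be a list of dicts rather than a matrix. Only DictVectorizer itself (from sklearn.feature_extraction) is a real sklearn class.

```python
import numpy as np
from sklearn.feature_extraction import DictVectorizer


class ColumnDummyEncoder:
    """Hypothetical helper: fits one DictVectorizer per selected column."""

    def __init__(self, columns):
        self.columns = list(columns)
        self.vectorizers_ = {}

    def fit(self, rows):
        # rows is assumed to be a list of dicts mapping column name -> value.
        for col in self.columns:
            vec = DictVectorizer(sparse=False)
            vec.fit([{col: row[col]} for row in rows])
            self.vectorizers_[col] = vec
        return self

    def transform(self, rows):
        # Encode each selected column separately, then stack side by side.
        blocks = [
            self.vectorizers_[col].transform([{col: row[col]} for row in rows])
            for col in self.columns
        ]
        return np.hstack(blocks)

    def feature_names(self):
        # DictVectorizer names string features as "<column>=<value>".
        names = []
        for col in self.columns:
            names.extend(self.vectorizers_[col].feature_names_)
        return names


# Made-up sample data; the "price" column is deliberately left unencoded.
rows = [
    {"color": "red", "size": "S", "price": 1.0},
    {"color": "blue", "size": "M", "price": 2.0},
    {"color": "red", "size": "M", "price": 3.0},
]
encoder = ColumnDummyEncoder(columns=["color", "size"]).fit(rows)
dummies = encoder.transform(rows)
print(encoder.feature_names())
print(dummies)
```

In this sketch, a category that was not seen during fit is encoded as all zeros for its column, because DictVectorizer silently ignores unseen features at transform time.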
--
Yours sincerely,
Alexey A. Dral