Ak
2012-11-17 06:25:37 UTC
Hello,
I am dumping the dataset vectorized with TfidfVectorizer, the target array, and
the classifier OneVsRestClassifier(SGDClassifier(loss='log', n_iter=50,
alpha=0.00001)), since I want to add it to a package. I use the joblib library
from sklearn.externals to dump the vectors. Peak memory usage while training the
classifier is 12 GB; however, when the program starts dumping the classifier,
usage jumps to 38 GB (which I assume is due to some internal copy?). I have
about 32 GB of RAM, so is there a better way to store the classifier than
joblib.dump(compress=9)? [I tried compress=3, 5, 7, and 9, and always get a
memory error.] If I do not compress, the vectors total about 11 GB.
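For reference, here is a minimal sketch of what I'm doing; `docs` and `y` are
toy stand-ins for my real corpus and target array, and `classifier.pkl` is just
an example filename:

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import SGDClassifier
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.externals import joblib

    # Toy stand-ins for my real corpus and target array.
    docs = ["first document", "second document", "third document", "fourth one"]
    y = [0, 1, 0, 1]

    vectorizer = TfidfVectorizer()
    X = vectorizer.fit_transform(docs)

    clf = OneVsRestClassifier(SGDClassifier(loss='log', n_iter=50, alpha=0.00001))
    clf.fit(X, y)  # on my real data, peak memory here is ~12 GB

    # This is the step where memory jumps to ~38 GB on the real model:
    joblib.dump(clf, 'classifier.pkl', compress=9)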
Thanks