Dmitry Chichkov
2012-06-11 19:24:32 UTC
I'm pickling a random forest model (128 estimators, trained on 50k
examples) and the resulting .pkl size is on the order of 200MB.
Is that expected? The whole dataset size is only 400k...
Here's the code that reproduces it:
import sklearn.ensemble, pickle
clf = sklearn.ensemble.RandomForestClassifier(n_estimators=128)
clf.fit(X=[[i % 6, i % 7, i % 8] for i in range(50000)],
        y=[i % 5 > 0 for i in range(50000)])
with open("test.pkl", 'wb') as f:
    pickle.dump(clf, f)
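For what it's worth, pickling a single fitted tree gives a rough idea of how much
each of the 128 estimators contributes (just a sketch, reusing the clf fitted above):

single_tree_bytes = len(pickle.dumps(clf.estimators_[0]))  # size of one tree
print(single_tree_bytes, single_tree_bytes * 128)  # rough total across the forest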
Regards,
Dmitry