We also have similar issues. It'd be great to hear any cool solutions :-)
Post by Keith Lehman
Thanks Sebastian.
This is basically what we are doing too. The hard, time-consuming part is
determining which attributes of each scikit-learn object need to be saved
and how best to extract them.
- Keith
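A possible starting point for working out which attributes need saving, sketched under the assumption that the estimator follows scikit-learn's convention of giving attributes learned in fit() a trailing underscore (coef_, intercept_). The helper names fitted_attributes and to_jsonable are made up for this illustration and are not part of scikit-learn.

```python
def fitted_attributes(estimator):
    """Return the public attributes whose names end in an underscore.

    Assumption: attributes learned during fit() carry a trailing
    underscore, as in coef_ and intercept_.
    """
    return {
        name: value
        for name, value in vars(estimator).items()
        if name.endswith("_") and not name.startswith("_")
    }


def to_jsonable(value):
    """Turn NumPy arrays and scalars into plain Python via .tolist().

    Anything without a .tolist() method is returned unchanged.
    """
    return value.tolist() if hasattr(value, "tolist") else value
```

For a fitted LinearRegression like the one discussed in this thread, this should pick up coef_ and intercept_ among others. Estimators whose fitted state is not a plain array or number (a decision tree's tree_, for example) would still need custom handling.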
-----Original Message-----
Sent: Wednesday, March 23, 2016 4:05 PM
Subject: Re: [Scikit-learn-general] Scikit-learn standards for
serializing/saving objects
I also had some issues with Pickle in the past and have to admit that I
actually don't trust pickle files ;). Maybe, I am too paranoid, but I am
always afraid of corrupting or losing the data.
Probably not the most elegant solution, but I typically store estimator
settings and model parameters as JSON files (they stay human-readable in
the worst case, which also helps with "reproducible research" ;)).
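# Model fitting and saving params to JSON
import json

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
regr = LinearRegression()
regr.fit(X, y)

# Estimator settings (constructor arguments)
with open('./params.json', 'w', encoding='utf-8') as outfile:
    json.dump(regr.get_params(), outfile)
# Fitted coefficients, converted from a NumPy array to a plain list
with open('./weights.json', 'w', encoding='utf-8') as outfile:
    json.dump(regr.coef_.tolist(), outfile, separators=(',', ':'),
              sort_keys=True, indent=4)
# Fitted intercept, converted to a plain Python float
with open('./intercept.json', 'w', encoding='utf-8') as outfile:
    json.dump(float(regr.intercept_), outfile)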
# In a new session: load the params from the JSON files
import codecs
import json

import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
diabetes = load_diabetes()
X, y = diabetes.data, diabetes.target
obj_text = codecs.open('./params.json', 'r', encoding='utf-8').read()
params = json.loads(obj_text)
obj_text = codecs.open('./weights.json', 'r', encoding='utf-8').read()
weights = json.loads(obj_text)
obj_text = codecs.open('./intercept.json', 'r', encoding='utf-8').read()
intercept = json.loads(obj_text)
regr = LinearRegression()
regr.set_params(**params)
regr.intercept_, regr.coef_ = intercept, np.array(weights)
regr.predict(X[:10])
array([ 206.11706979, 68.07234761, 176.88406035, 166.91796559,
128.45984241, 106.34908972, 73.89417947, 118.85378669,
158.81033076, 213.58408893])
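The three-file round trip above could also be folded into one JSON file. The following is only a sketch: save_estimator_json, load_estimator_json, the array_factory argument, and the file layout are hypothetical names chosen here, not scikit-learn API, and it assumes every listed attribute is a number or a NumPy array.

```python
import json


def save_estimator_json(estimator, path, attribute_names):
    """Write constructor settings and the named fitted attributes to one file."""
    fitted = {}
    for name in attribute_names:
        value = getattr(estimator, name)
        # NumPy arrays and scalars expose .tolist(); other values pass through.
        fitted[name] = value.tolist() if hasattr(value, "tolist") else value
    payload = {"params": estimator.get_params(), "fitted": fitted}
    with open(path, "w", encoding="utf-8") as outfile:
        json.dump(payload, outfile, indent=4, sort_keys=True)


def load_estimator_json(estimator_class, path, array_factory=None):
    """Rebuild an estimator from a file written by save_estimator_json.

    array_factory (for example numpy.array) is applied to every list
    value so that array attributes are restored as arrays.
    """
    with open(path, "r", encoding="utf-8") as infile:
        payload = json.load(infile)
    estimator = estimator_class()
    estimator.set_params(**payload["params"])
    for name, value in payload["fitted"].items():
        if array_factory is not None and isinstance(value, list):
            value = array_factory(value)
        setattr(estimator, name, value)
    return estimator
```

With the objects from the example, this would be called as save_estimator_json(regr, './model.json', ['coef_', 'intercept_']) and later load_estimator_json(LinearRegression, './model.json', array_factory=np.array); './model.json' is a placeholder path.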
In any case, I know that this isn't pretty, and I would also be looking
forward to a better solution!
Best,
Sebastian Raschka
I'm fairly new to scikit-learn, Python, and machine learning. This
community has built a great set of libraries though, and is actually a
large part of the reason why my company has selected Python to experiment
with ML.
As we are developing our product, however, we keep running into trouble
saving various objects. When possible, we use pickle to save the objects,
but this can cause problems in development: objects saved during a debug
session cannot be loaded outside of the debugger. The reason appears to be
that even when pickling a "pickleable" object (such as a trained
LinearRegression), pickle finds and saves more primitive objects that have
been instantiated within the debug environment. Dill and cPickle have the
same issue. My question is: does the scikit-learn community plan to add
standard load/save or dump/dumps and load/loads methods that would not
create these dependencies?
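One way to check which importable names a pickle actually depends on is to walk its opcodes with the standard-library pickletools module. This is a diagnostic sketch assuming Python 3, not a confirmed explanation of the debugger problem described here; pickle_dependencies is a hypothetical helper name, and Fraction is only a stand-in object for the demonstration.

```python
import pickle
import pickletools
from fractions import Fraction


def pickle_dependencies(obj):
    """Return the (module, name) pairs that a pickle of obj refers to.

    Protocol 3 is requested because it writes references to module-level
    classes and functions as GLOBAL opcodes, whose argument pickletools
    reports as the string "module name".
    """
    data = pickle.dumps(obj, protocol=3)
    dependencies = set()
    for opcode, arg, _position in pickletools.genops(data):
        if opcode.name == "GLOBAL":
            module, name = arg.split(" ", 1)
            dependencies.add((module, name))
    return dependencies


# Stand-in object; a fitted estimator could be passed in the same way.
print(sorted(pickle_dependencies(Fraction(1, 3))))
```

Any module in the result that exists only inside the debugging session would have to be importable wherever the pickle is loaded, because pickle stores classes by reference to their module and name and not by value.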
If there is a better forum for posting questions like these, please let
me know and I'll be happy to post there instead.
Thanks!
Keith Lehman
------------------------------------------------------------------------------
Transform Data into Opportunity.
Accelerate data analysis in your applications with Intel Data
Analytics Acceleration Library.
Click to learn more.
http://pubads.g.doubleclick.net/gampad/clk?id=278785351&iu=/4140
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general