Discussion:
[Scikit-learn-general] Problem using GridSearch and custom Tokenizer
Mehdi
2016-03-29 08:49:33 UTC
Hi,

I'm having trouble adding stemming to my vectorizer.
I've posted the problem on StackOverflow: http://stackoverflow.com/questions/36182502/add-stemming-support-to-countvectorizer-sklearn

My problem is that when I use my StemmedCountVectorizer with n_jobs=-1 in my GridSearchCV, it works. Otherwise it gives me this error:
AttributeError: 'module' object has no attribute 'StemmedCountVectorizer'
Joeln seems to understand the problem and suggested a solution, but I didn't understand it.
Joeln said:
this seems to be an issue with pickling and unpickling scope. if you put stemming in an imported module, for instance, it'll be unpickled more reliably

How do I put a function or my class in a module?

Appreciate your help and your hard work.
Mehdi.
Sebastian Raschka
2016-03-29 16:21:37 UTC
Good question! It's been a while (1-2 years), but I remember having similar issues once; I think it's a namespace / scoping issue.
In particular, based on my notes, I think that your Python module needs to have the class (StemmedCountVectorizer) in its __main__ namespace when you call pickle.load. Not sure if it helps in this case, but you could try executing

setattr(sys.modules["__main__"], "StemmedCountVectorizer", type(StemmedCountVectorizer()))

before unpickling via pickle.load.
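
A minimal sketch of why the class's location matters, using only the standard library: pickle stores an instance as a reference to its class (module name plus class name) rather than the class body, so whichever process unpickles must be able to look the class up under that same name. The module name stemming_demo and the empty stand-in class are assumptions made for illustration, and deleting the attribute only simulates a process that cannot find the class.

```python
import pickle
import sys
import types

# Hypothetical module created in memory purely for this demonstration.
demo = types.ModuleType("stemming_demo")
exec("class StemmedCountVectorizer:\n    pass\n", demo.__dict__)
sys.modules["stemming_demo"] = demo

cls = demo.StemmedCountVectorizer
payload = pickle.dumps(cls())

# The payload names the module and the class; it does not contain the class body.
stored_by_name = b"stemming_demo" in payload and b"StemmedCountVectorizer" in payload

# Simulate a process in which the module does not define the class.
del demo.StemmedCountVectorizer
try:
    pickle.loads(payload)
    lookup_error = None
except AttributeError as exc:
    lookup_error = exc

# Once the class can be found under the recorded name again, unpickling succeeds.
demo.StemmedCountVectorizer = cls
restored = pickle.loads(payload)
print(stored_by_name, type(lookup_error).__name__, type(restored).__name__)
```

The failed lookup raises the same kind of AttributeError as the one reported in this thread, which suggests the fix is about where the class is defined rather than about the vectorizer itself.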

Best,
Sebastian

> Scikit-learn-general mailing list
> Scikit-learn-***@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Mehdi
2016-03-29 16:36:42 UTC
I tried this code but it doesn't work; I'm getting the same error.
But I'm not explicitly calling pickle.load(something); it happens in the parallelization process.

Thanks for trying.
I'll look into this pickling problem in more detail.

Andreas Mueller
2016-03-31 21:16:06 UTC
Put it in its own file.
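
One way to read this advice, as a hedged sketch: move the class into a separate .py file and import it from the main script, so that pickle records it under that module's name instead of __main__. The file name stemming_tools.py and the toy SuffixStemTokenizer class are hypothetical stand-ins (not the StemmedCountVectorizer from the question), and the file is written from code only so the example is self-contained.

```python
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Source for a hypothetical module file, stemming_tools.py. In a real project
# you would create this file by hand next to your script.
MODULE_SOURCE = '''
class SuffixStemTokenizer:
    """Toy stand-in for a stemming tokenizer: lowercases and strips a final s."""

    def __call__(self, text):
        words = text.lower().split()
        return [w[:-1] if w.endswith("s") else w for w in words]
'''

module_dir = Path(tempfile.mkdtemp())
(module_dir / "stemming_tools.py").write_text(MODULE_SOURCE)
sys.path.insert(0, str(module_dir))  # make the directory importable
importlib.invalidate_caches()

stemming_tools = importlib.import_module("stemming_tools")
tokenizer = stemming_tools.SuffixStemTokenizer()

# The class is now recorded as stemming_tools.SuffixStemTokenizer, not __main__.
class_module = type(tokenizer).__module__

# A pickle round trip, which process-based parallel workers rely on, resolves the class.
clone = pickle.loads(pickle.dumps(tokenizer))
tokens = clone("Cats chase dogs")
print(class_module, tokens)
```

In the situation described in this thread, the separate file would presumably hold the StemmedCountVectorizer class together with the imports it needs, and the main script would import the class from that file before building the GridSearchCV.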

