Discussion:
[Scikit-learn-general] Restrictions on feature names when drawing decision tree
Raphael C
2016-03-12 13:56:43 UTC
I am attempting to draw a decision tree using:

reg = DecisionTreeRegressor(max_depth=None,min_samples_split=1)
reg.fit(X,Y)
dot_data = StringIO()
tree.export_graphviz(reg, out_file=dot_data,
feature_names=feature_names,
filled=True, rounded=True,
special_characters=True)
graph = pydot.graph_from_dot


This gives me the error message


File "/usr/lib/python2.7/dist-packages/pydot.py", line 1802, in <lambda>
lambda f=frmt, prog=self.prog : self.create(format=f, prog=prog))
File "/usr/lib/python2.7/dist-packages/pydot.py", line 2023, in create
status, stderr_output) )
pydot.InvocationException: Program terminated with status: 1. stderr
follows: Error: not well-formed (invalid token) in line 1
... <HTML>Design & Tech. 3D Design=A &le; 0.5 ...
in label of node 17
Error: not well-formed (invalid token) in line 1
... <HTML>Design & Tech. Product Design=A &le; 0.5 ...
in label of node 68


Is this because there is some restriction on the types of strings that
are supported as feature names?

Two of the feature names are:

'Design & Tech. 3D Design=A'

and

'Design & Tech. Product Design=A'

Raphael
Raphael C
2016-03-12 13:57:54 UTC
The code snippet should have been


reg = DecisionTreeRegressor(max_depth=None,min_samples_split=1)
reg.fit(X,Y)
scores = cross_val_score(reg, X, Y)
print scores
dot_data = StringIO()
tree.export_graphviz(reg, out_file=dot_data,
                     feature_names=feature_names,
                     filled=True, rounded=True,
                     special_characters=True)
graph = pydot.graph_from_dot_data(dot_data.getvalue())
Image(graph.create_png())

Raphael
Andreas Mueller
2016-03-13 21:00:40 UTC
Try escaping the &.
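
A minimal sketch of that suggestion, using only the standard library. It assumes the names end up inside an HTML-like Graphviz label, as the error output suggests. The helper is_well_formed is hypothetical, and the XML parser it uses is only a stand-in for Graphviz's own label parser, so it shows the kind of failure rather than reproducing Graphviz exactly.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET

feature_names = ['Design & Tech. 3D Design=A',
                 'Design & Tech. Product Design=A']

# Escape &, < and > so each name is safe inside HTML-like markup.
safe_names = [escape(name) for name in feature_names]


def is_well_formed(fragment):
    """Hypothetical check: does the fragment parse as XML inside a root tag?"""
    try:
        ET.fromstring('<HTML>' + fragment + '</HTML>')
        return True
    except ET.ParseError:
        return False


# A bare & makes the raw names fail; the escaped names parse cleanly.
raw_ok = [is_well_formed(name) for name in feature_names]
safe_ok = [is_well_formed(name) for name in safe_names]

# safe_names would then be passed in place of feature_names, e.g.
# tree.export_graphviz(reg, out_file=dot_data, feature_names=safe_names, ...)
```

The rendered tree should still display a plain & in the node text, since the renderer decodes &amp; when drawing the label.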
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Joel Nothman
2016-03-13 23:46:27 UTC
We should probably be escaping feature names internally. It's easy to
forget that graphviz supports HTML-like markup.
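
As a rough illustration of what escaping internally could look like, the sketch below builds a split label and escapes only the user-supplied feature name. The function node_label and its label layout are assumptions made for this example, not scikit-learn's actual export code.

```python
from xml.sax.saxutils import escape


def node_label(feature_name, threshold):
    """Hypothetical helper: build an HTML-like split label for one node."""
    # Escape the user-supplied name only; &le; is markup added by the exporter.
    return '<%s &le; %s>' % (escape(feature_name), threshold)


label = node_label('Design & Tech. 3D Design=A', 0.5)
print(label)
```

Escaping at the point where the label is assembled would keep the entities the exporter inserts on purpose intact while neutralising special characters that arrive in feature names.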