Discussion:
[Scikit-learn-general] load_svmlight_file value error
Gunjan Dewan
2016-02-12 13:04:42 UTC
Hi all,

I am using the following dataset from kaggle (train.csv):
https://www.kaggle.com/c/lshtc/data

The dataset is in libSVM format.

However, while trying to load it using load_svmlight_file, I get the
following error:

File "_svmlight_format.pyx", line 72, in
sklearn.datasets._svmlight_format._load_svmlight_file
(sklearn\datasets\_svmlight_format.c:2120)

ValueError: could not convert string to float: b'Data'

I then removed the header, but it still gives me the same value error.
Can anyone please help me out with this?

I also wanted to know if there is any other way to convert the libSVM
format into 2 matrices.

Note: I have just started out with sklearn and machine learning.

Thanks,
Gunjan
Mathieu Blondel
2016-02-12 13:29:03 UTC
Hi Gunjan,

Apparently the dataset is multi-label, so you need to use the
multilabel=True option.

http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_svmlight_file.html

Mathieu
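
For reference, a minimal sketch of the multilabel=True call on a tiny in-memory sample. The two sample lines are made up for illustration and are not taken from the Kaggle file; they only assume the usual "label,label index:value ..." layout with no spaces between labels.

```python
from io import BytesIO

from sklearn.datasets import load_svmlight_file

# Hypothetical two-line sample in svmlight/libSVM format.
# The first line carries two comma-separated labels (multi-label).
sample = b"1,3 1:0.5 4:1.0\n2 2:2.0\n"

# A file-like object must be opened in binary mode, hence BytesIO.
# With multilabel=True, y is a list of tuples (one tuple of labels per row)
# and X is a scipy.sparse matrix of features.
X, y = load_svmlight_file(BytesIO(sample), multilabel=True)

print(X.shape)
print(y)
```

With a real file, the same call takes a path instead of the BytesIO object.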
_______________________________________________
Scikit-learn-general mailing list
https://lists.sourceforge.net/lists/listinfo/scikit-learn-general
Gunjan Dewan
2016-02-12 14:00:50 UTC
Hi Mathieu,

Thanks a lot for the help.
But even after setting the multilabel option, it still gives a value error:

File "_svmlight_format.pyx", line 67, in
sklearn.datasets._svmlight_format._load_svmlight_file
(sklearn\datasets\_svmlight_format.c:2055)

ValueError: could not convert string to float:

But this time it does not show any value after the error message. It's blank.
Any idea why this is happening?

Gunjan
Mathieu Blondel
2016-02-13 00:34:43 UTC
It seems like our svmlight reader doesn't support spaces between labels:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/datasets/_svmlight_format.pyx#L71

Could you report an issue on github?

In the meantime, you can write a small Python script that deletes the
spaces between labels.

Mathieu
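
A rough sketch of such a clean-up script, for anyone hitting the same error. It assumes each line starts with labels separated by a comma plus spaces (e.g. "12, 34 5:1 9:2") followed by index:value pairs; the function names, the skip_header flag and the sample values are made up for illustration.

```python
import re

# A comma followed by spaces/tabs, but only when a digit comes next,
# i.e. the gap between two numeric labels ("12, 34" -> "12,34").
# The index:value feature pairs contain no commas, so they are untouched.
_LABEL_GAP = re.compile(r",[ \t]+(?=\d)")


def strip_label_spaces(line: str) -> str:
    """Return the line with whitespace after label commas removed."""
    return _LABEL_GAP.sub(",", line)


def clean_file(src_path: str, dst_path: str, skip_header: bool = False) -> None:
    """Copy src_path to dst_path, fixing the label separators on each line.

    skip_header=True drops the first line, for files that start with a
    non-numeric header line such as the 'Data' seen in the error above.
    """
    with open(src_path, encoding="utf-8") as src, \
            open(dst_path, "w", encoding="utf-8") as dst:
        if skip_header:
            next(src, None)
        for line in src:
            dst.write(strip_label_spaces(line))


# Made-up example line, not taken from the actual dataset.
print(strip_label_spaces("12, 34, 56 5:1 9:2\n"), end="")
```

The cleaned file can then be passed to load_svmlight_file with multilabel=True.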
Gunjan Dewan
2016-02-13 04:24:25 UTC
I'll do that.

Thanks a lot.

Gunjan
Manjush Vundemodalu
2016-04-12 10:24:34 UTC
Has this issue been reported already? I am getting the same error while trying to
load the Kaggle train.csv (same file) with load_svmlight_file.

Regards,
Manjush
Gunjan Dewan
2016-04-12 12:25:30 UTC
Hi Manjush,

Yes, this issue has been reported.

You can use the data from the following link. Its train and test data sets
do not have spaces after the commas between labels, so I was able to load
them using load_svmlight_file.

Link:
http://research.microsoft.com/en-us/um/people/manik/downloads/XC/XMLRepository.html