Rockenkamm, Christian
2016-01-26 13:21:10 UTC
Hello,
I have a question concerning Latent Dirichlet Allocation. The results I get from it are a bit confusing.
I start with about 3000 documents. When preparing them with the CountVectorizer, I use the following parameters: max_df=0.95 and min_df=0.05.
For the LDA fit I use the batch learning method. For the other parameters I have tried many different values. However, regardless of which configuration I use, I face one common problem: I get topics that are never used in any of the documents, and these topics all show the same structure (the same topic-word distribution). I even tried gensim with the same configuration as scikit-learn, yet I still encountered this problem. I also tried lowering the number of topics in the model, but this did not lead to the expected results either. With 100 topics, 20-27 were still affected by this problem; with 50 topics, 2-8 were still affected, depending on the parameter setting.
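To make the description above concrete, here is a minimal sketch of the setup and of one way to flag the affected topics. It is not my exact code: the function names, the `usage_tol` threshold, and `random_state=0` are illustrative assumptions. In older scikit-learn releases the `n_components` argument is called `n_topics`.

```python
def fit_lda_sketch(docs, n_topics=50):
    """Fit LDA roughly as described above (needs scikit-learn installed)."""
    # Imported here so the helpers below also work without scikit-learn.
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    counts = CountVectorizer(max_df=0.95, min_df=0.05).fit_transform(docs)
    lda = LatentDirichletAllocation(
        n_components=n_topics,      # `n_topics` in older releases
        learning_method="batch",
        random_state=0,             # assumption, only for repeatability
    )
    doc_topic = lda.fit_transform(counts)
    # Normalize rows in case the installed version returns unnormalized values.
    doc_topic = doc_topic / doc_topic.sum(axis=1, keepdims=True)
    return doc_topic, lda.components_


def unused_topics(doc_topic, usage_tol):
    """Indices of topics whose weight stays below usage_tol in every document."""
    n_topics = len(doc_topic[0])
    return [
        k for k in range(n_topics)
        if max(row[k] for row in doc_topic) < usage_tol
    ]


def normalize(row):
    """Scale a row of non-negative weights so that it sums to 1."""
    total = float(sum(row))
    return [value / total for value in row]


def share_distribution(topic_word, topics, tol=1e-6):
    """True if all listed topics have the same normalized word distribution."""
    if len(topics) < 2:
        return True
    reference = normalize(topic_word[topics[0]])
    for k in topics[1:]:
        current = normalize(topic_word[k])
        if any(abs(a - b) > tol for a, b in zip(reference, current)):
            return False
    return True
```

Calling `unused_topics(doc_topic, usage_tol)` on the output of `fit_lda_sketch` and passing the result to `share_distribution(topic_word, ...)` is how the two symptoms (never used, identical structure) could be checked together. A suitable `usage_tol` depends on the number of topics and the document-topic prior, so it has to be chosen per configuration.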
Does anybody have an idea as to what might be causing this problem and how to resolve it?
Best regards,
Christian Rockenkamm