Latent Dirichlet allocation
Unfortunately, there are two methods in machine learning with the initials LDA: latent Dirichlet allocation, which is a topic modeling method, and linear discriminant analysis, which is a classification method. The two are completely unrelated apart from sharing the initials, which can be confusing in certain situations. The scikit-learn tool has a submodule, sklearn.lda, which implements linear discriminant analysis. At the moment, scikit-learn does not implement latent Dirichlet allocation.
The first topic model we will look at is latent Dirichlet allocation. The mathematical ideas behind LDA are fairly complex, and we will not go into the details here.
For those who are interested and adventurous enough, Wikipedia provides all the equations behind the algorithm: http://en.wikipedia.org/wiki/Latent_Dirichlet_allocation.
However, we can understand the ideas behind LDA intuitively at a high level. LDA belongs to a class of models that...