Topic modeling LaTeX equations on the arXiv

by Jaan Altosaar

Exposing scientists to alternative mathematical descriptions of the problems they are working on has the potential to accelerate research. Doing so requires incorporating mathematics into current topic modeling approaches such as latent Dirichlet allocation (LDA). By applying such a math-aware topic model to the arXiv's corpus of LaTeX equations, we aim to develop tools to analyze and predict historical trends of mathematical formulas in science and to enhance scientific recommendation systems.
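
As a rough illustration of what incorporating mathematics into a topic model could involve, the sketch below turns LaTeX equations into bag-of-token counts, the kind of document-term input that LDA implementations typically expect. The tokenization rule and the names tokenize_equation, bag_of_tokens, and build_count_matrix are assumptions made for this example, not the method used in this project.

```python
import re
from collections import Counter

# Hypothetical tokenization rule (an assumption, not the project's method):
# a LaTeX command such as \alpha, an escaped symbol such as \{,
# a run of digits, or any other single non-space character.
TOKEN_RE = re.compile(r"\\[A-Za-z]+|\\.|\d+|\S")


def tokenize_equation(latex):
    """Split one LaTeX equation string into a list of tokens."""
    return TOKEN_RE.findall(latex)


def bag_of_tokens(equations):
    """Count tokens over all equations belonging to one document."""
    counts = Counter()
    for equation in equations:
        counts.update(tokenize_equation(equation))
    return counts


def build_count_matrix(documents):
    """Return (vocabulary, matrix) where matrix[d][v] is the count of
    vocabulary[v] in document d. Each document is a list of equations."""
    bags = [bag_of_tokens(equations) for equations in documents]
    vocabulary = sorted(set().union(*bags)) if bags else []
    matrix = [[bag[token] for token in vocabulary] for bag in bags]
    return vocabulary, matrix


if __name__ == "__main__":
    # Two toy "papers", each represented only by its equations.
    papers = [
        [r"E = mc^2"],
        [r"\nabla \cdot E = \rho"],
    ]
    vocabulary, matrix = build_count_matrix(papers)
    for token, column in zip(vocabulary, zip(*matrix)):
        print(token, column)
```

The resulting count matrix could be passed to any standard LDA implementation. Treating each LaTeX command as a single token is one design choice among several; coarser or finer units (whole subexpressions, individual characters) would yield different vocabularies and likely different topics.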