
What's the point of this?

LaTeX is the de facto standard markup language for typesetting pretty equations in academic papers. It is extremely feature-rich and flexible, but also very verbose. That makes it great for typesetting complex equations but not very convenient for quick, on-the-fly note-taking.

For example, here is a short equation from the Wikipedia page on quantum electrodynamics, along with the LaTeX code that produces it:

```latex
{\displaystyle {\mathcal {L}}={\bar {\psi }}(i\gamma ^{\mu }D_{\mu }-m)\psi -{\frac {1}{4}}F_{\mu \nu }F^{\mu \nu },}
```

Typing code like this out by hand is exactly the kind of chore that makes LaTeX inconvenient for quick notes. This demo is a first step toward solving that problem. Eventually, you'll be able to take a quick screenshot of an equation in a paper, and a program built on this model will generate the corresponding LaTeX source for you to copy and paste straight into your personal notes. No more endless googling for obscure LaTeX syntax!

How does it work?

Because the task involves looking at an image and generating valid LaTeX code, the model has to combine computer vision (CV) and natural language processing (NLP). Other projects tackle the same problem with some very interesting architectures. These generally pair an "encoder", which looks at the image and extracts and encodes the information about the equation, with a "decoder", which translates that encoding into LaTeX code that is, hopefully, both valid and accurate.

Examples: ...
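To make the encoder/decoder split concrete, here is a deliberately tiny, pure-Python sketch of the interface. It is an illustration under simplifying assumptions, not how any of these projects is implemented. The "encoder" just chops the image into flattened patches, and the "decoder" is a hypothetical scripted stand-in rather than a trained network. What the sketch does show faithfully is greedy autoregressive decoding: the output is produced one token at a time, and each step is conditioned on the image features and on the tokens generated so far. All names (`encode`, `greedy_decode`, `make_scripted_decoder`, `BOS`/`EOS`) are illustrative and are not taken from this project or any library.

```python
from typing import Callable, Dict, List, Sequence

BOS, EOS = "<s>", "</s>"  # start/end-of-sequence markers (names are illustrative)

Features = List[List[float]]
ScoreFn = Callable[[Features, List[str]], Dict[str, float]]


def encode(image: Sequence[Sequence[float]], patch: int = 2) -> Features:
    """Toy encoder: cut a grayscale image into non-overlapping patch x patch
    blocks and flatten each block into a feature vector. Edge rows/columns
    that don't fill a whole patch are dropped. A real encoder (e.g. a vision
    Transformer) would map each patch to a learned embedding instead."""
    rows, cols = len(image), len(image[0])
    features = []
    for r in range(0, rows - rows % patch, patch):
        for c in range(0, cols - cols % patch, patch):
            features.append(
                [image[r + i][c + j] for i in range(patch) for j in range(patch)]
            )
    return features


def greedy_decode(features: Features, next_token_scores: ScoreFn,
                  max_len: int = 50) -> List[str]:
    """Greedy autoregressive decoding: at each step, score every candidate
    next token given the features and the tokens so far, append the best
    one, and stop at EOS or after max_len tokens."""
    tokens = [BOS]
    for _ in range(max_len):
        scores = next_token_scores(features, tokens)
        best = max(scores, key=scores.get)
        if best == EOS:
            break
        tokens.append(best)
    return tokens[1:]  # drop the BOS marker


def make_scripted_decoder(script: List[str]) -> ScoreFn:
    """Hypothetical stand-in for a trained decoder: it ignores the features
    and gives the highest score to the next token of a fixed script."""
    vocab = set(script) | {EOS}

    def next_token_scores(features: Features, tokens: List[str]) -> Dict[str, float]:
        step = len(tokens) - 1  # tokens generated so far, not counting BOS
        target = script[step] if step < len(script) else EOS
        return {tok: (1.0 if tok == target else 0.0) for tok in vocab}

    return next_token_scores


if __name__ == "__main__":
    image = [[0.0, 1.0, 1.0, 0.0]] * 4  # a 4x4 toy "image"
    feats = encode(image)  # 4 patches with 4 values each
    decoder = make_scripted_decoder(["\\frac", "{", "1", "}", "{", "4", "}"])
    print(" ".join(greedy_decode(feats, decoder)))  # \frac { 1 } { 4 }
```

Real systems often replace the greedy `max` with beam search, which keeps several candidate sequences alive at once, but the loop structure stays the same.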

I chose to tackle this problem with transfer learning, i.e. starting from a model that has already been trained on a related task and adapting it to this one rather than training from scratch. The biggest reason is computing constraints: I don't have unlimited access to GPU hours and wanted training to be reasonably fast, on the order of a couple of hours. The approach has other benefits too. The architecture has already proven robust across a range of applications, so less time goes into trial and error.
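This section doesn't say which parts of the pretrained model were updated during training. As a hedged illustration of the general idea only, here is a toy, pure-Python sketch of the simplest flavor of transfer learning: a "pretrained" feature extractor is kept frozen, and only a small new head is trained on top of it. The weights in `PRETRAINED_W`, the toy dataset, and all function names are hypothetical. In practice the backbone would be a large pretrained network, and it is also common to fine-tune some or all of its weights.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical "pretrained" backbone weights. In real transfer learning these
# would come from a model already trained on a large dataset; here they are
# fixed numbers chosen purely for illustration.
PRETRAINED_W = [[1.0, -1.0], [0.5, 0.5]]


def backbone(x: Sequence[float]) -> List[float]:
    """Frozen feature extractor: a fixed linear map followed by ReLU.
    Training below never modifies PRETRAINED_W."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in PRETRAINED_W]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_head(data: Sequence[Tuple[Sequence[float], int]],
               epochs: int = 200, lr: float = 0.5) -> Tuple[List[float], float]:
    """Fit only a new logistic-regression head on top of the frozen features,
    using per-example gradient descent on binary cross-entropy."""
    feats = [(backbone(x), y) for x, y in data]  # computed once: backbone is frozen
    w = [0.0] * len(PRETRAINED_W)
    b = 0.0
    for _ in range(epochs):
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            g = p - y  # d(loss)/d(logit) for sigmoid + binary cross-entropy
            w = [wi - lr * g * fi for wi, fi in zip(w, f)]
            b -= lr * g
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> int:
    logit = sum(wi * fi for wi, fi in zip(w, backbone(x))) + b
    return 1 if logit > 0 else 0


if __name__ == "__main__":
    # Tiny toy task: the label is 1 when x[0] > x[1].
    data = [([2.0, 0.0], 1), ([3.0, 1.0], 1), ([0.0, 2.0], 0), ([1.0, 3.0], 0)]
    w, b = train_head(data)
    print([predict(w, b, x) for x, _ in data])  # [1, 1, 0, 0]
```

Because the backbone is frozen, its features are computed once and reused every epoch, and only a handful of head parameters are updated. That is why this style of training is cheap, which fits the compute motivation above.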

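I chose TrOCR, an OCR model from Microsoft trained on the SROIE (Scanned Receipts OCR and Information Extraction) dataset to transcribe text from receipts. TrOCR is itself an encoder-decoder model, pairing a Transformer image encoder with a Transformer text decoder, so it matches the kind of architecture described above.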

Made by Young Ho Shin

Email | GitHub | LinkedIn