---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---

# olmOCR Markdown Converter

This Space uses the olmOCR model pipeline to convert PDFs (including scientific papers) into Markdown-formatted `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.
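Because the output is plain Markdown, downstream tools can split it on its headings. The following is a minimal sketch that assumes the `.txt` output uses ATX-style `#` headings. `split_sections` is a hypothetical helper and is not part of olmOCR or this Space's `app.py`.

```python
import re
from typing import List, Tuple

# ATX heading: 1-6 '#' characters, whitespace, the title, and an optional
# closing run of '#' separated from the title by whitespace.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)(?:\s+#+)?\s*$")

Section = Tuple[int, str, str]  # (heading level, heading text, body text)


def _append_section(sections: List[Section], level: int, title: str, body: List[str]) -> None:
    # Skip an empty leading section (no heading and no text before the first heading).
    if title or any(ln.strip() for ln in body):
        sections.append((level, title, "\n".join(body).strip()))


def split_sections(markdown: str) -> List[Section]:
    """Split Markdown text into (level, heading, body) tuples.

    Text before the first heading is returned with level 0 and an empty heading.
    Fenced code blocks are not special-cased, so a line starting with '#'
    inside one would be treated as a heading.
    """
    sections: List[Section] = []
    level, title, body = 0, "", []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # A new heading closes the section collected so far.
            _append_section(sections, level, title, body)
            level, title, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    _append_section(sections, level, title, body)
    return sections
```

For example, reading a converted file with `open(path, encoding="utf-8").read()` and passing the text to `split_sections` yields one tuple per section. Each tuple can then be indexed or chunked separately.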

- ✅ Vision + text-anchor OCR pipeline (via olmOCR)
- ✅ Extracts semantic structure from the PDF's table of contents (TOC)
- ✅ Outputs clean Markdown-formatted `.txt` files
- ✅ Runs as a Hugging Face Gradio Space with GPU support
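One possible approach to TOC-based structure is to turn each TOC entry into a Markdown heading and insert it at the start of the page it points to. The sketch below assumes two inputs: TOC entries in the `[level, title, page]` shape with 1-based page numbers, which is what PyMuPDF's `Document.get_toc()` returns by default, and per-page Markdown from the OCR step. `apply_toc_headings` is a hypothetical name, and the Space's actual `app.py` may handle this differently.

```python
from typing import List, Sequence


def apply_toc_headings(pages: Sequence[str], toc: Sequence[Sequence]) -> str:
    """Insert Markdown headings from a PDF TOC at the start of their target pages.

    pages: Markdown text for each page (index 0 is page 1).
    toc:   [level, title, page] entries with 1-based page numbers
           (assumed shape, e.g. PyMuPDF's default Document.get_toc() output).
    """
    headings: List[List[str]] = [[] for _ in pages]
    for level, title, page in toc:
        # Skip entries whose target page is missing or out of range.
        if not 1 <= page <= len(pages):
            continue
        # Markdown only has heading levels 1-6, so clamp deeper TOC levels.
        hashes = "#" * min(max(level, 1), 6)
        headings[page - 1].append(f"{hashes} {title.strip()}")

    parts: List[str] = []
    for page_headings, body in zip(headings, pages):
        parts.extend(page_headings)
        if body.strip():
            parts.append(body.strip())
    # Blank lines between blocks keep headings and paragraphs separate in Markdown.
    return "\n\n".join(parts) + "\n"
```

This sketch does not deduplicate headings that the OCR model may already have emitted on a page. A real pipeline would probably need to handle that case.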

## Example Use

Upload a scientific paper as a PDF and download a Markdown-formatted `.txt` version with its headers and inline structure preserved.
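The download step amounts to writing the converted Markdown to a `.txt` file named after the uploaded PDF. The following is a minimal sketch that assumes the Markdown string has already been produced. `save_markdown_txt` is a hypothetical helper and is not taken from this Space's `app.py`.

```python
from pathlib import Path


def save_markdown_txt(pdf_path: str, markdown: str, out_dir: str) -> Path:
    """Write Markdown to <out_dir>/<pdf stem>.txt and return the file path (hypothetical helper)."""
    out = Path(out_dir) / (Path(pdf_path).stem + ".txt")
    out.parent.mkdir(parents=True, exist_ok=True)
    # Explicit UTF-8 keeps math symbols and other non-ASCII text intact
    # regardless of the platform's default encoding.
    out.write_text(markdown, encoding="utf-8")
    return out
```

In a Gradio app, a handler could return this path to a `gr.File` output component so the user can download the file. Whether `app.py` wires it up this way is an assumption.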


Built by @BenedictRichardLeonardi using olmOCR