metadata
title: olmOCR Markdown Converter
emoji: π
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
olmOCR Markdown Converter
This Space uses the olmOCR model pipeline to convert PDFs (including scientific papers) into markdown .txt files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.
- Vision + text anchor OCR pipeline (via olmOCR)
- Extracts semantic structure via PDF TOC
- Outputs clean .txt in markdown format
- Hugging Face Gradio Space with GPU support
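
As a rough illustration of how a PDF table of contents can be turned into markdown headers, here is a minimal sketch. It is not the code this Space runs: the function name `toc_to_markdown_headers` and the `(level, title, page)` entry shape are assumptions made for the example (some PDF libraries expose outlines in a similar form).

```python
from typing import Iterable, Tuple

# Hypothetical entry shape: (outline level starting at 1, title, page number).
TocEntry = Tuple[int, str, int]


def toc_to_markdown_headers(toc: Iterable[TocEntry], max_level: int = 6) -> str:
    """Render TOC entries as markdown headers, one per line."""
    lines = []
    for level, title, _page in toc:
        # Markdown supports header depths 1..6, so clamp the outline level.
        depth = min(max(level, 1), max_level)
        lines.append("#" * depth + " " + title.strip())
    return "\n".join(lines)


if __name__ == "__main__":
    sample_toc = [(1, "Introduction", 1), (2, "Background", 2), (1, "Methods", 4)]
    print(toc_to_markdown_headers(sample_toc))
```

In a full pipeline the header lines would be interleaved with the OCR text for the matching pages before the result is written to a .txt file; that step is left out here because it depends on how the OCR output is structured.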
Example Use
Upload a scientific paper as a PDF and download a markdown .txt version with preserved headers and inline structure.
Built by @BenedictRichardLeonardi using olmOCR