---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---
# olmOCR Markdown Converter
This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting. The output is ready for Calibre/Kindle or downstream parsing.
- ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
- ✅ Extracts semantic structure from the PDF's table of contents (TOC)
- ✅ Outputs clean `.txt` in markdown format
- ✅ Hugging Face **Gradio Space with GPU support**
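
The README does not spell out how TOC entries become headers. As a minimal sketch, assuming the PDF outline has already been read into `[level, title, page]` entries (the shape PyMuPDF's `Document.get_toc()` returns by default), each outline level can map to a markdown heading depth. `toc_to_markdown_headings` is a hypothetical helper for illustration, not necessarily what this Space's `app.py` or olmOCR does.

```python
from typing import List, Sequence


def toc_to_markdown_headings(toc: Sequence[Sequence], max_level: int = 6) -> List[str]:
    """Map outline entries [level, title, page, ...] to markdown ATX headings.

    Hypothetical helper: only the first two fields (level, title) are used,
    and levels are clamped to 1..max_level because markdown defines six.
    """
    headings = []
    for entry in toc:
        level, title = int(entry[0]), str(entry[1])
        depth = min(max(level, 1), max_level)
        # Collapse stray newlines and repeated spaces that can appear in outline titles.
        clean = " ".join(title.split())
        if clean:  # skip empty outline entries
            headings.append("#" * depth + " " + clean)
    return headings


if __name__ == "__main__":
    # Hypothetical outline, as a list of [level, title, page] entries.
    sample_toc = [
        [1, "Introduction", 1],
        [1, "Methods", 3],
        [2, "Data  collection", 3],
        [3, "Sampling\n procedure", 4],
    ]
    for line in toc_to_markdown_headings(sample_toc):
        print(line)
```

Run as a script, this prints `# Introduction`, `# Methods`, `## Data collection`, and `### Sampling procedure`. Clamping keeps deeply nested outlines as valid markdown headings instead of dropping them.
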
## Example Use
Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.
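
Because the converted file keeps markdown headers, a downstream script can split it back into sections. A minimal sketch, assuming the output uses ATX `#` headings; `split_sections` is a hypothetical helper, not part of this Space. It does not handle setext headings, and it would misread `#` lines inside fenced code blocks as headings.

```python
import re
from typing import List, Tuple

# An ATX heading: 1-6 '#' characters, whitespace, then the heading text.
HEADING_RE = re.compile(r"^(#{1,6})\s+(.+?)\s*$")

Section = Tuple[int, str, str]  # (level, heading, body); level 0 = text before any heading


def _append_section(sections: List[Section], level: int, heading: str, body: List[str]) -> None:
    # Skip an empty preamble (no heading and only blank lines).
    if heading or any(line.strip() for line in body):
        sections.append((level, heading, "\n".join(body).strip()))


def split_sections(markdown: str) -> List[Section]:
    """Split markdown text into (level, heading, body) tuples in document order."""
    sections: List[Section] = []
    level, heading = 0, ""
    body: List[str] = []
    for line in markdown.splitlines():
        match = HEADING_RE.match(line)
        if match:
            # A new heading closes the section collected so far.
            _append_section(sections, level, heading, body)
            level, heading, body = len(match.group(1)), match.group(2), []
        else:
            body.append(line)
    _append_section(sections, level, heading, body)
    return sections


if __name__ == "__main__":
    sample = "# Introduction\nWe study X.\n## Results\nIt works.\n"
    for level, heading, body in split_sections(sample):
        print(level, heading, repr(body))
```

For a file downloaded from the Space, something like `split_sections(Path("paper.txt").read_text(encoding="utf-8"))` (with `pathlib.Path` and a hypothetical file name) yields one tuple per header, which is a simple way to chunk a paper by section.
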

---
Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)