---
title: olmOCR Markdown Converter
emoji: π
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---
# olmOCR Markdown Converter
This Space uses the `olmOCR` model pipeline to convert PDFs, including scientific papers, into Markdown-formatted `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.
- Vision + text anchor OCR pipeline (via `olmOCR`)
- Extracts semantic structure via PDF TOC
- Outputs clean `.txt` in markdown format
- Hugging Face **Gradio Space with GPU support**
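
To illustrate the TOC-based structure step, here is a minimal sketch of how table-of-contents entries could be turned into markdown headers. It is not the Space's actual code: the function name `toc_to_markdown_headers` and the `(level, title, page)` entry shape are assumptions made for illustration (the shape mirrors what PyMuPDF's `Document.get_toc()` returns, but this README does not say which PDF library the Space uses).

```python
from typing import Iterable, Tuple

# Hypothetical entry shape: (heading level, title, page number).
TocEntry = Tuple[int, str, int]


def toc_to_markdown_headers(toc: Iterable[TocEntry], max_level: int = 6) -> str:
    """Render TOC entries as markdown headers, one per entry."""
    lines = []
    for level, title, _page in toc:
        # Collapse runs of whitespace that PDF outlines often contain.
        title = " ".join(title.split())
        if not title:
            continue  # skip entries with no usable title
        # Markdown supports header levels 1 through 6 only.
        depth = min(max(level, 1), max_level)
        lines.append(f"{'#' * depth} {title}")
    return "\n\n".join(lines)


if __name__ == "__main__":
    sample_toc = [(1, "Introduction", 1), (2, "Related  Work", 2), (1, "Methods", 4)]
    print(toc_to_markdown_headers(sample_toc))
```

Clamping the level keeps deeply nested outlines from producing header markers that markdown renderers would treat as plain text.
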
## Example Use
Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.
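
The download step can be pictured as writing the converted markdown to a `.txt` file named after the uploaded PDF. The sketch below assumes exactly that; the helper `save_markdown_as_txt` and its naming rule are hypothetical and may differ from what `app.py` does.

```python
import tempfile
from pathlib import Path


def save_markdown_as_txt(markdown: str, pdf_name: str, out_dir: Path) -> Path:
    """Write markdown text to <pdf stem>.txt inside out_dir and return the path."""
    out_dir.mkdir(parents=True, exist_ok=True)
    # Assumption: the output reuses the PDF's base name with a .txt suffix.
    out_path = out_dir / (Path(pdf_name).stem + ".txt")
    # Normalise to a single trailing newline and write as UTF-8.
    out_path.write_text(markdown.rstrip() + "\n", encoding="utf-8")
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = save_markdown_as_txt("# Title\n\nBody text.", "paper.pdf", Path(tmp))
        print(path.name)
```
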

---
Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)