---
title: olmOCR Markdown Converter
emoji: π
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---
# olmOCR Markdown Converter
This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.
- Vision + text anchor OCR pipeline (via `olmOCR`)
- Extracts semantic structure via PDF TOC
- Outputs clean `.txt` in markdown format
- Hugging Face **Gradio Space with GPU support**
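
As a rough illustration of the TOC-based structure step listed above, the sketch below maps table-of-contents entries to markdown headers. It is not taken from this Space's `app.py` and is not part of the `olmOCR` API: the function name `toc_to_markdown_headers`, the `(level, title, page)` entry shape, and the `max_depth` parameter are all assumptions made for the example.

```python
from typing import Iterable, Tuple

# Hypothetical TOC entry shape: (nesting level starting at 1, title, page number).
TocEntry = Tuple[int, str, int]


def toc_to_markdown_headers(toc: Iterable[TocEntry], max_depth: int = 6) -> list[str]:
    """Convert TOC entries into markdown header lines (illustrative helper)."""
    lines = []
    for level, title, _page in toc:
        # Clamp the level to the range markdown supports (1 to max_depth).
        depth = min(max(level, 1), max_depth)
        lines.append("#" * depth + " " + title.strip())
    return lines


# Made-up entries, used only to show the mapping.
example_toc = [(1, "Introduction", 1), (2, "Related Work", 2), (1, "Methods", 4)]
print("\n".join(toc_to_markdown_headers(example_toc)))
```

Clamping the level keeps deeply nested TOC entries from producing more than six `#` characters, which markdown would no longer treat as a header.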
## Example Use
Upload a scientific paper as a PDF and download a markdown `.txt` version with preserved headers and inline structure.
---
Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)