---
title: olmOCR Markdown Converter
emoji: 📝
colorFrom: yellow
colorTo: blue
sdk: gradio
sdk_version: 3.50.2
app_file: app.py
python_version: 3.11
license: mit
---

# olmOCR Markdown Converter

This Space uses the `olmOCR` model pipeline to convert PDFs (including scientific papers) into markdown `.txt` files that retain document structure, headers, and basic math formatting, ready for Calibre/Kindle or downstream parsing.

- ✅ Vision + text anchor OCR pipeline (via `olmOCR`)
- ✅ Extracts semantic structure via PDF TOC
- ✅ Outputs clean `.txt` in markdown format
- ✅ Hugging Face **Gradio Space with GPU support**
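
As a rough illustration of the TOC step, the sketch below turns table-of-contents entries into markdown headings. It is not the code in `app.py`: the function name `toc_to_markdown` and the sample data are hypothetical, and the `(level, title, page)` entry shape is an assumption modeled on what PDF libraries such as PyMuPDF typically return for an outline.

```python
from typing import Iterable, Sequence


def toc_to_markdown(toc: Iterable[Sequence]) -> str:
    """Render TOC entries as markdown headings.

    Assumption: each entry starts with (level, title, page), where
    level 1 is a top-level section. Extra fields are ignored.
    """
    lines = []
    for entry in toc:
        level, title, _page = entry[:3]
        # Markdown supports heading depths 1 through 6, so clamp the level.
        depth = min(max(int(level), 1), 6)
        lines.append("#" * depth + " " + str(title).strip())
    # Blank lines between headings keep the output valid markdown.
    return "\n\n".join(lines)


# Hypothetical outline of a short paper.
sample_toc = [
    (1, "Introduction", 1),
    (2, "Related Work", 2),
    (1, "Method", 4),
]

markdown_outline = toc_to_markdown(sample_toc)
print(markdown_outline)
```

In a full pipeline, the OCR text for each page range would be placed under the matching heading; that part depends on the model output and is left out here.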

## Example Use

Upload a scientific paper in PDF and download a markdown `.txt` version with preserved headers and inline structure.

---

Built by [@BenedictRichardLeonardi](https://huggingface.co/BenedictRichardLeonardi) using [olmOCR](https://huggingface.co/allenai/olmOCR-7B-0225-preview)