title: Docling Parser API
emoji: π
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: t4-small
Docling Parser API
A FastAPI service that turns PDFs and Excel workbooks into LLM-ready markdown using a hybrid parser:
- Pass 1: Docling parses the document (layout + TableFormer, OCR disabled)
- Pass 2: Gemini 3 Flash reparses table-heavy and weak-text pages
- Pass 2.5 (opt-in): Gemini summarises qualifying charts / diagrams / figures
- Post: Artifact removal, deduplication, table cleanup
Features
- Docling core parser: layout analysis, TableFormer, cross-page understanding
- Gemini page enhancement: higher-fidelity reparse for table or weak-text pages
- Gemini figure transcription: optional short visual summaries for charts and diagrams (default off)
- Excel support: .xlsx / .xlsm workbooks rendered as HTML <table> markdown
- Optional image ZIP: return all extracted pictures as a base64 ZIP
- URL parsing with SSRF protection: blocks private / loopback / cloud-metadata hosts
- T4-friendly: fits comfortably in 16 GB VRAM
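The SSRF protection listed above can be illustrated with a small host check. This is a minimal sketch using only the Python standard library, not the service's actual code: the function names `is_blocked_address` and `is_url_allowed` are hypothetical, and the exact set of blocked ranges is an assumption based on the "private / loopback / cloud-metadata" description.

```python
import ipaddress
import socket
from urllib.parse import urlparse


def is_blocked_address(ip_text: str) -> bool:
    """Return True for addresses a URL fetcher should refuse to contact."""
    # Drop any IPv6 zone suffix such as "%eth0" before parsing.
    ip = ipaddress.ip_address(ip_text.split("%")[0])
    return (
        ip.is_private
        or ip.is_loopback
        or ip.is_link_local  # includes 169.254.169.254, a common cloud-metadata address
        or ip.is_reserved
        or ip.is_multicast
        or ip.is_unspecified
    )


def is_url_allowed(url: str) -> bool:
    """Resolve the URL's host and reject it if any resolved address is blocked."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    try:
        infos = socket.getaddrinfo(parsed.hostname, None)
    except socket.gaierror:
        return False
    # Every resolved address must be acceptable, not just the first one.
    return bool(infos) and all(not is_blocked_address(info[4][0]) for info in infos)
```

A production fetcher would also need to connect to the address it validated and re-check every redirect target, which this sketch omits.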
Architecture
PDF / Excel
-> Pass 1: Docling (layout + TableFormer, no OCR)
-> Pass 2: Gemini 3 Flash on table pages and weak-text pages
-> Pass 2.5 (opt-in): Gemini describes qualifying PictureItems
-> Post-processing: artifact removal, dedup, table cleanup
-> Final markdown response
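How pages are routed to Pass 2 is not specified here beyond "table pages and weak-text pages". The sketch below shows one plausible selection rule. Everything in it is an assumption for illustration: the `PageInfo` type, the `select_pages_for_gemini` function, and the `min_text_chars` threshold of 200 are hypothetical and not taken from the service.

```python
from dataclasses import dataclass


@dataclass
class PageInfo:
    """Hypothetical per-page summary produced after Pass 1."""
    page_no: int
    table_count: int
    text_chars: int


def select_pages_for_gemini(pages, min_text_chars=200):
    """Pick pages that contain tables or too little extracted text.

    min_text_chars is an assumed threshold for "weak text", not a documented value.
    """
    return [
        page.page_no
        for page in pages
        if page.table_count > 0 or page.text_chars < min_text_chars
    ]
```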
API Endpoints
| Endpoint | Method | Description |
|---|---|---|
| / | GET | Health check with model and Gemini status |
| /parse | POST | Parse an uploaded file (multipart/form-data) |
| /parse/url | POST | Parse a document from a URL (JSON body) |
| /docs | GET | OpenAPI documentation (Swagger UI) |
Authentication
All parse endpoints require a bearer token:
Authorization: Bearer YOUR_API_TOKEN
Set API_TOKEN in Hugging Face Space secrets.
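For readers curious what a bearer-token check of this kind involves, here is a minimal sketch. It is not the service's implementation: the `is_authorized` function is hypothetical, and the use of a constant-time comparison is an assumption about good practice rather than something stated above.

```python
import hmac


def is_authorized(authorization_header, expected_token):
    """Validate an 'Authorization: Bearer <token>' header value."""
    if not authorization_header or not expected_token:
        return False
    scheme, _, token = authorization_header.partition(" ")
    if scheme.lower() != "bearer":
        return False
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(token.strip().encode(), expected_token.encode())
```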
Supported file types
- .pdf
- .xlsx
- .xlsm
Other types (images, Word, etc.) return 400 Unsupported file type.
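A client can avoid a wasted upload by checking the extension before calling the API. The helper below is a hypothetical client-side sketch; it assumes the server decides by file extension, which the list above suggests but does not confirm.

```python
from pathlib import Path

# Extensions listed as supported above.
SUPPORTED_EXTENSIONS = {".pdf", ".xlsx", ".xlsm"}


def is_supported(filename: str) -> bool:
    """Return True if the file name ends in a supported extension (case-insensitive)."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS
```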
Quick Start
cURL: Upload a File
curl -X POST "https://outcomelabs-docling-parser.hf.space/parse" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-F "file=@document.pdf"
cURL: With figure transcription
curl -X POST "https://outcomelabs-docling-parser.hf.space/parse" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-F "file=@document.pdf" \
-F "transcribe_images=true"
cURL: Parse from URL
curl -X POST "https://outcomelabs-docling-parser.hf.space/parse/url" \
-H "Authorization: Bearer YOUR_API_TOKEN" \
-H "Content-Type: application/json" \
-d '{"url": "https://example.com/document.pdf", "transcribe_images": true}'
Python
import requests

API_URL = "https://outcomelabs-docling-parser.hf.space"
API_TOKEN = "your_api_token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"transcribe_images": "true"},
    )

result = response.json()
if result["success"]:
    print(f"Parsed {result['pages_processed']} pages using {result['vlm_model']}")
    print(
        f"Figures detected={result['images_detected']}, "
        f"considered={result['images_considered']}, "
        f"transcribed={result['images_transcribed']}"
    )
    print(result["markdown"])
else:
    print(result["error"])
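
The /parse/url call shown with cURL can also be made from Python. The sketch below builds the JSON request with the standard library only; the helper name `build_parse_url_request` is hypothetical, and the choice to omit optional fields that are left at `None` is an assumption about what the server accepts.

```python
import json
import urllib.request

API_URL = "https://outcomelabs-docling-parser.hf.space"


def build_parse_url_request(document_url, api_token, transcribe_images=None,
                            start_page=0, end_page=None):
    """Build (but do not send) a POST request for the /parse/url endpoint."""
    payload = {"url": document_url, "start_page": start_page}
    # Leave optional fields out so the server applies its own defaults.
    if end_page is not None:
        payload["end_page"] = end_page
    if transcribe_images is not None:
        payload["transcribe_images"] = transcribe_images
    return urllib.request.Request(
        f"{API_URL}/parse/url",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Sending the request requires network access and a valid token:
# with urllib.request.urlopen(request, timeout=300) as response:
#     result = json.load(response)
```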
Request Parameters
/parse
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | - | .pdf, .xlsx, or .xlsm |
| output_format | string | No | markdown | Only markdown is currently supported |
| images_scale | float | No | 2.0 | Accepted for compatibility |
| start_page | int | No | 0 | Starting page, zero-indexed (PDF only) |
| end_page | int | No | null | Ending page, or all pages if omitted (PDF only) |
| include_images | bool | No | false | Include extracted images as a base64 ZIP payload |
| transcribe_images | bool | No | null | Transcribe qualifying charts/diagrams inline. null = use server TRANSCRIBE_IMAGES default |
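
The tri-state behaviour of transcribe_images (true, false, or null meaning "use the server default") can be sketched as follows. This is an illustration, not the service's code: the `resolve_transcribe_images` function is hypothetical, and the set of strings treated as true for the TRANSCRIBE_IMAGES variable is an assumption.

```python
import os


def resolve_transcribe_images(request_value, environ=None):
    """Return the effective flag: the request value wins, else the server default."""
    env = os.environ if environ is None else environ
    if request_value is not None:
        return request_value
    # Assumed parsing of the environment variable; the real accepted values may differ.
    return env.get("TRANSCRIBE_IMAGES", "false").strip().lower() in ("1", "true", "yes")
```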
/parse/url
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Source document URL |
| output_format | string | No | markdown | Only markdown is currently supported |
| images_scale | float | No | 2.0 | Accepted for compatibility |
| start_page | int | No | 0 | Starting page, zero-indexed (PDF only) |
| end_page | int | No | null | Ending page, or all pages if omitted (PDF only) |
| include_images | bool | No | false | Include extracted images as a base64 ZIP |
| transcribe_images | bool | No | null | Transcribe qualifying charts/diagrams inline |
Response Format
{
"success": true,
"markdown": "# Document Title\n\nExtracted content...",
"json_content": null,
"images_zip": null,
"image_count": 0,
"error": null,
"pages_processed": 20,
"device_used": "cpu",
"vlm_model": "Docling + Gemini",
"gemini_page_count": 3,
"gemini_pages": [2, 7, 12],
"images_detected": 14,
"images_considered": 6,
"images_transcribed": 5
}
| Field | Type | Description |
|---|---|---|
| success | boolean | Whether parsing succeeded |
| markdown | string | Extracted markdown |
| json_content | object | Reserved field, currently null |
| images_zip | string | Base64 ZIP of extracted images (when include_images=true) |
| image_count | int | Number of images in the ZIP |
| error | string | Error message when parsing fails |
| pages_processed | int | Number of pages processed |
| device_used | string | Device label returned by the service |
| vlm_model | string | Active parser label (Docling + Gemini or openpyxl) |
| gemini_page_count | int | Number of pages reparsed by Gemini in Pass 2 |
| gemini_pages | int[] | Absolute page numbers reparsed by Gemini |
| images_detected | int | Total PictureItems Docling emitted |
| images_considered | int | PictureItems that passed the local size/area filter |
| images_transcribed | int | PictureItems that returned a non-SKIP Gemini description |
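
When include_images=true, the images_zip field has to be decoded on the client. The sketch below shows one way to unpack it in memory; the `extract_images` helper is hypothetical, and it assumes the field holds a standard base64 encoding of a ZIP archive, as the table describes. The file names inside the archive are not documented here, so the code makes no assumption about them.

```python
import base64
import io
import zipfile


def extract_images(result: dict) -> dict:
    """Decode the base64 ZIP from a parse response into {file name: image bytes}."""
    encoded = result.get("images_zip")
    if not encoded:
        return {}
    archive_bytes = base64.b64decode(encoded)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        return {name: archive.read(name) for name in archive.namelist()}
```

The returned mapping can then be written to disk or passed to an image library as needed.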
Figure Transcription
When transcribe_images=true (or TRANSCRIBE_IMAGES=true on the server) and a GEMINI_API_KEY is configured, qualifying figures are sent to Gemini for a concise visual summary and inserted into the markdown as blockquotes:
> **Figure (page 7):** Bar chart of quarterly revenue from 2020-2024 showing an upward trend; peak around Q4 2024.
Figures that are decorative (logos, dividers, small icons) are filtered out locally by size and bbox-area thresholds; any remaining low-value images are skipped by Gemini via a [SKIP] escape token.
A per-request cap (MAX_IMAGE_TRANSCRIPTIONS, default 50) protects against documents with hundreds of charts. When the cap is exceeded, the largest figures by bbox area are kept and the rest are dropped, with a warning written to the server logs.
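The local filter and the cap described above can be sketched as a single selection step. This is an illustration under stated assumptions, not the service's code: the `Figure` type and `select_figures` function are hypothetical, the minimum-pixel test is assumed to apply to the smaller side, and both thresholds are assumed to be required together. The default values mirror the documented IMAGE_TRANSCRIPTION_MIN_PX, IMAGE_TRANSCRIPTION_MIN_AREA_RATIO, and MAX_IMAGE_TRANSCRIPTIONS settings.

```python
from dataclasses import dataclass


@dataclass
class Figure:
    """Hypothetical description of one extracted picture."""
    page_no: int
    width_px: int
    height_px: int
    bbox_area: float
    page_area: float


def select_figures(figures, min_px=150, min_area_ratio=0.02, max_count=50):
    """Drop small or decorative figures, then keep at most max_count of the largest."""
    considered = [
        fig
        for fig in figures
        if min(fig.width_px, fig.height_px) >= min_px
        and fig.bbox_area / fig.page_area >= min_area_ratio
    ]
    if len(considered) > max_count:
        # Over the cap: keep the largest figures by bounding-box area.
        considered = sorted(considered, key=lambda fig: fig.bbox_area, reverse=True)[:max_count]
    return considered
```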
Configuration
| Environment Variable | Description | Default |
|---|---|---|
| API_TOKEN | Required API authentication token | - |
| MAX_FILE_SIZE_MB | Maximum upload size in MB | 1024 |
| IMAGES_SCALE | Image scale for extracted pictures | 2.0 |
| RENDER_DPI | DPI for PDF-to-PNG rendering (Gemini page input) | 200 |
| GEMINI_API_KEY | Gemini API key | - |
| GEMINI_MODEL | Gemini model name | gemini-3-flash-preview |
| GEMINI_TIMEOUT | Gemini request timeout in seconds | 120 |
| GEMINI_CONCURRENCY | Max concurrent page-level Gemini requests | 8 |
| TRANSCRIBE_IMAGES | Default for figure transcription (overridable per request) | false |
| IMAGE_TRANSCRIPTION_MIN_PX | Min pixel dimension to qualify a figure | 150 |
| IMAGE_TRANSCRIPTION_MIN_AREA_RATIO | Min bbox-to-page area ratio to qualify a figure | 0.02 |
| MAX_IMAGE_TRANSCRIPTIONS | Hard per-request cap on figure Gemini calls | 50 |
| GEMINI_IMAGE_CONCURRENCY | Max concurrent figure-level Gemini calls | 8 |
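
As an illustration of how these variables and defaults fit together, the sketch below loads a subset of them into a typed settings object. The `Settings` class, `load_settings` function, and the boolean parsing rule are hypothetical; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass


def _as_bool(value: str) -> bool:
    # Assumed set of truthy strings; the service may accept others.
    return value.strip().lower() in ("1", "true", "yes")


@dataclass(frozen=True)
class Settings:
    max_file_size_mb: int
    images_scale: float
    render_dpi: int
    gemini_model: str
    gemini_timeout: int
    gemini_concurrency: int
    transcribe_images: bool
    image_min_px: int
    image_min_area_ratio: float
    max_image_transcriptions: int


def load_settings(environ=None) -> Settings:
    """Read configuration from the environment, falling back to documented defaults."""
    env = os.environ if environ is None else environ
    return Settings(
        max_file_size_mb=int(env.get("MAX_FILE_SIZE_MB", "1024")),
        images_scale=float(env.get("IMAGES_SCALE", "2.0")),
        render_dpi=int(env.get("RENDER_DPI", "200")),
        gemini_model=env.get("GEMINI_MODEL", "gemini-3-flash-preview"),
        gemini_timeout=int(env.get("GEMINI_TIMEOUT", "120")),
        gemini_concurrency=int(env.get("GEMINI_CONCURRENCY", "8")),
        transcribe_images=_as_bool(env.get("TRANSCRIBE_IMAGES", "false")),
        image_min_px=int(env.get("IMAGE_TRANSCRIPTION_MIN_PX", "150")),
        image_min_area_ratio=float(env.get("IMAGE_TRANSCRIPTION_MIN_AREA_RATIO", "0.02")),
        max_image_transcriptions=int(env.get("MAX_IMAGE_TRANSCRIPTIONS", "50")),
    )
```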
Logging
The service logs:
- Unique 8-character request IDs on every line
- File size, type, and page range
- Timings for Pass 1 (Docling), Pass 2 (Gemini pages), Pass 2.5 (figures), and post-processing
- Figure counts (detected / considered / transcribed) and cap-truncation warnings
- Final pages/sec and total processing time
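
Tagging every log line with a short request ID, as listed above, can be done with a `logging.LoggerAdapter`. The sketch below is one way to do it and is not taken from the service: the `new_request_logger` helper, the logger naming scheme, the log format, and the use of a truncated UUID for the 8-character ID are all assumptions.

```python
import logging
import uuid


def new_request_logger(stream=None):
    """Create a logger whose every line is prefixed with a fresh 8-character request ID."""
    request_id = uuid.uuid4().hex[:8]
    handler = logging.StreamHandler(stream)  # defaults to stderr when stream is None
    handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
    logger = logging.getLogger(f"parser.{request_id}")
    logger.setLevel(logging.INFO)
    logger.addHandler(handler)
    logger.propagate = False  # avoid duplicate lines from the root logger
    # The adapter injects request_id into every record via the 'extra' mapping.
    return logging.LoggerAdapter(logger, {"request_id": request_id}), request_id
```

A request handler would call `new_request_logger()` once per request and use the returned adapter for all timing and figure-count messages.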
Credits
Built with Docling, Gemini, FastAPI, and supporting Python tooling for document parsing.