Ibad ur Rehman
metadata
title: Docling Parser API
emoji: 📄
colorFrom: blue
colorTo: green
sdk: docker
app_port: 7860
pinned: false
license: mit
suggested_hardware: t4-small

Docling Parser API

A FastAPI service that turns PDFs and Excel workbooks into LLM-ready markdown using a hybrid parser:

  • Pass 1: Docling parses the document (layout + TableFormer, OCR disabled)
  • Pass 2: Gemini 3 Flash reparses table-heavy pages and weak-text pages (pages where Docling recovered little usable text)
  • Pass 2.5 (opt-in): Gemini summarises qualifying charts / diagrams / figures
  • Post-processing: artifact removal, deduplication, table cleanup

Features

  • Docling core parser: layout analysis, TableFormer, cross-page understanding
  • Gemini page enhancement: higher-fidelity reparse for table or weak-text pages
  • Gemini figure transcription: optional short visual summaries for charts and diagrams (default off)
  • Excel support: .xlsx / .xlsm workbooks rendered as HTML <table> markdown
  • Optional image ZIP: return all extracted pictures as a base64 ZIP
  • URL parsing with SSRF protection: blocks private / loopback / cloud-metadata hosts
  • T4-friendly: fits comfortably in 16 GB VRAM
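The service's SSRF filter itself is not shown here. The sketch below illustrates how such a check is commonly written with the standard library, assuming the goal is to reject any URL whose host resolves to a non-public address. The function name is_allowed_url and the BLOCKED_HOSTNAMES set are hypothetical.

```python
import ipaddress
import socket
from urllib.parse import urlparse

# Hypothetical deny-list of metadata hostnames (an assumption, not the service's list).
BLOCKED_HOSTNAMES = {"metadata.google.internal", "metadata"}


def _is_blocked(ip) -> bool:
    """Reject anything that is not a globally routable unicast address."""
    # is_global is False for private, loopback, link-local (169.254.169.254), CGNAT and reserved ranges.
    return (not ip.is_global) or ip.is_multicast


def is_allowed_url(url: str) -> bool:
    """True only for http(s) URLs whose host resolves exclusively to public addresses."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https") or not parsed.hostname:
        return False
    host = parsed.hostname.lower()
    if host in BLOCKED_HOSTNAMES:
        return False
    try:
        port = parsed.port or (443 if parsed.scheme == "https" else 80)
        infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    except (ValueError, socket.gaierror):
        return False
    # Every resolved address must be public; a single private record rejects the URL.
    for *_, sockaddr in infos:
        ip = ipaddress.ip_address(sockaddr[0].split("%", 1)[0])
        if _is_blocked(ip):
            return False
    return bool(infos)
```

A check like this only helps if the subsequent download connects to the address that was vetted and re-checks every redirect target; otherwise DNS rebinding can slip past it.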

Architecture

PDF / Excel
  -> Pass 1: Docling (layout + TableFormer, no OCR)
  -> Pass 2: Gemini 3 Flash on table pages and weak-text pages
  -> Pass 2.5 (opt-in): Gemini describes qualifying PictureItems
  -> Post-processing: artifact removal, dedup, table cleanup
  -> Final markdown response
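The README does not spell out how Pass 2 chooses its pages. As a rough sketch, assuming the rule is "any page with a table, or any page whose extracted text falls below a character threshold", selection could look like this. PageStats, select_gemini_pages and the 200-character threshold are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PageStats:
    """Hypothetical per-page summary computed from Pass 1 (Docling) output."""
    page_no: int      # absolute page number
    table_count: int  # tables Docling detected on the page
    text_chars: int   # characters of text Docling extracted


def select_gemini_pages(pages: List[PageStats], min_chars: int = 200) -> List[int]:
    """Return the page numbers to reparse: table pages plus weak-text pages."""
    return [
        p.page_no
        for p in pages
        if p.table_count > 0 or p.text_chars < min_chars
    ]
```

With pages (1: no tables, 1500 chars), (2: one table), and (3: 40 chars), this sketch selects pages 2 and 3, which would then appear in gemini_pages.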

API Endpoints

| Endpoint | Method | Description |
|---|---|---|
| / | GET | Health check with model and Gemini status |
| /parse | POST | Parse an uploaded file (multipart/form-data) |
| /parse/url | POST | Parse a document from a URL (JSON body) |
| /docs | GET | OpenAPI documentation (Swagger UI) |

Authentication

All parse endpoints require a bearer token:

Authorization: Bearer YOUR_API_TOKEN

Set API_TOKEN in the Hugging Face Space secrets.
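On the server side, a bearer-token check usually compares the presented token against API_TOKEN in constant time. The sketch below shows that idea in isolation; token_is_valid is a hypothetical helper, not the service's code. In a FastAPI app it would typically sit behind a dependency such as fastapi.security.HTTPBearer.

```python
import hmac
from typing import Optional


def token_is_valid(authorization: Optional[str], expected: str) -> bool:
    """Check an 'Authorization: Bearer <token>' header value against the expected token."""
    if not authorization or not expected:
        return False
    scheme, _, token = authorization.partition(" ")
    if scheme.lower() != "bearer" or not token.strip():
        return False
    # compare_digest avoids leaking how much of the token matched through timing.
    return hmac.compare_digest(token.strip().encode(), expected.encode())
```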

Supported file types

  • .pdf
  • .xlsx
  • .xlsm

Other file types (images, Word documents, etc.) are rejected with HTTP 400 Unsupported file type.
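A minimal way to enforce this list is to route on the file extension and, optionally, confirm the leading bytes. The sketch assumes extension-based routing; the magic-byte check (PDFs normally begin with %PDF-, while .xlsx/.xlsm files are ZIP archives beginning with PK\x03\x04) is an optional hardening step that the README does not claim the service performs. parser_for, PARSERS and MAGIC are hypothetical names; the parser labels mirror the vlm_model values listed under Response Format.

```python
from pathlib import Path
from typing import Optional

# Supported extensions mapped to the parser label reported in vlm_model.
PARSERS = {".pdf": "Docling + Gemini", ".xlsx": "openpyxl", ".xlsm": "openpyxl"}
# Expected leading bytes per extension (optional hardening, an assumption).
MAGIC = {".pdf": b"%PDF-", ".xlsx": b"PK\x03\x04", ".xlsm": b"PK\x03\x04"}


def parser_for(filename: str, head: bytes = b"") -> Optional[str]:
    """Return the parser label, or None when the service would answer HTTP 400."""
    suffix = Path(filename).suffix.lower()
    if suffix not in PARSERS:
        return None
    # If the caller supplied the first bytes of the file, they must match the extension.
    if head and not head.startswith(MAGIC[suffix]):
        return None
    return PARSERS[suffix]
```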

Quick Start

cURL: Upload a File

curl -X POST "https://outcomelabs-docling-parser.hf.space/parse" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf"

cURL: With figure transcription

curl -X POST "https://outcomelabs-docling-parser.hf.space/parse" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -F "file=@document.pdf" \
  -F "transcribe_images=true"

cURL: Parse from URL

curl -X POST "https://outcomelabs-docling-parser.hf.space/parse/url" \
  -H "Authorization: Bearer YOUR_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/document.pdf", "transcribe_images": true}'

Python

import requests

API_URL = "https://outcomelabs-docling-parser.hf.space"
API_TOKEN = "your_api_token"
headers = {"Authorization": f"Bearer {API_TOKEN}"}

with open("document.pdf", "rb") as f:
    response = requests.post(
        f"{API_URL}/parse",
        headers=headers,
        files={"file": ("document.pdf", f, "application/pdf")},
        data={"transcribe_images": "true"},
    )

result = response.json()
if result.get("success"):
    print(f"Parsed {result['pages_processed']} pages using {result['vlm_model']}")
    print(
        f"Figures detected={result['images_detected']}, "
        f"considered={result['images_considered']}, "
        f"transcribed={result['images_transcribed']}"
    )
    print(result["markdown"])
else:
    # Auth and validation errors may arrive as FastAPI's {"detail": ...} body instead of "error".
    print(response.status_code, result.get("error") or result.get("detail"))

Request Parameters

/parse

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| file | File | Yes | - | .pdf, .xlsx, or .xlsm |
| output_format | string | No | markdown | Only markdown is currently supported |
| images_scale | float | No | 2.0 | Accepted for compatibility |
| start_page | int | No | 0 | Starting page, zero-indexed (PDF only) |
| end_page | int | No | null | Ending page; all pages if omitted (PDF only) |
| include_images | bool | No | false | Include extracted images as a base64 ZIP payload |
| transcribe_images | bool | No | null | Transcribe qualifying charts/diagrams inline; null uses the server's TRANSCRIBE_IMAGES default |
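The tri-state transcribe_images flag can be read as: an explicit true or false from the request wins, null falls back to TRANSCRIBE_IMAGES, and nothing is transcribed without a GEMINI_API_KEY (as described under Figure Transcription). The sketch encodes that reading; should_transcribe is hypothetical, and the set of accepted truthy strings is an assumption.

```python
import os
from typing import Mapping, Optional

TRUTHY = {"1", "true", "yes", "on"}  # accepted spellings of "true" (assumption)


def server_default(env: Mapping[str, str] = os.environ) -> bool:
    """Read the TRANSCRIBE_IMAGES server default (false when unset)."""
    return env.get("TRANSCRIBE_IMAGES", "false").strip().lower() in TRUTHY


def should_transcribe(requested: Optional[bool], env: Mapping[str, str] = os.environ) -> bool:
    """Request value wins; None means use the server default. A Gemini key is always required."""
    wanted = server_default(env) if requested is None else requested
    return wanted and bool(env.get("GEMINI_API_KEY"))
```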

/parse/url

| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Source document URL |
| output_format | string | No | markdown | Only markdown is currently supported |
| images_scale | float | No | 2.0 | Accepted for compatibility |
| start_page | int | No | 0 | Starting page, zero-indexed (PDF only) |
| end_page | int | No | null | Ending page; all pages if omitted (PDF only) |
| include_images | bool | No | false | Include extracted images as a base64 ZIP payload |
| transcribe_images | bool | No | null | Transcribe qualifying charts/diagrams inline; null uses the server's TRANSCRIBE_IMAGES default |
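When calling /parse/url from Python, one option is to leave optional fields out of the JSON body entirely so the server applies its own defaults. build_url_payload below is a hypothetical client helper covering a subset of the parameters above.

```python
from typing import Any, Dict, Optional


def build_url_payload(
    url: str,
    start_page: int = 0,
    end_page: Optional[int] = None,
    include_images: bool = False,
    transcribe_images: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a /parse/url JSON body, omitting None values so server defaults apply."""
    body = {
        "url": url,
        "start_page": start_page,
        "end_page": end_page,
        "include_images": include_images,
        "transcribe_images": transcribe_images,
    }
    return {key: value for key, value in body.items() if value is not None}
```

The result can be passed as requests.post(f"{API_URL}/parse/url", headers=headers, json=payload); the json= argument serializes the dict and sets the Content-Type header.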

Response Format

{
  "success": true,
  "markdown": "# Document Title\n\nExtracted content...",
  "json_content": null,
  "images_zip": null,
  "image_count": 0,
  "error": null,
  "pages_processed": 20,
  "device_used": "cpu",
  "vlm_model": "Docling + Gemini",
  "gemini_page_count": 3,
  "gemini_pages": [2, 7, 12],
  "images_detected": 14,
  "images_considered": 6,
  "images_transcribed": 5
}

| Field | Type | Description |
|---|---|---|
| success | boolean | Whether parsing succeeded |
| markdown | string | Extracted markdown |
| json_content | object | Reserved field, currently null |
| images_zip | string | Base64 ZIP of extracted images (when include_images=true) |
| image_count | int | Number of images in the ZIP |
| error | string | Error message when parsing fails |
| pages_processed | int | Number of pages processed |
| device_used | string | Device label returned by the service |
| vlm_model | string | Active parser label (Docling + Gemini or openpyxl) |
| gemini_page_count | int | Number of pages reparsed by Gemini in Pass 2 |
| gemini_pages | int[] | Absolute page numbers reparsed by Gemini |
| images_detected | int | Total PictureItems Docling emitted |
| images_considered | int | PictureItems that passed the local size/area filter |
| images_transcribed | int | PictureItems that returned a non-SKIP Gemini description |
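With include_images=true, images_zip holds a base64-encoded ZIP archive. A small standard-library helper can unpack it in memory; unpack_images is a hypothetical name, and the file names inside the archive depend on the service.

```python
import base64
import io
import zipfile
from typing import Dict


def unpack_images(images_zip_b64: str) -> Dict[str, bytes]:
    """Decode the base64 images_zip field into {filename: image bytes}."""
    raw = base64.b64decode(images_zip_b64)
    with zipfile.ZipFile(io.BytesIO(raw)) as zf:
        return {
            info.filename: zf.read(info)
            for info in zf.infolist()
            if not info.is_dir()  # skip directory entries, keep files only
        }
```

For example, if result.get("images_zip") is set, len(unpack_images(result["images_zip"])) should line up with image_count.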

Figure Transcription

When transcribe_images=true (or TRANSCRIBE_IMAGES=true on the server) and a GEMINI_API_KEY is configured, qualifying figures are sent to Gemini for a concise visual summary and inserted into the markdown as blockquotes:

> **Figure (page 7):** Bar chart of quarterly revenue from 2020–2024 showing an upward trend; peak around Q4 2024.

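Decorative figures (logos, dividers, small icons) are filtered out locally using the size and bbox-area thresholds IMAGE_TRANSCRIPTION_MIN_PX and IMAGE_TRANSCRIPTION_MIN_AREA_RATIO. Gemini skips any remaining low-value images by returning a [SKIP] escape token; skipped figures are not counted in images_transcribed.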
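The two-stage filter can be sketched as a local size check followed by handling of Gemini's reply. Two details are assumptions: that IMAGE_TRANSCRIPTION_MIN_PX applies to the smaller of width and height, and that a reply counts as skipped when it starts with [SKIP]. qualifies and figure_blockquote are hypothetical names; the blockquote format copies the example above.

```python
from typing import Optional

MIN_PX = 150           # IMAGE_TRANSCRIPTION_MIN_PX default
MIN_AREA_RATIO = 0.02  # IMAGE_TRANSCRIPTION_MIN_AREA_RATIO default
SKIP_TOKEN = "[SKIP]"


def qualifies(width_px: int, height_px: int, bbox_area: float, page_area: float,
              min_px: int = MIN_PX, min_ratio: float = MIN_AREA_RATIO) -> bool:
    """Local pre-filter: reject small images or ones covering a tiny share of the page."""
    if min(width_px, height_px) < min_px:
        return False
    return page_area > 0 and bbox_area / page_area >= min_ratio


def figure_blockquote(page_no: int, description: str) -> Optional[str]:
    """Format Gemini's reply as a markdown blockquote, or None if it chose to skip."""
    text = description.strip()
    if not text or text.startswith(SKIP_TOKEN):
        return None
    return f"> **Figure (page {page_no}):** {text}"
```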

A per-request cap (MAX_IMAGE_TRANSCRIPTIONS, default 50) protects against documents with hundreds of charts. When the cap is exceeded, the largest figures by bbox area are kept and the rest are dropped, with a warning in the server logs.
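The cap amounts to a "keep the N largest" selection. In this sketch the kept figures are returned in document order so inline insertion stays stable; that ordering choice, the Figure record, and apply_cap are assumptions, not the service's code.

```python
import logging
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Figure:
    """Hypothetical record for a figure that passed the local filter."""
    page_no: int
    index: int        # position in document order
    bbox_area: float


def apply_cap(figures: List[Figure], cap: int = 50) -> Tuple[List[Figure], int]:
    """Keep the `cap` largest figures by bbox area; return (kept, dropped_count)."""
    if len(figures) <= cap:
        return list(figures), 0
    largest = sorted(figures, key=lambda f: f.bbox_area, reverse=True)[:cap]
    kept = sorted(largest, key=lambda f: f.index)  # restore document order
    dropped = len(figures) - cap
    logging.getLogger(__name__).warning("Figure cap %d exceeded; dropping %d figures", cap, dropped)
    return kept, dropped
```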

Configuration

| Environment variable | Description | Default |
|---|---|---|
| API_TOKEN | Required API authentication token | - |
| MAX_FILE_SIZE_MB | Maximum upload size in MB | 1024 |
| IMAGES_SCALE | Image scale for extracted pictures | 2.0 |
| RENDER_DPI | DPI for PDF→PNG rendering (Gemini page input) | 200 |
| GEMINI_API_KEY | Gemini API key | - |
| GEMINI_MODEL | Gemini model name | gemini-3-flash-preview |
| GEMINI_TIMEOUT | Gemini request timeout in seconds | 120 |
| GEMINI_CONCURRENCY | Max concurrent page-level Gemini requests | 8 |
| TRANSCRIBE_IMAGES | Default for figure transcription (overridable per request) | false |
| IMAGE_TRANSCRIPTION_MIN_PX | Min pixel dimension to qualify a figure | 150 |
| IMAGE_TRANSCRIPTION_MIN_AREA_RATIO | Min bbox-to-page area ratio to qualify a figure | 0.02 |
| MAX_IMAGE_TRANSCRIPTIONS | Hard per-request cap on figure Gemini calls | 50 |
| GEMINI_IMAGE_CONCURRENCY | Max concurrent figure-level Gemini calls | 8 |
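GEMINI_CONCURRENCY and GEMINI_IMAGE_CONCURRENCY bound how many Gemini calls are in flight at once. The README does not say how the service enforces this. The sketch below shows one common asyncio approach, a semaphore around each call, with results kept in input order. bounded_map is a hypothetical helper.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


async def bounded_map(
    func: Callable[[T], Awaitable[R]],
    items: Iterable[T],
    limit: int,
) -> List[R]:
    """Run func over items with at most `limit` calls in flight; results keep input order."""
    sem = asyncio.Semaphore(limit)

    async def run(item: T) -> R:
        async with sem:  # waits here while `limit` calls are already running
            return await func(item)

    return await asyncio.gather(*(run(i) for i in items))
```

The limit would typically come from the environment, for example limit=int(os.environ.get("GEMINI_CONCURRENCY", "8")).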

Logging

The service logs:

  • Unique 8-character request IDs on every line
  • File size, type, and page range
  • Pass 1 (Docling), Pass 2 (Gemini pages), Pass 2.5 (figures), post-processing timings
  • Figure counts (detected / considered / transcribed) and cap-truncation warnings
  • Final pages/sec and total processing time
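Tagging every log line with a short request ID can be done with a context variable and a logging filter. This is a sketch under assumptions: the 8-character ID is taken from a UUID4 hex string, and the log format shown is illustrative rather than the service's actual format. request_id_var, RequestIdFilter, new_request_id and make_logger are hypothetical names.

```python
import contextvars
import logging
import uuid

# Holds the current request's ID; each request or asyncio task sees its own value.
request_id_var = contextvars.ContextVar("request_id", default="--------")


class RequestIdFilter(logging.Filter):
    """Attach the current request ID to each record as %(request_id)s."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.request_id = request_id_var.get()
        return True


def new_request_id() -> str:
    """Generate an 8-character hex ID and bind it to the current context."""
    rid = uuid.uuid4().hex[:8]
    request_id_var.set(rid)
    return rid


def make_logger(stream=None) -> logging.Logger:
    """Build a logger whose handler prefixes every line with the request ID."""
    logger = logging.getLogger("docling_parser_sketch")
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(request_id)s] %(levelname)s %(message)s"))
    handler.addFilter(RequestIdFilter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger
```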

Credits

Built with Docling, Gemini, FastAPI, and supporting Python tooling for document parsing.