RolmOCR by Reducto AI

Earlier this year, the Allen Institute for AI released olmOCR, an open-source tool that performs document OCR using the Qwen2-VL-7B vision language model (VLM). We were excited to see a high-quality, openly available approach to parsing PDFs and other complex documents — and curious to explore what else might be possible using newer foundation models and some lightweight optimizations.

The result is RolmOCR, a drop-in alternative to olmOCR that’s faster, uses less memory, and still performs well on a variety of document types. We're releasing it under Apache 2.0 for anyone to try out, explore, or build on.

This model is a fine-tuned version of Qwen/Qwen2.5-VL-7B-Instruct on the full allenai/olmOCR-mix-0225 dataset.

Key changes

We made three notable changes: 

  1. New Base Model: We swapped in a more recent version of the existing model (Qwen2.5-VL-7B) as the foundation.

  2. No Metadata inputs: Unlike the original, we don’t use metadata extracted from PDFs. This significantly reduces prompt length, which in turn lowers both processing time and VRAM usage — without hurting accuracy in most cases. 

  3. Rotation of training data: About 15% of the training data was rotated to enhance robustness to off-angle documents. We otherwise use the same training set.

 

Usage

Host your model with vLLM:

export VLLM_USE_V1=1
vllm serve reducto/RolmOCR 

Call the model via openai compatible server:

# HOST YOUR OPENAI COMPATIBLE API WITH THE FOLLOWING COMMAND in VLLM:
# export VLLM_USE_V1=1
# vllm serve reducto/RolmOCR 

from openai import OpenAI
import base64

client = OpenAI(api_key="123", base_url="http://localhost:8000/v1")

model = "reducto/RolmOCR-7b"

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")

def ocr_page_with_rolm(img_base64):
    response = client.chat.completions.create(
        model=model,
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{img_base64}"},
                    },
                    {
                        "type": "text",
                        "text": "Return the plain text representation of this document as if you were reading it naturally.\n",
                    },
                ],
            }
        ],
        temperature=0.2,
        max_tokens=4096
    )
    return response.choices[0].message.content

test_img_path = "path/to/image.png"
img_base64 = encode_image(test_img_path)
print(ocr_page_with_rolm(img_base64))

Limitations

  • RolmOCR, like other VLM-based OCR solutions, still suffer from hallucination or dropping contents.
  • Unlike the Reducto Parsing API, RolmOCR cannot output layout bounding boxes.
  • We have not evaluated the performance of any quantized versions.

BibTex and citation info

@misc{RolmOCR,
  author = {Reducto AI},
  title = {RolmOCR: A Faster, Lighter Open Source OCR Model},
  year = {2025},
}
Downloads last month
0
Safetensors
Model size
8.29B params
Tensor type
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for reducto/RolmOCR

Finetuned
(100)
this model

Dataset used to train reducto/RolmOCR