Infinity-Parser2-Pro Int8

This repository contains an MLX affine int8 quantization of infly/Infinity-Parser2-Pro, prepared for native mere.run Q35 OCR evaluation.

Changes From The Base Model

  • Quantized eligible language-model linear weights to int8.
  • Used group size 64 with MLX affine quantization metadata.
  • Split Infinity-Parser2-Pro fused expert tensors into the native switch_mlp layout expected by mere.run.
  • Preserved vision tower and tokenizer sidecar files required by the native Qwen-family OCR runtime.
  • Added mererun_model.json metadata for managed local installation.
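
For readers unfamiliar with group-wise affine quantization, the following is a minimal pure-Python sketch of the idea behind the int8, group-size-64 setting listed above. It is illustrative only: the function names (`quantize_group`, `dequantize_group`, `quantize_row`) are hypothetical, and MLX's actual implementation operates on arrays, packs the integer codes, and may differ in rounding and edge-case handling. Each group of 64 weights gets its own scale and bias, and every weight is stored as an unsigned 8-bit code.

```python
import math


def quantize_group(weights, bits=8):
    """Affine-quantize one group of floats to unsigned integer codes.

    Returns (codes, scale, bias) such that w is approximately code * scale + bias.
    """
    levels = (1 << bits) - 1          # 255 distinct steps for 8 bits
    lo, hi = min(weights), max(weights)
    scale = (hi - lo) / levels
    if scale == 0.0:
        scale = 1.0                   # constant group: every code becomes 0
    codes = [min(levels, max(0, round((w - lo) / scale))) for w in weights]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes and per-group metadata."""
    return [c * scale + bias for c in codes]


def quantize_row(row, group_size=64, bits=8):
    """Quantize a weight row in consecutive groups of `group_size` elements."""
    if len(row) % group_size != 0:
        raise ValueError("row length must be a multiple of group_size")
    return [
        quantize_group(row[i:i + group_size], bits)
        for i in range(0, len(row), group_size)
    ]


# Example: a synthetic 128-element row yields two groups of 64.
row = [math.sin(0.1 * i) for i in range(128)]
groups = quantize_row(row)
restored = [
    v
    for codes, scale, bias in groups
    for v in dequantize_group(codes, scale, bias)
]
max_error = max(abs(a - b) for a, b in zip(row, restored))
print(f"groups={len(groups)} max_error={max_error:.6f}")
```

In this sketch the reconstruction error of any weight is bounded by half of its group's scale, which is why smaller groups (more scales and biases) trade extra metadata for lower error.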

Intended Use

Use this model as the quality-focused native Infinity-Parser2-Pro OCR option in mere.run:

mere.run model pull vision-ocr-infinity-pro-int8
mere.run vision ocr ./page.png \
  --backend infinity \
  --infinity-model vision-ocr-infinity-pro-int8 \
  --infinity-task doc2md \
  --temperature 0

mere.run keeps LightOnOCR as the default OCR backend because it is smaller and more predictable across the local smoke set. This quantized Pro model is for document types where Pro's layout and parsing quality justify higher latency and memory use.

Local Evaluation Notes

On local samples, this int8 model improved over Infinity-Parser2-Flash on some metadata-heavy article layouts, while LightOnOCR remained stronger on the tested default-OCR mix. Treat this model as an evaluation target rather than a universal default.

License

The base model is licensed under Apache-2.0. This quantized derivative is distributed under Apache-2.0 as well. See LICENSE and NOTICE.
