Florence-2 base-ft β€” ONNX (fp16)

ONNX export of microsoft/Florence-2-base-ft at fp16 precision. Florence-2 is Microsoft's unified vision-language model β€” a single checkpoint that handles captioning, OCR, object detection, region description, and grounded VQA via task-prompted decoding.

Converted artifact. Training credit: Microsoft Research.

What this repo contains

Florence-2 ships as four ONNX files (one per sub-model). All four are required at inference:

config.json
generation_config.json
preprocessor_config.json
tokenizer.json
tokenizer_config.json
vocab.json
merges.txt
special_tokens_map.json

vision_encoder_fp16.onnx     # DaViT image encoder
encoder_model_fp16.onnx      # text encoder (T5-style)
decoder_model_fp16.onnx      # autoregressive decoder
embed_tokens_fp16.onnx       # token embedding lookup

Total: ~520 MB. Use with optimum.onnxruntime.ORTModelForVision2Seq or load the four sessions manually.

How it was produced

optimum-cli export onnx \
    --model microsoft/Florence-2-base-ft \
    --task image-to-text \
    --dtype fp16 \
    --trust-remote-code \
    <output>

Toolchain: optimum 1.24.0, transformers 4.45.2, torch 2.4.x. --trust-remote-code is required β€” Florence-2 ships custom modeling code (modeling_florence2.py) in the source repo.

Task prompts (selected)

Task Prompt
Caption <CAPTION>
Detailed caption <DETAILED_CAPTION>
More detailed caption <MORE_DETAILED_CAPTION>
OCR <OCR>
OCR with regions <OCR_WITH_REGION>
Object detection <OD>
Dense region caption <DENSE_REGION_CAPTION>
Region proposal <REGION_PROPOSAL>
Caption to phrase grounding <CAPTION_TO_PHRASE_GROUNDING>
Referring expression segmentation <REFERRING_EXPRESSION_SEGMENTATION>

Full task-prompt list: see the upstream model card.

When to pick fp16 vs quantized

This repo (fp16): GPU inference, maximum quality. ~520 MB. Heliosoph/florence-2-base-ft-quantized-onnx: CPU / NPU / mobile, INT8 dynamic. ~270 MB, modestly degraded on text-heavy OCR.

License

MIT β€” same as upstream. LICENSE file included.

Downloads last month
13
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Heliosoph/florence-2-base-ft-fp16-onnx

Quantized
(6)
this model