Liquid AI

LFM2.5-VL-450M-Extract

LFM2.5-VL-450M-Extract extracts user-defined fields from images and returns them as JSON. It is Liquid AI's first vision model in the Liquid Nanos collection—compact, task-specific models built for production workflows—and extends the Extract family alongside LFM2-350M-Extract for text documents.

You specify what to extract as a YAML field list in the system prompt, and the model returns a JSON object with those fields. Structured outputs integrate cleanly with rule-based systems and downstream pipelines. Use it out of the box or fine-tune for domain-specific extraction.

⚙️ How it works


  • System prompt:
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
  • User prompt: the image to analyze (in this example, an image of a wood surface)

  • Output:

{
  "wood_color": "light to medium brown",
  "wood_texture": "smooth with visible grain",
  "wood_pattern": "parallel, irregular, wavy"
}
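If your fields live in code rather than in a hand-written string, you can render them into the YAML field list above. The sketch below assumes a plain dict mapping field names to descriptions; the helper name `build_system_prompt` is hypothetical, and the surrounding instruction text is copied from the "How to run" example.

```python
# Hypothetical helper: renders a {field_name: description} dict into the
# YAML field list and wraps it in the instruction text from "How to run".
def build_system_prompt(fields: dict[str, str]) -> str:
    # One "name: description" line per field, in insertion order.
    fields_yaml = "\n".join(f"{name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following from the image:\n\n"
        f"{fields_yaml}\n\n"
        "Respond with only a JSON object. Do not include any text outside the JSON."
    )


fields = {
    "wood_color": "The overall coloration of the wood surface",
    "wood_texture": "The tactile quality of the wood surface",
    "wood_pattern": "The pattern types visible on the wood surface",
}
print(build_system_prompt(fields))
```

Descriptions are inserted verbatim, so keep each one on a single line; a newline inside a description would split it into what looks like a separate field.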

The model also supports enum fields: list the allowed values in a field's description, as in the examples below, and the model returns one of the listed values as its answer.

  • System prompt:
wood_color: The overall coloration of the wood surface, such as blue, red, or light tan
wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy
wood_pattern: The pattern types visible on the wood surface, e.g., straight, wavy, or curly
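A minimal sketch of generating enum-style descriptions and checking the model's answer against the allowed values. The helper names (`format_choices`, `enum_field`, `is_allowed`) are hypothetical, and the "select from ..." wording simply follows the `wood_texture` example above; other phrasings shown there may work as well.

```python
# Hypothetical helpers for enum-style fields.
def format_choices(choices: list[str]) -> str:
    # Renders "a", "a or b", or "a, b, or c" (expects at least one choice).
    if len(choices) == 1:
        return choices[0]
    if len(choices) == 2:
        return f"{choices[0]} or {choices[1]}"
    return ", ".join(choices[:-1]) + ", or " + choices[-1]


def enum_field(description: str, choices: list[str]) -> str:
    # Append the allowed values to the field description.
    return f"{description}, select from {format_choices(choices)}"


def is_allowed(value: str, choices: list[str]) -> bool:
    # Case-insensitive check that the returned value is one of the listed choices.
    return value.strip().lower() in {c.lower() for c in choices}


texture_choices = ["smooth", "rough", "grainy"]
print("wood_texture: " + enum_field("The tactile quality of the wood surface", texture_choices))
# wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy
print(is_allowed("Smooth", texture_choices), is_allowed("glossy", texture_choices))
# True False
```

Validating on the client side is a cheap safeguard: the model is expected to pick a listed value, but a downstream rule engine should not assume it always does.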

🌟 Use cases

  • Detecting safety-critical events in images (e.g., a fallen person, fire, or a leak) to trigger automated safety systems.
  • Collecting statistical information about objects across video frames for analytics pipelines.
  • Auto-tagging product images with structured attributes for retail and e-commerce.
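To make the safety use case concrete, here is a minimal sketch of a rule-based trigger on top of the model's JSON output. The schema is hypothetical (three yes/no enum fields invented for illustration), and routing unparseable output to review is a design choice of this sketch, not documented model behavior.

```python
import json

# Hypothetical safety schema: each field is an enum restricted to yes/no.
SAFETY_FIELDS = {
    "person_fallen": "Whether a person is lying on the ground, select from yes or no",
    "fire_visible": "Whether fire or smoke is visible, select from yes or no",
    "leak_visible": "Whether liquid is leaking or pooling on the floor, select from yes or no",
}


def triggered_alerts(response: str) -> list[str]:
    """Return the safety fields answered "yes", or ["parse_error"] for unusable output."""
    try:
        data = json.loads(response)
    except json.JSONDecodeError:
        data = None
    if not isinstance(data, dict):
        # Fail safe: send malformed output to review instead of silently ignoring it.
        return ["parse_error"]
    return [
        name for name in SAFETY_FIELDS
        if str(data.get(name, "")).strip().lower() == "yes"
    ]


print(triggered_alerts('{"person_fallen": "yes", "fire_visible": "no", "leak_visible": "No"}'))
# ['person_fallen']
```

The system prompt for this schema would list one `name: description` line per entry in `SAFETY_FIELDS`.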

📄 Model details

| Property | Detail |
|---|---|
| Parameters (language model only) | 350M |
| Vision encoder | SigLIP2 (~100M) |
| Backbone layers | Hybrid conv + attention |
| Image input | Single image, dynamic resolution |
| Context | 128,000 tokens |
| Vocab size | 65,536 (text) |
| Precision | bfloat16 |
| License | LFM Open License v1.0 |
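The parameter counts in the table add up to the "450M" in the model name, and in bfloat16 each parameter occupies 2 bytes. A back-of-the-envelope estimate of weight memory follows; the vision encoder count is approximate (~100M), and activations and the KV cache are not included.

```python
# Rough weight-memory estimate from the model details table (approximate counts).
lm_params = 350e6       # language model
vision_params = 100e6   # SigLIP2 encoder, ~100M
bytes_per_param = 2     # bfloat16 stores each value in 16 bits

total_params = lm_params + vision_params
weight_bytes = total_params * bytes_per_param
print(f"~{total_params / 1e6:.0f}M parameters, ~{weight_bytes / 1e9:.1f} GB of bf16 weights")
# ~450M parameters, ~0.9 GB of bf16 weights
```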

📊 Performance

We evaluated LFM2.5-VL-450M-Extract on a 2,000-sample benchmark of (image, schema, JSON) triples, with reference labels generated by an ensemble of frontier multimodal models. Predictions are scored on the following three dimensions:

  • JSON Validity: share of samples whose output parses as strict JSON
  • Schema Consistency F1 Score: set-level F1 between the predicted and requested field names, macro-averaged across samples
  • VLM Judge Score: agreement of the extracted values with the image itself, as judged by a separate vision model (Qwen/Qwen3.5-35B-A3B)
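The two automatic metrics can be sketched directly from their definitions. This is an illustrative reimplementation, not the code in `model_eval/`. Two points are assumptions: "strict" is interpreted as also rejecting `NaN`/`Infinity`, which Python's `json.loads` accepts by default, and unparseable outputs are treated as predicting no fields.

```python
import json

_INVALID = object()  # sentinel so that valid JSON like "null" is not mistaken for failure


def _reject_constant(token: str):
    # json.loads accepts NaN/Infinity by default; standard JSON does not.
    raise ValueError(f"non-standard JSON constant: {token}")


def strict_parse(text: str):
    """Parse text as strict JSON, returning _INVALID on failure."""
    try:
        return json.loads(text, parse_constant=_reject_constant)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return _INVALID


def json_validity(responses: list[str]) -> float:
    """Share of responses that parse as strict JSON."""
    return sum(strict_parse(r) is not _INVALID for r in responses) / len(responses)


def schema_f1(requested, predicted) -> float:
    """Set-level F1 between requested and predicted field names for one sample."""
    req, pred = set(requested), set(predicted)
    tp = len(req & pred)
    if tp == 0:
        return 1.0 if not req and not pred else 0.0
    precision, recall = tp / len(pred), tp / len(req)
    return 2 * precision * recall / (precision + recall)


def macro_schema_f1(samples) -> float:
    """samples: (requested_field_names, raw_response) pairs; averaged per sample."""
    scores = []
    for requested, response in samples:
        obj = strict_parse(response)
        # Assumption: unparseable or non-object output predicts no fields.
        predicted = obj.keys() if isinstance(obj, dict) else ()
        scores.append(schema_f1(requested, predicted))
    return sum(scores) / len(scores)
```

The results table appears to report these values scaled to 0-100, so multiply by 100 when comparing.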
| Model | Params | JSON Validity | Schema Consistency F1 | VLM Judge Score |
|---|---|---|---|---|
| LFM2.5-VL-450M-Extract | 0.45B | 98.9 | 98.8 | 84.5 |
| LFM2.5-VL-450M | 0.45B | 97.7 | 93.5 | 73.4 |
| SmolVLM-500M-Instruct | 0.51B | 33.0 | 26.6 | 12.2 |
| FastVLM-0.5B | 0.76B | 22.5 | 19.3 | 16.3 |
| Qwen3.5-0.8B | 0.87B | 96.4 | 96.3 | 82.3 |
| InternVL3_5-1B | 1.06B | 98.0 | 96.5 | 80.7 |
| MiniCPM-V-4.6 | 1.30B | 61.8 | 60.4 | 57.5 |
| (ref) InternVL3_5-2B | 2.35B | 99.6 | 99.2 | 87.7 |
| (ref) Qwen3.5-2B | 2.27B | 97.9 | 97.7 | 89.7 |
| (ref) gemma-4-E2B-it | 2.3B | 97.4 | 97.1 | 84.4 |

LFM2.5-VL-450M-Extract outperforms the other sub-1B open-source VLMs in the table on all three metrics and is competitive with the 2B-class reference models (rows marked "(ref)"), which are roughly 5× its size.

Reproducing these numbers: The full evaluation pipeline, which includes extraction, VLM judging, and metric aggregation, is bundled in this repository under model_eval/. Setup, configuration, and run instructions are in the folder's README.

Scope: These numbers characterize the model on the input/output form it is designed for: a single input image, a YAML field list as the schema, and a flat JSON object as the output. Performance is not expected to transfer to substantially different tasks such as multi-image reasoning or free-form VQA.


🏃 How to run

You can run LFM2.5-VL-450M-Extract with Hugging Face transformers v5.1 or newer:

pip install "transformers>=5.1" pillow torch accelerate
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2.5-VL-450M-Extract"
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-450M-Extract/resolve/main/sample_image.png")

fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""

system_prompt = f"""Extract the following from the image:

{fields_yaml}

Respond with only a JSON object. Do not include any text outside the JSON."""

conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user",   "content": [{"type": "image", "image": image}]},
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

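# Greedy decoding, as recommended for extraction
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)
# Keep only the newly generated tokens (drop the prompt) before decoding
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]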
print(response)
# {
#   "wood_color": "light to medium brown",
#   "wood_texture": "smooth with visible grain",
#   "wood_pattern": "parallel, irregular, wavy"
# }

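The model is intended for single-turn conversations: one system prompt containing the field list and one user message containing the image. We recommend greedy decoding (do_sample=False in transformers, as above, or temperature=0 in engines that expose a temperature setting).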
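Before handing `response` to downstream code, it can help to confirm that it is a flat JSON object with exactly the requested fields, matching the Scope note above. A minimal sketch follows; the helper names are hypothetical, field names are taken as the text before the first colon on each line of `fields_yaml`, and rejecting nested values is a policy choice of this sketch. Here `response` is hard-coded to the sample output in place of a live model call.

```python
import json


def field_names(fields_yaml: str) -> list[str]:
    # Field name = text before the first colon on each non-empty line.
    return [line.split(":", 1)[0].strip() for line in fields_yaml.splitlines() if line.strip()]


def parse_extraction(response: str, fields_yaml: str) -> dict:
    """Parse the model's reply and check it is a flat object with the requested keys."""
    data = json.loads(response)  # raises json.JSONDecodeError (a ValueError) on bad JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    nested = [key for key, value in data.items() if isinstance(value, (dict, list))]
    if nested:
        raise ValueError(f"expected flat values, got nested ones for {nested}")
    expected = field_names(fields_yaml)
    missing = [key for key in expected if key not in data]
    extra = [key for key in data if key not in expected]
    if missing or extra:
        raise ValueError(f"schema mismatch: missing={missing}, extra={extra}")
    return data


fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""
response = '{"wood_color": "light to medium brown", "wood_texture": "smooth with visible grain", "wood_pattern": "parallel, irregular, wavy"}'
print(parse_extraction(response, fields_yaml)["wood_pattern"])
# parallel, irregular, wavy
```

Because `json.JSONDecodeError` subclasses `ValueError`, callers can catch a single exception type for both malformed JSON and schema mismatches.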

Citation

@article{liquidai2025lfm2,
 title={LFM2 Technical Report},
 author={Liquid AI},
 journal={arXiv preprint arXiv:2511.23404},
 year={2025}
}