LFM2.5-VL-1.6B-Extract
LFM2.5-VL-1.6B-Extract extracts user-defined fields from images and returns them as JSON. It is Liquid AI's first vision model in the Liquid Nanos collection—compact, task-specific models built for production workflows—and extends the Extract family alongside LFM2-1.2B-Extract for text documents.
⚙️ How it works
You specify what to extract as a YAML field list in the system prompt, and the model returns a JSON object with those fields. Structured outputs integrate cleanly with rule-based systems and downstream pipelines. Use it out of the box or fine-tune for domain-specific extraction.
- System prompt:
wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface
User prompt: (the input image only; no text)
Output:
{
"wood_color": "light tan to beige with darker brown streaks",
"wood_texture": "smooth with visible grain patterns",
"wood_pattern": "wavy, linear, irregular"
}
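As a minimal sketch of this request/response shape, the snippet below renders a Python dict of fields as the YAML field list and parses the model's reply. The helper names `build_system_prompt` and `parse_extraction` are hypothetical (they are not part of any library), and the prompt wording is borrowed from the example in the "How to run" section.

```python
import json


def build_system_prompt(fields: dict[str, str]) -> str:
    """Render {name: description} as a YAML field list inside the system prompt."""
    # Assumption: descriptions are plain one-line strings that need no YAML quoting.
    fields_yaml = "\n".join(f"{name}: {desc}" for name, desc in fields.items())
    return (
        "Extract the following from the image:\n"
        f"{fields_yaml}\n"
        "Respond with only a JSON object. Do not include any text outside the JSON."
    )


def parse_extraction(response: str, fields: dict[str, str]) -> dict:
    """Parse the reply as JSON and reject replies that miss a requested field."""
    data = json.loads(response)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [name for name in fields if name not in data]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data


fields = {
    "wood_color": "The overall coloration of the wood surface",
    "wood_texture": "The tactile quality of the wood surface",
    "wood_pattern": "The pattern types visible on the wood surface",
}
prompt = build_system_prompt(fields)

# Stand-in for the model's reply, copied from the example output above.
reply = """{
  "wood_color": "light tan to beige with darker brown streaks",
  "wood_texture": "smooth with visible grain patterns",
  "wood_pattern": "wavy, linear, irregular"
}"""
result = parse_extraction(reply, fields)
print(result["wood_color"])  # light tan to beige with darker brown streaks
```

Keeping the field definitions in a dict means the same object drives both the prompt and the check on the reply, so the two cannot drift apart.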
The model also supports enum-style fields: list the allowed choices alongside the field description, as shown below, and the model returns one of the listed values as its answer.
- System prompt:
wood_color: The overall coloration of the wood surface, such as blue, red, or light tan
wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy
wood_pattern: The pattern types visible on the wood surface, e.g., straight, wavy, or curly
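A downstream system may still want to confirm that each returned value is one of the listed choices. The sketch below is one way to do that; the `FIELDS` layout and the helpers `describe` and `out_of_choice` are hypothetical, and the "select from" phrasing simply mirrors the `wood_texture` line above.

```python
# Hypothetical spec: field name -> (description, allowed choices).
FIELDS = {
    "wood_texture": (
        "The tactile quality of the wood surface",
        ["smooth", "rough", "grainy"],
    ),
}


def describe(desc: str, choices: list[str]) -> str:
    """Append the choices to the description (assumes at least two choices)."""
    listed = ", ".join(choices[:-1]) + f", or {choices[-1]}"
    return f"{desc}, select from {listed}"


def out_of_choice(data: dict, fields: dict) -> dict:
    """Return {field: value} for every value that is not among its listed choices."""
    return {
        name: data.get(name)
        for name, (_, choices) in fields.items()
        if data.get(name) not in choices
    }


line = "wood_texture: " + describe(*FIELDS["wood_texture"])
print(line)
# wood_texture: The tactile quality of the wood surface, select from smooth, rough, or grainy

print(out_of_choice({"wood_texture": "smooth"}, FIELDS))  # {}
print(out_of_choice({"wood_texture": "shiny"}, FIELDS))   # {'wood_texture': 'shiny'}
```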
🌟 Use cases
- Detecting safety-critical events in images (e.g. fallen person, fire, leakage) to trigger automated safety systems.
- Collecting statistical information about objects across video frames for analytics pipelines.
- Auto-tagging product images with structured attributes for retail and e-commerce.
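To illustrate how such outputs might feed a rule-based system, here is a small sketch that raises an alert from per-frame extractions and tallies one attribute across frames. The field names (`fire_visible`, `person_fallen`, `vehicle_type`), the yes/no choices, and the sample data are all made up for the example; they are not fields the model defines.

```python
from collections import Counter


def should_alert(extraction: dict) -> bool:
    """Rule: alert when either safety field was answered 'yes'."""
    return (
        extraction.get("fire_visible") == "yes"
        or extraction.get("person_fallen") == "yes"
    )


def count_values(extractions: list[dict], field: str) -> Counter:
    """Tally the values of one field across frames, skipping frames that lack it."""
    return Counter(e[field] for e in extractions if field in e)


# Hypothetical parsed outputs, one dict per video frame.
frames = [
    {"fire_visible": "no", "person_fallen": "no", "vehicle_type": "car"},
    {"fire_visible": "no", "person_fallen": "yes", "vehicle_type": "truck"},
    {"fire_visible": "no", "person_fallen": "no", "vehicle_type": "car"},
]

alert_frames = [i for i, frame in enumerate(frames) if should_alert(frame)]
print(alert_frames)  # [1]
print(count_values(frames, "vehicle_type").most_common())  # [('car', 2), ('truck', 1)]
```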
📄 Model details
| Property | Detail |
|---|---|
| Parameters (LM only) | 1.2B |
| Vision encoder | SigLIP2 (~400M, SigLIP-2 paper) |
| Backbone layers | hybrid conv+attention |
| Image input | Single image, dynamic resolution |
| Context | 128,000 tokens |
| Vocab size | 65,536 (text) |
| Precision | bfloat16 |
| License | LFM Open License v1.0 |
📊 Performance
We evaluated LFM2.5-VL-1.6B-Extract on a 2,000-sample benchmark of
(image, schema, JSON) triples, with reference labels generated by an
ensemble of frontier multimodal models. Predictions are scored on the
following three dimensions:
- JSON Validity — share of samples producing strict-parseable JSON
- Schema Consistency F1 Score — set-level F1 over predicted vs requested field names, macro-averaged across samples
- VLM Judge Score — match against the image directly, judged by a separate vision model (Qwen/Qwen3.5-35B-A3B)
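The first two metrics can be sketched in a few lines of Python. This is an illustration of the definitions above, not the code in `model_eval/`; in particular, counting a sample as valid only when it parses to a JSON object, and scoring unparseable outputs as F1 = 0, are assumptions made here.

```python
import json


def field_f1(predicted: set[str], requested: set[str]) -> float:
    """Set-level F1 between predicted and requested field names."""
    true_positives = len(predicted & requested)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(requested)
    return 2 * precision * recall / (precision + recall)


def score(samples: list[tuple[str, list[str]]]) -> dict[str, float]:
    """samples holds (raw model output, requested field names) pairs."""
    valid = 0
    f1_scores = []
    for raw, requested in samples:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            data = None
        if isinstance(data, dict):
            valid += 1
            f1_scores.append(field_f1(set(data), set(requested)))
        else:
            f1_scores.append(0.0)  # assumption: invalid output scores zero
    n = len(samples)
    return {
        "json_validity": 100 * valid / n,
        "schema_f1": 100 * sum(f1_scores) / n,  # macro-average over samples
    }


samples = [
    ('{"a": 1, "b": 2}', ["a", "b"]),  # exact match, F1 = 1.0
    ('{"a": 1, "c": 2}', ["a", "b"]),  # one wrong name, F1 = 0.5
    ("not json", ["a", "b"]),          # invalid, F1 = 0.0
    ('{"a": 1, "b": 2}', ["a", "b"]),  # exact match, F1 = 1.0
]
print(score(samples))  # {'json_validity': 75.0, 'schema_f1': 62.5}
```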
| Model | Params | JSON Validity | F1 Score | VLM Judge Score |
|---|---|---|---|---|
| LFM2.5-VL-1.6B-Extract | 1.6B | 99.6 | 99.6 | 90.6 |
| LFM2.5-VL-1.6B | 1.6B | 91.8 | 75.8 | 66.0 |
| FastVLM-1.5B | 1.91B | 87.3 | 80.3 | 50.9 |
| SmolVLM2-2.2B-Instruct | 2.25B | 84.4 | 82.9 | 64.8 |
| Qwen3.5-2B | 2.27B | 97.9 | 97.7 | 89.7 |
| gemma-4-E2B-it | 2.3B | 97.4 | 97.1 | 84.4 |
| InternVL3_5-2B | 2.35B | 99.6 | 99.2 | 87.7 |
| (ref) Qwen3-VL-4B-Instruct | 4.44B | 99.8 | 99.7 | 92.0 |
| (ref) InternVL3_5-4B | 4.73B | 99.5 | 99.4 | 90.2 |
LFM2.5-VL-1.6B-Extract outperforms similarly-sized (~2B) open-source VLMs on this benchmark and is competitive with models 2× its size.
Reproducing these numbers: The full evaluation pipeline, which includes extraction, VLM judging, and metric aggregation, is bundled in this repository under model_eval/. Setup, configuration, and run instructions are in the folder's README.
Scope: These numbers characterize the model on the input/output form it is designed for: a single input image, a YAML field list as the schema, and a flat JSON object as the output. Performance is not expected to transfer to vastly different tasks, e.g. multi-image reasoning or free-form VQA.
🏃 How to run
You can run LFM2.5-VL-1.6B-Extract with Hugging Face transformers v5.1 or newer:
pip install "transformers>=5.1" pillow
from transformers import AutoProcessor, AutoModelForImageTextToText
from transformers.image_utils import load_image

model_id = "LiquidAI/LFM2.5-VL-1.6B-Extract"

# Load the model in bfloat16 and let transformers place it on the available device.
model = AutoModelForImageTextToText.from_pretrained(
    model_id,
    device_map="auto",
    dtype="bfloat16",
    trust_remote_code=True,
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = load_image("https://huggingface.co/LiquidAI/LFM2.5-VL-1.6B-Extract/resolve/main/sample_image.png")

# Fields to extract, written as a YAML list of `name: description` lines.
fields_yaml = """wood_color: The overall coloration of the wood surface
wood_texture: The tactile quality of the wood surface
wood_pattern: The pattern types visible on the wood surface"""

system_prompt = f"""Extract the following from the image:
{fields_yaml}
Respond with only a JSON object. Do not include any text outside the JSON."""

# The schema goes in the system turn; the user turn carries only the image.
conversation = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": [{"type": "image", "image": image}]},
]

inputs = processor.apply_chat_template(
    conversation,
    add_generation_prompt=True,
    return_tensors="pt",
    return_dict=True,
    tokenize=True,
).to(model.device)

# Greedy decoding (do_sample=False), as recommended for this model.
outputs = model.generate(**inputs, max_new_tokens=512, do_sample=False)

# Decode only the newly generated tokens, skipping the prompt.
response = processor.batch_decode(
    outputs[:, inputs["input_ids"].shape[1]:],
    skip_special_tokens=True,
)[0]

print(response)
# {
#   "wood_color": "light tan to beige with darker brown streaks",
#   "wood_texture": "smooth with visible grain patterns",
#   "wood_pattern": "wavy, linear, irregular"
# }
The model is intended for single-turn conversations. We recommend using greedy decoding (temperature=0).
📬 Contact
- Got questions or want to connect? Join our Discord community
- If you are interested in custom solutions with edge deployment, please contact our sales team.
Citation
@article{liquidai2025lfm2,
title={LFM2 Technical Report},
author={Liquid AI},
journal={arXiv preprint arXiv:2511.23404},
year={2025}
}