lift-MLX-BF16
MLX conversion of datalab-to/lift, a 9B qwen3_5 vision-language model for structured extraction (PDF / image → schema-constrained JSON).
This repo is the unquantized bf16 conversion: the source weights re-saved in MLX format with no quantization. It runs on Apple Silicon via mlx-vlm. This is a community conversion of Datalab's openly released weights and is not affiliated with Datalab.
Variants
| Repo | Method | ~Bits/weight | Size | Peak RAM* | Gen speed* |
|---|---|---|---|---|---|
| lift-MLX-BF16 (this repo) | full bf16 | 16 | 18 GB | 19.9 GB | 31 t/s |
| lift-MLX-oQ8 | oQ | ~8.6 | 9.7 GB | 12.3 GB | 58 t/s |
| lift-MLX-oQ6 | oQ | ~6 | 7.7 GB | 9.4 GB | 73 t/s |
| lift-MLX-oQ5 | oQ | ~5 | 6.7 GB | 8.4 GB | 83 t/s |
| lift-MLX-oQ4 | oQ | ~4.6 | 5.6 GB | 7.2 GB | 100 t/s |
| lift-MLX-oQ3.5 | oQ | ~4.0 | 4.9 GB | 6.5 GB | 109 t/s |
| lift-MLX-oQ3 | oQ | ~3.5 | 4.6 GB | 6.2 GB | 119 t/s |
* Peak RAM and generation speed measured on a single-image invoice extraction on an Apple M5 Max (128 GB, 40-core GPU). Indicative, not a benchmark.
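The Size column roughly follows parameters × bits-per-weight / 8. The sketch below shows that arithmetic. It takes the nominal "9B" as exactly 9e9 parameters, which is an assumption, and it ignores file metadata and any tensors that a quantizer may leave at higher precision.

```python
# Rough weight-size estimate: parameters x bits-per-weight / 8 bytes.
# Assumption: exactly 9e9 parameters (the nominal "9B"); metadata and any
# tensors kept at higher precision are ignored.

PARAMS = 9e9  # nominal parameter count (assumption)


def estimated_size_gb(bpw: float, params: float = PARAMS) -> float:
    """Return the estimated weight size in GB (10**9 bytes)."""
    return params * bpw / 8 / 1e9


# Approximate bits/weight copied from the Variants table
variants = {
    "BF16": 16, "oQ8": 8.6, "oQ6": 6, "oQ5": 5,
    "oQ4": 4.6, "oQ3.5": 4.0, "oQ3": 3.5,
}

for name, bpw in variants.items():
    print(f"{name:6s} ~{bpw:>4} bits/weight -> ~{estimated_size_gb(bpw):.1f} GB")
```

The estimate matches the bf16 (18 GB) and oQ8 (~9.7 GB) rows. The lower-bit rows in the table are larger than the estimate, so treat it as a lower bound rather than a prediction.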
Usage
Generate (CLI)
```bash
uvx --from mlx-vlm mlx_vlm.generate \
  --model gabfssilva/lift-MLX-BF16 \
  --image invoice.png \
  --prompt "Extract the invoice as JSON." \
  --max-tokens 800
```
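To run the CLI over a folder of scans, you can loop over the files and issue the same command once per image. The following is a hypothetical helper: it uses only the flags from the example above, while the folder name, glob pattern and `dry_run` switch are illustrative assumptions. Each call starts a new process and loads the model again, so for many documents the server below is likely the cheaper option.

```python
# Hypothetical batch helper around the mlx_vlm.generate CLI call shown above.
# Only the flags from that example are used; folder/pattern/dry_run are
# illustrative assumptions.
import shlex
import subprocess
from pathlib import Path

MODEL = "gabfssilva/lift-MLX-BF16"
PROMPT = "Extract the invoice as JSON."


def build_command(image: Path, max_tokens: int = 800) -> list:
    """Return the argv for one CLI call (same flags as the example above)."""
    return [
        "uvx", "--from", "mlx-vlm", "mlx_vlm.generate",
        "--model", MODEL,
        "--image", str(image),
        "--prompt", PROMPT,
        "--max-tokens", str(max_tokens),
    ]


def run_folder(folder: str, pattern: str = "*.png", dry_run: bool = True) -> list:
    """Print (dry_run=True) or execute one command per matching image."""
    commands = [build_command(p) for p in sorted(Path(folder).glob(pattern))]
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))  # show the shell-quoted command only
        else:
            subprocess.run(cmd, check=True)  # raises if the CLI exits non-zero
    return commands


if __name__ == "__main__":
    run_folder("invoices")  # hypothetical folder of invoice images
```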
OpenAI-compatible server + structured outputs
lift is built for schema-constrained extraction. mlx_vlm.server enforces a JSON Schema at decode time (via llguidance), so the output conforms to the schema as long as generation is not cut off by `max_tokens`.
```bash
uvx --from mlx-vlm mlx_vlm.server --model gabfssilva/lift-MLX-BF16 --port 8080
```
```python
import base64, json
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="local")

# Send the image inline as a base64 data URL
with open("invoice.png", "rb") as f:
    img = base64.b64encode(f.read()).decode()

schema = {
    "type": "object",
    "properties": {
        "invoice_number": {"type": "string"},
        "total": {"type": "number"},
        "line_items": {"type": "array", "items": {"type": "object", "properties": {
            "description": {"type": "string"}, "amount": {"type": "number"}}}},
    },
    "required": ["invoice_number", "total"],
}

resp = client.chat.completions.create(
    model="gabfssilva/lift-MLX-BF16",  # the server lists your whole HF cache, so name the model explicitly
    messages=[{"role": "user", "content": [
        {"type": "text", "text": "Extract this invoice."},
        {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{img}"}},
    ]}],
    response_format={"type": "json_schema", "json_schema": {"name": "invoice", "schema": schema}},
    temperature=0.0, max_tokens=800,
)
print(json.loads(resp.choices[0].message.content))
```
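Constrained decoding keeps every emitted token on-schema, but it cannot close a JSON object that the token limit cut off. A minimal guard, assuming the server fills `finish_reason` the way the OpenAI API does (`"length"` when `max_tokens` is reached); `parse_extraction` and `TruncatedOutput` are hypothetical names:

```python
# Sketch: refuse to parse a response that stopped at max_tokens.
# Assumption: the server reports finish_reason like the OpenAI API
# ("stop" for a normal end, "length" when the token limit was hit).
import json
from typing import Optional


class TruncatedOutput(RuntimeError):
    """Raised when generation stopped because it reached max_tokens."""


def parse_extraction(content: str, finish_reason: Optional[str]) -> dict:
    """Parse the model's JSON, rejecting responses that were cut off."""
    if finish_reason == "length":
        raise TruncatedOutput("output hit max_tokens; retry with a larger limit")
    return json.loads(content)


# With the response from the client example above:
# data = parse_extraction(resp.choices[0].message.content,
#                         resp.choices[0].finish_reason)
```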
Notes
- eos fix applied. `generation_config.json` here sets `eos_token_id: [248044, 248046]`. Upstream only sets `248044`, but the chat turn closes with `<|im_end|>` = `248046`; without this, MLX servers reading `generation_config` never stop and flood the output with `<|im_end|>`. If you re-convert from the source, reapply this.
- Quality. Upstream full-precision `lift` (9B) scores 90.2% field-level / 20.9% full-document on Datalab's 225-doc benchmark. Every variant in this set extracted a simple test invoice correctly; that only rules out collapse and does not rank them. Lower bit-widths may degrade on harder or adversarial documents; these were not re-benchmarked at scale.
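If you re-convert from the source weights, the eos fix can be reapplied with a few lines of standard-library Python. This is a sketch: the token IDs come from the note above, while `apply_eos_fix` and the model directory are hypothetical. It accepts either a bare integer or a list for `eos_token_id` and is safe to run more than once.

```python
# Sketch: re-apply the eos fix to a freshly converted model directory.
# Token IDs are from the note above; the function name is hypothetical.
import json
from pathlib import Path

UPSTREAM_EOS_ID = 248044  # the only eos id set upstream
IM_END_ID = 248046        # <|im_end|>, which closes each chat turn


def apply_eos_fix(model_dir: str) -> list:
    """Ensure generation_config.json lists both eos ids; return the new list."""
    path = Path(model_dir) / "generation_config.json"
    config = json.loads(path.read_text())
    eos = config.get("eos_token_id", UPSTREAM_EOS_ID)
    ids = eos if isinstance(eos, list) else [eos]  # upstream may store a bare int
    for token_id in (UPSTREAM_EOS_ID, IM_END_ID):
        if token_id not in ids:
            ids.append(token_id)
    config["eos_token_id"] = ids
    path.write_text(json.dumps(config, indent=2) + "\n")
    return ids


# Usage (hypothetical path): apply_eos_fix("lift-mlx-reconverted")
```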
License
Code Apache-2.0; weights under a modified OpenRAIL-M (free for research, personal use, and startups under $5M; not for use competitive with Datalab's API). See the base model.