Marlin-2B — MLX 8-bit (Apple Silicon)

Original release: NemoStation/Marlin-2B

8-bit MLX conversion of NemoStation/Marlin-2B for fast, local, private inference on Apple Silicon. Same weights, same behavior — see the base model card for benchmarks, architecture, training, and intended use.

Base model: NemoStation/Marlin-2B (2B video VLM for dense captioning + temporal grounding)
Format: MLX, 8-bit · ~2.5 GB (base BF16: ~5.1 GB)
Runs on: Apple Silicon (M-series)
License: Apache-2.0 (inherited from base)

Use it (mlx-vlm)

pip install mlx-vlm
python -m mlx_vlm.generate \
  --model NemoStation/Marlin-2B-MLX-8bit \
  --video clip.mp4 --fps 2 \
  --prompt "Describe the video."
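
To caption several clips, the same command can be driven from Python. The sketch below is only a thin wrapper around the CLI call shown above and uses the same flags; the helper names `build_generate_command` and `caption_folder` are hypothetical and not part of mlx-vlm.

```python
import subprocess
import sys
from pathlib import Path

MODEL = "NemoStation/Marlin-2B-MLX-8bit"


def build_generate_command(video, prompt="Describe the video.", fps=2):
    """Return the argv list for one mlx_vlm.generate run on `video`."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", MODEL,
        "--video", str(video),
        "--fps", str(fps),
        "--prompt", prompt,
    ]


def caption_folder(folder):
    """Run the captioning command once per .mp4 in `folder` (hypothetical helper)."""
    outputs = {}
    for clip in sorted(Path(folder).glob("*.mp4")):
        result = subprocess.run(
            build_generate_command(clip),
            capture_output=True, text=True, check=True,
        )
        # Raw CLI stdout; it is not parsed or cleaned here.
        outputs[clip.name] = result.stdout
    return outputs
```

Each clip starts a new process, so the model is reloaded every time. That keeps the sketch simple but is slow for large batches.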

Dense captioning works well through mlx-vlm's one-shot path (the mlx_vlm.generate command above). For temporal grounding, where answers take the form "From <start> to <end>", use a timestamp-aware serving path (SGLang-MLX) so that each frame's timestamp reaches the model.
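
As an illustration of what "per-frame time" means and how grounded answers might be consumed, here is a minimal sketch. It assumes frames are sampled uniformly at the given fps, and that the model writes both ends of a span as plain numbers in seconds. This card does not specify the exact timestamp format, so adjust the pattern to the output you actually see. `frame_timestamps` and `parse_spans` are hypothetical helpers.

```python
import re

# Assumed output format: "From <start> to <end>", both values plain numbers in seconds.
SPAN_PATTERN = re.compile(
    r"from\s+(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)",
    re.IGNORECASE,
)


def frame_timestamps(num_frames, fps=2.0):
    """Timestamp in seconds of each sampled frame, assuming uniform sampling at `fps`."""
    return [i / fps for i in range(num_frames)]


def parse_spans(text):
    """Extract (start, end) pairs in seconds from model output, skipping inverted spans."""
    spans = []
    for start, end in SPAN_PATTERN.findall(text):
        start_s, end_s = float(start), float(end)
        if end_s >= start_s:
            spans.append((start_s, end_s))
    return spans
```

With `--fps 2`, the first four sampled frames fall at 0.0, 0.5, 1.0 and 1.5 seconds. A timestamp-aware serving path passes this kind of timing information to the model along with the frames.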

Conversion recipe

python -m mlx_vlm.convert \
  --hf-path NemoStation/Marlin-2B \
  --mlx-path ./Marlin-2B-MLX-8bit \
  -q --q-bits 8
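
After converting, a quick sanity check is to compare the on-disk weight size with the ~2.5 GB quoted above. This sketch assumes the converted weights are stored as .safetensors files under the output directory and that the quoted figure is in decimal gigabytes. `weights_size_gb` is a hypothetical helper.

```python
from pathlib import Path


def weights_size_gb(model_dir):
    """Total size of all .safetensors files under `model_dir`, in decimal gigabytes."""
    total_bytes = sum(
        p.stat().st_size for p in Path(model_dir).rglob("*.safetensors")
    )
    return total_bytes / 1e9


if __name__ == "__main__":
    # Path matches the --mlx-path used in the conversion command.
    print(f"{weights_size_gb('./Marlin-2B-MLX-8bit'):.2f} GB")
```

A result far from ~2.5 GB (for example close to the ~5.1 GB of the BF16 base) would suggest that quantization was not applied.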

Access

This repository is gated with the same access form as the base model; request access through the form at the top of the model page. License: Apache-2.0.
