Marlin-2B — MLX 8-bit (Apple Silicon)

Original release: NemoStation/Marlin-2B

8-bit MLX conversion of NemoStation/Marlin-2B for fast, local, private inference on Apple Silicon. The weights are the base model's, quantized to 8 bits, so behavior should closely track the original. See the base model card for benchmarks, architecture, training, and intended use.


Base model	NemoStation/Marlin-2B (2B video VLM — dense captioning + temporal grounding)
Format	MLX, 8-bit · ~2.5 GB (base BF16 ~5.1 GB)
Runs on	Apple Silicon (M-series)
License	Apache-2.0 (inherited from base)

Use it (mlx-vlm)

pip install mlx-vlm
python -m mlx_vlm.generate \
  --model NemoStation/Marlin-2B-MLX-8bit \
  --video clip.mp4 --fps 2 \
  --prompt "Describe the video."
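
To call the same one-shot path from a script, one option is to wrap the command in a small Python helper. This is a minimal sketch, not part of mlx-vlm: the function names `build_generate_command` and `caption_video` are hypothetical, and the only flags used are the ones shown in the command above. It assumes `mlx-vlm` is installed in the current Python environment.

```python
import shlex
import subprocess
import sys


def build_generate_command(video_path, prompt, fps=2,
                           model="NemoStation/Marlin-2B-MLX-8bit"):
    """Return the argv list for the mlx-vlm one-shot command shown above.

    Hypothetical helper: it only reuses the flags from the model card
    (--model, --video, --fps, --prompt) and adds nothing new.
    """
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--video", str(video_path),
        "--fps", str(fps),
        "--prompt", prompt,
    ]


def caption_video(video_path, prompt="Describe the video."):
    """Run the command and return whatever it writes to stdout.

    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    cmd = build_generate_command(video_path, prompt)
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return result.stdout


if __name__ == "__main__":
    # Print the command instead of running it, so this is safe to try anywhere.
    print(shlex.join(build_generate_command("clip.mp4", "Describe the video.")))
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems when the prompt contains spaces or quotes.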

Dense captioning works well through mlx-vlm's one-shot path. Temporal grounding (answers of the form "From <start> to <end>") needs a timestamp-aware serving path (SGLang-MLX), so that each frame's time reaches the model.
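
To make "per-frame time" concrete, the sketch below computes the timestamps of frames sampled at a fixed rate and maps a grounded span back to frame indices. It is illustrative only: it assumes uniform sampling that starts at t = 0, and the helper names `frame_timestamps` and `frames_in_span` are hypothetical. How SGLang-MLX or the model actually encodes time is not described here.

```python
import math


def frame_timestamps(duration_s, fps=2.0):
    """Timestamps in seconds of frames sampled uniformly at `fps`.

    Assumption: the first frame is taken at t = 0 and sampling stops
    before the end of the clip.
    """
    if duration_s < 0 or fps <= 0:
        raise ValueError("duration_s must be >= 0 and fps must be > 0")
    n_frames = int(duration_s * fps)
    return [i / fps for i in range(n_frames)]


def frames_in_span(start_s, end_s, fps=2.0):
    """Indices of sampled frames whose timestamp lies in [start_s, end_s]."""
    if end_s < start_s:
        raise ValueError("end_s must be >= start_s")
    first = math.ceil(start_s * fps)
    last = math.floor(end_s * fps)
    return list(range(first, last + 1))


if __name__ == "__main__":
    # A 3-second clip at --fps 2 yields six frames, half a second apart.
    print(frame_timestamps(3.0, fps=2.0))
    # A grounded answer "From 1.0 to 2.0" would cover these frame indices.
    print(frames_in_span(1.0, 2.0, fps=2.0))
```

If the serving path drops these timestamps and passes only the frames, the model has no way to tell a frame at 0.5 s from one at 2.5 s, which is why a timestamp-aware path matters for grounding.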

Conversion recipe

python -m mlx_vlm.convert \
  --hf-path NemoStation/Marlin-2B \
  --mlx-path ./Marlin-2B-MLX-8bit \
  -q --q-bits 8

Access

This repository is gated with the same access form as the base model; request access on the model page. Licensed under Apache-2.0.

Model tree for NemoStation/Marlin-2B-MLX-8bit

Qwen/Qwen3.5-2B-Base (base model) → Qwen/Qwen3.5-2B (finetuned) → NemoStation/Marlin-2B (finetuned) → this model (quantized)