How to use NemoStation/Marlin-2B-MLX-8bit with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Marlin-2B-MLX-8bit NemoStation/Marlin-2B-MLX-8bit
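The same download can be done from Python. The following is a minimal sketch using `snapshot_download` from `huggingface_hub`; the helper name `download_model` and its defaults are illustrative, not part of the model card. Because the repository is gated, it assumes you have already been granted access and are either logged in locally or pass a token explicitly.

```python
REPO_ID = "NemoStation/Marlin-2B-MLX-8bit"


def download_model(local_dir="Marlin-2B-MLX-8bit", token=None):
    """Fetch every file of the repo into `local_dir` and return that path.

    `token=None` falls back to the credentials stored by a previous login.
    This helper is a hypothetical convenience wrapper, not an official API.
    """
    # Imported lazily so this module still loads where huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=REPO_ID, local_dir=local_dir, token=token)
```

Calling `download_model()` should leave the files in `./Marlin-2B-MLX-8bit`, the same directory the CLI command above writes to.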
You need to agree to share your contact information to access this model
This repository is publicly accessible, but you have to accept the conditions to access its files and content.
Access to Marlin-2B-MLX-8bit uses the same form as the base model. Tell us who you are and what you're building so we can support your use case.
Marlin-2B — MLX 8-bit (Apple Silicon)
Original release: NemoStation/Marlin-2B
8-bit MLX conversion of NemoStation/Marlin-2B for fast, local, private inference on Apple Silicon. The weights are the base model's, quantized to 8 bits, so behavior should closely match the original. See the base model card for benchmarks, architecture, training, and intended use.
| Property | Value |
|---|---|
| Base model | NemoStation/Marlin-2B (2B video VLM: dense captioning + temporal grounding) |
| Format | MLX, 8-bit · ~2.5 GB (base BF16 ~5.1 GB) |
| Runs on | Apple Silicon (M-series) |
| License | Apache-2.0 (inherited from base) |
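The two sizes in the table are consistent with simple bit-width arithmetic: going from 16-bit BF16 to 8-bit storage roughly halves the file. The sketch below is a back-of-envelope estimate only; it ignores the per-group scales and biases that quantized formats store alongside the weights, and any layers left unquantized, so real files can differ by a few percent. The function name is hypothetical.

```python
def estimate_quantized_size_gb(base_size_gb, base_bits=16, quant_bits=8):
    """Scale a checkpoint size by the ratio of bit widths.

    Rough estimate: quantization metadata and unquantized layers are ignored.
    """
    return base_size_gb * quant_bits / base_bits


# BF16 base of ~5.1 GB at 8 bits per weight
print(round(estimate_quantized_size_gb(5.1), 2))
```

For the ~5.1 GB base this gives about 2.55 GB, in line with the ~2.5 GB listed above.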
Use it (mlx-vlm)
pip install mlx-vlm
python -m mlx_vlm.generate \
--model NemoStation/Marlin-2B-MLX-8bit \
--video clip.mp4 --fps 2 \
--prompt "Describe the video."
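To caption several clips, the command above can be assembled programmatically. This is a small sketch that only builds the argument list, using exactly the flags shown in this card; the helper names are illustrative, and actually running the commands assumes `mlx-vlm` is installed and access to the model has been granted.

```python
import shlex
import subprocess

MODEL_ID = "NemoStation/Marlin-2B-MLX-8bit"


def build_caption_command(video_path, prompt="Describe the video.", fps=2, model=MODEL_ID):
    """Return the mlx_vlm.generate invocation from the card as an argv list."""
    return [
        "python", "-m", "mlx_vlm.generate",
        "--model", model,
        "--video", str(video_path),
        "--fps", str(fps),
        "--prompt", prompt,
    ]


def caption_clips(video_paths, dry_run=True):
    """Print each command; run it as a subprocess unless dry_run is set."""
    for path in video_paths:
        cmd = build_caption_command(path)
        print(shlex.join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)
```

Passing an argv list to `subprocess.run` avoids shell quoting problems with prompts that contain spaces or quotes.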
Dense captioning works well via mlx-vlm's one-shot path. For temporal grounding ("From <start> to <end>"), use a timestamp-aware serving path (SGLang-MLX) so that per-frame timestamps reach the model.
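Once grounded output is available, the time spans can be pulled out of the text. The sketch below assumes replies contain spans written as "From 12.5 to 18.0" with plain numbers in seconds; that concrete format is an assumption made for illustration, so check the base model card for the actual output format before relying on it.

```python
import re

# Hypothetical span format: "From <number> to <number>", numbers in seconds.
SPAN_RE = re.compile(r"[Ff]rom\s+(\d+(?:\.\d+)?)\s+to\s+(\d+(?:\.\d+)?)")


def parse_spans(text):
    """Return (start, end) pairs found in `text`, skipping reversed spans."""
    spans = []
    for match in SPAN_RE.finditer(text):
        start, end = float(match.group(1)), float(match.group(2))
        if end >= start:
            spans.append((start, end))
    return spans
```

Spans whose end precedes their start are dropped rather than swapped, on the assumption that they indicate a malformed reply.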
Conversion recipe
python -m mlx_vlm.convert \
--hf-path NemoStation/Marlin-2B \
--mlx-path ./Marlin-2B-MLX-8bit \
-q --q-bits 8
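To give a feel for what 8-bit quantization does to the weights, here is a pure-Python sketch of affine group quantization: each group of values is mapped to integer codes plus a scale and an offset. It is an illustration of the general technique, not MLX's actual kernel, and the group size of 64 is an assumption (the card does not state the group size used).

```python
import random


def quantize_group(values, bits=8):
    """Affine-quantize one group of floats to unsigned `bits`-bit codes."""
    lo, hi = min(values), max(values)
    levels = (1 << bits) - 1            # 255 distinct steps for 8 bits
    scale = (hi - lo) / levels or 1.0   # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float values."""
    return [c * scale + bias for c in codes]


def max_roundtrip_error(weights, group_size=64, bits=8):
    """Largest absolute error after quantizing and dequantizing every group."""
    worst = 0.0
    for i in range(0, len(weights), group_size):
        group = weights[i:i + group_size]
        codes, scale, bias = quantize_group(group, bits)
        restored = dequantize_group(codes, scale, bias)
        worst = max(worst, max(abs(a - b) for a, b in zip(group, restored)))
    return worst


rng = random.Random(0)
sample = [rng.gauss(0.0, 0.02) for _ in range(256)]
print(max_roundtrip_error(sample))
```

The round-trip error per value is bounded by half a quantization step, which at 8 bits is a small fraction of each group's value range.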
Access
Gated with the same access form as the base model; request access above. Licensed under Apache-2.0.