Vanta - Local AI LLM Chat

Qwen3-VL-4B-Thinking-4bit

A verbatim mirror of mlx-community/Qwen3-VL-4B-Thinking-4bit, kept here so the Vanta iOS app always has a stable place to download it from.

Run it on your iPhone with Vanta

This is one of the built-in one-tap downloads in Vanta - Local AI LLM Chat, a local-first AI chat app for iPhone and iPad. Vanta runs models like this one fully on-device with Apple's MLX framework: no account, no cloud, and your chats stay on your device. Because this is a vision-capable model, you can also chat about images.

Download Vanta on the App Store ->


This is a copy. Every model file in this repository is an exact copy of mlx-community/Qwen3-VL-4B-Thinking-4bit. We mirrored it so that the Vanta client has a reliable source to download this model from, independent of any upstream changes. All credit for the model weights and the MLX conversion goes to mlx-community, Qwen, and the original authors.
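If you want to check the "exact copy" claim yourself, one option is to compare the per-file sizes and content hashes that the Hugging Face Hub reports for the two repositories. The following is a minimal sketch, not part of Vanta or mlx-vlm: it assumes the `huggingface_hub` package is installed and both repositories are publicly readable, and the helper names `diff_manifests` and `fetch_manifest` are hypothetical. The README and `.gitattributes` are ignored on the assumption that only the model files are meant to match.

```python
from typing import Dict, List, Optional, Tuple

# filename -> (size in bytes, sha256 for LFS files or git blob id otherwise)
Manifest = Dict[str, Tuple[Optional[int], Optional[str]]]


def diff_manifests(
    upstream: Manifest,
    mirror: Manifest,
    ignore: Tuple[str, ...] = ("README.md", ".gitattributes"),
) -> List[str]:
    """Return the sorted names of files that are missing or differ."""
    names = (set(upstream) | set(mirror)) - set(ignore)
    return sorted(n for n in names if upstream.get(n) != mirror.get(n))


def fetch_manifest(repo_id: str) -> Manifest:
    """Fetch size and content hash for every file in a Hub model repo.

    Requires network access; huggingface_hub is imported lazily so that
    diff_manifests can be used without it.
    """
    from huggingface_hub import HfApi

    info = HfApi().model_info(repo_id, files_metadata=True)
    manifest: Manifest = {}
    for f in info.siblings or []:
        # Large files are stored via LFS and expose a sha256;
        # small files only carry a git blob id.
        digest = f.lfs["sha256"] if f.lfs else f.blob_id
        manifest[f.rfilename] = (f.size, digest)
    return manifest
```

Calling `diff_manifests(fetch_manifest("mlx-community/Qwen3-VL-4B-Thinking-4bit"), fetch_manifest("TerminatorPower/Qwen3-VL-4B-Thinking-4bit"))` should return an empty list when every model file matches; any names it returns are files that are missing from one side or differ in size or hash.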


Model Details

Conversion Details

The upstream model was converted to MLX format from Qwen/Qwen3-VL-4B-Thinking using mlx-vlm version 0.3.4.

Related Models

Usage

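from mlx_vlm import load, generate
from mlx_vlm.prompt_utils import apply_chat_template
from mlx_vlm.utils import load_config

model_path = "TerminatorPower/Qwen3-VL-4B-Thinking-4bit"

# Download (on first use) and load the model weights and processor.
model, processor = load(model_path)
config = load_config(model_path)

# Wrap the question in the model's chat template so the image placeholder is included.
prompt = apply_chat_template(processor, config, "Describe this image.", num_images=1)

output = generate(
    model,
    processor,
    prompt=prompt,
    image="path/to/image.jpg",
    max_tokens=512,
)
print(output)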

CLI:

python3 -m mlx_vlm.generate \
  --model TerminatorPower/Qwen3-VL-4B-Thinking-4bit \
  --image path/to/image.jpg \
  --prompt "Describe this image."
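Because this is the Thinking variant, the generated text usually contains the model's reasoning followed by its final answer. Below is a minimal sketch for separating the two. It assumes the reasoning ends with a `</think>` tag, which is the convention Qwen's thinking models use; check your own output, since the opening `<think>` tag may be part of the prompt rather than the generated text. `split_thinking` is a hypothetical helper, not an mlx-vlm function.

```python
from typing import Tuple


def split_thinking(text: str, end_tag: str = "</think>") -> Tuple[str, str]:
    """Split generated text into (reasoning, answer) at the last end_tag."""
    reasoning, sep, answer = text.rpartition(end_tag)
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        # (This also happens if generation was cut off mid-reasoning.)
        return "", text.strip()
    # Drop an opening tag if the model emitted one.
    reasoning = reasoning.replace("<think>", "", 1).strip()
    return reasoning, answer.strip()
```

If the reasoning is long, a small `max_tokens` value can cut generation off before the closing tag appears, in which case the sketch above returns the truncated text as the answer.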

License

This model inherits the Apache 2.0 license from the original Qwen model. The mirror does not add any restrictions.
