PetSpeak Gemma 3 4B Vision (MLX Q4)

MLX Q4 conversion of google/gemma-3-4b-it (the VISION variant, which includes the SigLIP vision tower, the projection, and cross-attention).

Used by the PetMind iOS app when the "Mode Vision IA" toggle is enabled. Inference runs 100% on-device on iPhone 15 Pro and newer; no data is sent to Google.

Quantization

4 bits, group size 64
Size reduction: ~3.5× vs bf16 (8 GB → 2.5 GB)
Accuracy retained: ~98% of the full-precision MMLU score
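
The ~3.5× figure can be sanity-checked with a little arithmetic. The sketch below assumes that each group of 64 weights carries one 16-bit scale and one 16-bit bias alongside its 4-bit codes (an assumption about the storage layout, not something this card states); under that assumption the cost is 4.5 bits per weight, about 3.56× smaller than 16-bit storage. The helper names are hypothetical, and the round trip only illustrates group-wise affine quantization in general, not the exact conversion used for this model.

```python
def effective_bits_per_weight(bits: int, group_size: int, overhead_bits: int = 32) -> float:
    """Bits stored per weight, counting per-group metadata.

    overhead_bits=32 assumes one 16-bit scale plus one 16-bit bias per group.
    """
    return bits + overhead_bits / group_size


def quantize_group(values, bits=4):
    """Affine-quantize one group of floats to integer codes in [0, 2**bits - 1]."""
    levels = 2 ** bits - 1
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0  # fall back to 1.0 for a constant group
    codes = [round((v - lo) / scale) for v in values]
    return codes, scale, lo


def dequantize_group(codes, scale, bias):
    """Map integer codes back to approximate float values."""
    return [c * scale + bias for c in codes]


# Storage cost for the settings listed above: 4 bits, group size 64.
bpw = effective_bits_per_weight(bits=4, group_size=64)
ratio = 16 / bpw
print(f"{bpw} bits/weight, {ratio:.2f}x smaller than 16-bit")

# Round trip on one synthetic group of 64 values spanning [-0.5, 0.5].
group = [i / 63 - 0.5 for i in range(64)]
codes, scale, bias = quantize_group(group)
restored = dequantize_group(codes, scale, bias)
max_err = max(abs(a - b) for a, b in zip(group, restored))
print(f"max round-trip error: {max_err:.4f} (half a step is {scale / 2:.4f})")
```

The round-trip error stays within half a quantization step, which is the best a uniform 16-level grid can do for a group.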

Architecture

Base: Gemma 3 4B IT (Google, 3.4B-parameter language model)
Vision tower: SigLIP-So400M-patch14-384 (400M parameters, frozen)
Projection: MLP 1152 → 2560
Cross-attention: visual tokens inserted at <start_of_image>
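
The data flow described above can be sketched in plain Python. This is a toy illustration under several assumptions, not the model's actual code: a single linear map stands in for the projection MLP, tiny dimensions (3 → 2) stand in for 1152 → 2560, the visual tokens are placed immediately after the marker, and all function names are hypothetical. The patch count follows from the tower name alone (384-pixel input, 14-pixel patches); the number of tokens that actually reach the language model may differ, for example if the tower's outputs are pooled.

```python
IMAGE_MARKER = "<start_of_image>"  # marker named in the list above


def patch_count(image_size: int = 384, patch_size: int = 14) -> int:
    """Number of whole patches a square image is cut into."""
    per_side = image_size // patch_size
    return per_side * per_side


def project(vec, weight):
    """Apply one linear map; weight is a list of rows, shape [out_dim][in_dim]."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def splice_visual_tokens(text_tokens, visual_tokens):
    """Insert the visual tokens right after each image marker (assumed placement)."""
    out = []
    for tok in text_tokens:
        out.append(tok)
        if tok == IMAGE_MARKER:
            out.extend(visual_tokens)
    return out


# Toy dimensions (3 -> 2) stand in for the real 1152 -> 2560 projection.
toy_weight = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
patches = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
visual = [project(p, toy_weight) for p in patches]

sequence = splice_visual_tokens(["Describe", IMAGE_MARKER, "please"], visual)
n_patches = patch_count()
print(n_patches, len(sequence))
```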

License

Gemma Terms of Use: https://ai.google.dev/gemma/terms
Built with Google Gemma 3.

Safetensors

Model size: 1B params
Tensor type: F16 · U32
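
The U32 tensor type is consistent with how MLX stores quantized weights: 4-bit codes packed into 32-bit words, eight per word. That may also explain why the reported size (1B params) is well below the 3.4B language-model parameters listed under Architecture, if the count is taken over stored tensor elements; this reading is an assumption. A minimal sketch of the packing idea follows, with hypothetical helper names and a nibble order chosen purely for illustration:

```python
def pack_4bit(codes):
    """Pack 4-bit codes (0..15) into 32-bit words, eight codes per word."""
    if len(codes) % 8:
        raise ValueError("number of codes must be a multiple of 8")
    words = []
    for i in range(0, len(codes), 8):
        word = 0
        for j, code in enumerate(codes[i:i + 8]):
            if not 0 <= code <= 15:
                raise ValueError("code out of 4-bit range")
            # Lowest nibble first; the real on-disk order is an assumption.
            word |= code << (4 * j)
        words.append(word)
    return words


def unpack_4bit(words):
    """Recover the eight 4-bit codes stored in each 32-bit word."""
    return [(w >> (4 * j)) & 0xF for w in words for j in range(8)]


codes = list(range(16))
words = pack_4bit(codes)
print([hex(w) for w in words])
print(unpack_4bit(words) == codes)
```

Sixteen codes fit in two words, so a tensor of packed words has one eighth as many elements as the weight matrix it encodes.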

Model tree for Prophet25200/petspeak-gemma3-vision-mlx-q4

Base model: google/gemma-3-4b-pt
Finetuned: google/gemma-3-4b-it
Adapter: Prophet25200/gemma-3-4b-petspeak-merged
Quantized: this model