Vintern-1B-v3.5 — GGUF

Quantized by @dekthedev.

This repository contains GGUF quantizations of 5CD-AI/Vintern-1B-v3_5, produced with llama.cpp for local and edge deployment.

Files

| Variant | File | Size | Quality | Notes |
|---|---|---|---|---|
| Q4_K_M | vintern-1b-v3_5-q4_k_m.gguf | 491 MB | Very good | 4-bit medium, recommended |
| Q5_K_M | vintern-1b-v3_5-q5_k_m.gguf | 522 MB | Excellent | 5-bit, sharper outputs |
| Q8_0 | vintern-1b-v3_5-q8_0.gguf | 675 MB | Near-lossless | 8-bit, closest to full precision |
| mmproj F16 | mmproj-vintern-1b-v3_5-f16.gguf | 620 MB | | Vision projector, required for image input |
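
Because image input needs both a language-model file and the vision projector, the download for a vision setup is the sum of the two. The sketch below only does that arithmetic with the sizes listed above; the names `MODEL_SIZES_MB`, `MMPROJ_SIZE_MB`, and `total_download_mb` are illustrative and not part of any library.

```python
# File sizes in MB, copied from the Files table.
MODEL_SIZES_MB = {
    "Q4_K_M": 491,
    "Q5_K_M": 522,
    "Q8_0": 675,
}
MMPROJ_SIZE_MB = 620  # mmproj-vintern-1b-v3_5-f16.gguf


def total_download_mb(variant: str, with_vision: bool = True) -> int:
    """Return the combined file size in MB for a variant.

    When with_vision is True, the vision projector is added on top of
    the language-model file, since image input requires both.
    """
    size = MODEL_SIZES_MB[variant]
    if with_vision:
        size += MMPROJ_SIZE_MB
    return size


if __name__ == "__main__":
    for name in MODEL_SIZES_MB:
        print(f"{name}: {total_download_mb(name)} MB including the vision projector")
```

For text-only use, pass `with_vision=False` to get the size of the language-model file alone.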

Usage

llama-cpp-python

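from llama_cpp import Llama
from llama_cpp.llama_chat_format import Llava16ChatHandler

# Load the vision projector (mmproj); it is required for image input.
handler = Llava16ChatHandler(clip_model_path="mmproj-vintern-1b-v3_5-f16.gguf")

# Load the quantized language model and attach the multimodal chat handler.
llm = Llama(
    model_path="vintern-1b-v3_5-q4_k_m.gguf",
    chat_handler=handler,
    n_ctx=2048,      # context window in tokens
    n_threads=4,     # CPU threads used for inference
    verbose=False,
)

# Send one user turn containing an image and a text prompt.
# Replace <BASE64> with the base64-encoded bytes of your image.
response = llm.create_chat_completion(
    messages=[{
        "role": "user",
        "content": [
            {"type": "image_url", "image_url": {"url": "data:image/jpeg;base64,<BASE64>"}},
            # Vietnamese for "Describe this image."
            {"type": "text", "text": "Mô tả hình ảnh này."},
        ],
    }],
    max_tokens=256,
)
print(response["choices"][0]["message"]["content"])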
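
The example above leaves `<BASE64>` as a placeholder. One way to fill it is to read a local image and encode it as a data URI. This is a minimal sketch using only the Python standard library; the helper name `image_to_data_uri` is hypothetical, and falling back to `image/jpeg` for unknown file types is an assumption.

```python
import base64
import mimetypes
from pathlib import Path


def image_to_data_uri(path: str) -> str:
    """Encode a local image file as a base64 data URI."""
    # Guess the MIME type from the file extension.
    mime, _ = mimetypes.guess_type(path)
    if mime is None:
        mime = "image/jpeg"  # assumption: treat unknown types as JPEG
    # Read the raw bytes and base64-encode them as ASCII text.
    encoded = base64.b64encode(Path(path).read_bytes()).decode("ascii")
    return f"data:{mime};base64,{encoded}"
```

The returned string can be passed directly as the `url` value in the `image_url` entry, for example `{"url": image_to_data_uri("your_image.jpg")}`.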

llama-cli

llama-cli \
  --model vintern-1b-v3_5-q4_k_m.gguf \
  --mmproj mmproj-vintern-1b-v3_5-f16.gguf \
  --image your_image.jpg \
  --prompt "<|im_start|>user\nMô tả hình ảnh này.<|im_end|>\n<|im_start|>assistant\n" \
  --n-predict 256
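
The `--prompt` value above wraps the user text (Vietnamese for "Describe this image.") in `<|im_start|>` and `<|im_end|>` markers, with the assistant turn left open for the model to complete. If you build such prompts from a script, a small helper keeps the layout consistent. This is a sketch of the same single-turn string; the function name `build_chat_prompt` is hypothetical.

```python
def build_chat_prompt(user_text: str) -> str:
    """Wrap one user message in the turn markers used by the llama-cli example."""
    return (
        "<|im_start|>user\n"
        f"{user_text}<|im_end|>\n"
        "<|im_start|>assistant\n"  # left open so the model writes the reply
    )


if __name__ == "__main__":
    print(build_chat_prompt("Mô tả hình ảnh này."))
```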

Model Info

| Field | Value |
|---|---|
| Base model | 5CD-AI/Vintern-1B-v3_5 |
| Architecture | InternVL2.5-1B |
| Parameters | 0.9B |
| Languages | Vietnamese 🇻🇳, English, Chinese |
| License | Apache 2.0 |