Idefics3-8B-Llama3-bnb_nf4

A 4-bit NF4 quantization of HuggingFaceM4/Idefics3-8B-Llama3 created with BitsAndBytes; the vision encoder, connector, and LM head are kept unquantized.

Quantization

The quantized checkpoint was created with:

import torch
from transformers import AutoModelForVision2Seq, BitsAndBytesConfig

model_id = "HuggingFaceM4/Idefics3-8B-Llama3"

nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
    llm_int8_enable_fp32_cpu_offload=True,
    llm_int8_skip_modules=["lm_head", "model.vision_model", "model.connector"],
)

model_nf4 = AutoModelForVision2Seq.from_pretrained(model_id, quantization_config=nf4_config)
Safetensors
Model size: 5.08B params
Tensor types: F32, FP16, U8
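The mixed tensor types above reflect the NF4 storage scheme: quantized weights are packed into U8 tensors while the skipped modules stay in higher precision. As a rough back-of-the-envelope sketch (the 64-weight block size, 8-bit quantized scales, and per-256-block fp32 second-level constant below are the usual bitsandbytes defaults, assumed here rather than measured), NF4 costs about 4 bits per weight plus one absmax scale per block, which double quantization compresses from fp32 to 8 bits:

```python
def nf4_size_gb(n_params, block_size=64, double_quant=True):
    """Rough NF4 footprint: 4 bits per weight plus one absmax scale
    per block of `block_size` weights."""
    bits = 4 * n_params
    n_blocks = n_params / block_size
    if double_quant:
        # double quantization: 8-bit quantized scales, plus one fp32
        # second-level constant per 256 scales
        bits += n_blocks * (8 + 32 / 256)
    else:
        bits += n_blocks * 32  # plain fp32 scales
    return bits / 8 / 1024**3

# 5.08B params is the figure reported for this checkpoint; treating all of
# them as 4-bit overestimates slightly, since some modules are skipped
print(f"{nf4_size_gb(5.08e9):.2f} GiB")  # prints 2.44 GiB
```

This is an illustration only: the skipped fp16/f32 modules, activations, and the KV cache add to the real memory footprint.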

Model tree: leon-se/Idefics3-8B-Llama3-bnb_nf4 is quantized from HuggingFaceM4/Idefics3-8B-Llama3.