Configuration Parsing Warning:Config file tokenizer_config.json cannot be fetched (too big)

Mistral-Small-3.2-24B-Instruct-2506-MLX-nvfp4

MLX quantized version of Mistral Small 3.2 24B Instruct 2506.

Quantization

  • Method: NVFP4 (NVIDIA FP4)
  • Bits per weight: 4 (FP)
  • Details: NVIDIA's 4-bit floating-point format with block scaling, optimized for NVIDIA hardware.
  • Converted with: mlx-lm

Usage

pip install mlx-lm
from mlx_lm import load, generate

model, tokenizer = load("alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-nvfp4")
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)

Base Model

Downloads last month
103
Safetensors
Model size
24B params
Tensor type
U8
U32
BF16
MLX
Hardware compatibility
Log In to add your hardware

4-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 馃檵 Ask for provider support

Model tree for alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-nvfp4