Mistral-Small-3.2-24B-Instruct-2506-MLX-nvfp4

An MLX conversion of Mistral Small 3.2 24B Instruct 2506, quantized to NVFP4.

Quantization

Method: NVFP4 (NVIDIA FP4)
Bits per weight: 4 (FP)
Details: NVIDIA's 4-bit floating-point format with block scaling, optimized for NVIDIA hardware.
Converted with: mlx-lm
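
To make the block-scaling idea concrete, here is a minimal sketch in plain Python. It is not the mlx-lm implementation: it assumes 16-value blocks and the E2M1 magnitude set (0, 0.5, 1, 1.5, 2, 3, 4, 6), and it keeps each block scale as an ordinary float, whereas the real format stores scales in a compact 8-bit type. The names quantize_block, dequantize_block, quantize, and dequantize are hypothetical.

```python
# Illustrative sketch of block-scaled 4-bit float quantization.
# Assumptions (not taken from the model card): 16-value blocks, E2M1 magnitudes,
# and a full-precision float scale per block instead of an 8-bit scale.

E2M1_MAGNITUDES = [0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0]
BLOCK_SIZE = 16


def quantize_block(values):
    """Return (scale, codes) for one block; codes are signed grid values."""
    max_abs = max(abs(v) for v in values)
    # Map the largest magnitude in the block onto the largest grid value (6.0).
    scale = max_abs / E2M1_MAGNITUDES[-1] if max_abs > 0 else 1.0
    codes = []
    for v in values:
        target = abs(v) / scale
        # Round to the nearest representable magnitude, then restore the sign.
        nearest = min(E2M1_MAGNITUDES, key=lambda m: abs(m - target))
        codes.append(-nearest if v < 0 else nearest)
    return scale, codes


def dequantize_block(scale, codes):
    """Reconstruct approximate values from one block's scale and codes."""
    return [scale * c for c in codes]


def quantize(weights):
    """Split a flat list into blocks and quantize each one independently."""
    blocks = [weights[i:i + BLOCK_SIZE] for i in range(0, len(weights), BLOCK_SIZE)]
    return [quantize_block(block) for block in blocks]


def dequantize(quantized_blocks):
    """Concatenate the reconstructed values of every block."""
    out = []
    for scale, codes in quantized_blocks:
        out.extend(dequantize_block(scale, codes))
    return out
```

Because each block carries its own scale, an outlier coarsens only the 16 values in its block rather than the whole tensor.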

Usage

pip install mlx-lm

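from mlx_lm import load, generate

# Download (on first use) and load the quantized weights and tokenizer.
model, tokenizer = load("alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-nvfp4")

# Format the conversation with the model's chat template, returning a string.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "Hello!"}],
    add_generation_prompt=True,
    tokenize=False,
)

# Generate up to 512 new tokens and print the reply.
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)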

Base Model

Model: Mistral Small 3.2 24B Instruct 2506
Parameters: 24B
Architecture: Mistral Small 3.2
License: Apache 2.0
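
As rough arithmetic on the parameter count above, the sketch below estimates how much storage the quantized weights need. It assumes every parameter is stored at 4 bits with one 8-bit scale shared by each block of 16 weights; neither the block size nor the scale width is stated in this card, and any tensors left in BF16 would raise the total. The function name estimate_weight_bytes is hypothetical.

```python
def estimate_weight_bytes(num_params, bits_per_weight=4, scale_bits=8, block_size=16):
    """Approximate storage for block-quantized weights, in bytes.

    Assumes one scale of `scale_bits` bits per `block_size` weights.
    """
    # Amortize the per-block scale over the weights that share it.
    effective_bits = bits_per_weight + scale_bits / block_size
    return num_params * effective_bits / 8


if __name__ == "__main__":
    size_gb = estimate_weight_bytes(24e9) / 1e9
    print(f"Estimated weight storage: {size_gb:.1f} GB")
```

Under these assumptions the weights come to about 13.5 GB. Memory use at inference time is higher, since the KV cache and activations are not counted here.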

Safetensors

Model size: 24B params
Tensor type: U32, BF16

Model tree for alankessler/Mistral-Small-3.2-24B-Instruct-2506-MLX-nvfp4

Base model: mistralai/Mistral-Small-3.1-24B-Base-2503
Finetuned: mistralai/Mistral-Small-3.2-24B-Instruct-2506
Quantized (59): this model