Model Overview

Description

MiniMax-M3 is a multimodal model with frontier-level coding and agentic capabilities, built on a Mixture-of-Experts architecture with a 1M-token context window. The model processes text, image, video, and computer use inputs and produces text outputs, with emphasis on long-horizon coding tasks, agentic and tool-use workflows, and long-form video understanding. The NVIDIA MiniMax-M3 NVFP4 model is quantized with Model Optimizer.

This model is ready for non-commercial use.

Third-Party Community Consideration:

This model is not owned or developed by NVIDIA. This model has been developed and built to a third-party's requirements for this application and use case; see link to Non-NVIDIA MiniMax-M3 Model Card.

License/Terms of Use:

GOVERNING TERMS: Use of the checkpoints is governed by the MiniMax Community License.
Additional Information: Built with MiniMax M3.

Deployment Geography:

Global

Use Case:

Use Case: MiniMax-M3 is intended for multimodal understanding across text, image, and video; long-form video understanding (up to 30 minutes); long-horizon coding tasks (8+ hours); agentic and tool-use workflows; and design and creative tasks. The model supports two reasoning modes switchable per request: thinking mode for complex reasoning and agentic tasks, and non-thinking mode for latency-sensitive scenarios.

Release Date:

Hugging Face 06/23/2026 via https://huggingface.co/nvidia/MiniMax-M3-NVFP4

Model Architecture:

Architecture Type: Transformer
Network Architecture: Mixture-of-Experts (multimodal)
Total Parameters: 428B
Active Parameters: Approximately 23B per token (A23B)
Vision Encoder: ViT for image and video input

Input:

Input Types: Text, Image, Video
Input Formats: Text: String; Image: RGB images; Video: encoded video file
Input Parameters: One-Dimensional (1D), Two-Dimensional (2D), Three-Dimensional (3D)
Other Input Properties: Supports long-form video input up to 30 minutes.
Input Context Length (ISL): 1 million tokens

Output:

Output Types: Text
Output Format: String
Output Parameters: One-Dimensional (1D)
Other Output Properties: None

Our AI models are designed and/or optimized to run on NVIDIA GPU-accelerated systems. By leveraging NVIDIA's hardware (e.g. GPU cores) and software frameworks (e.g., CUDA libraries), the model achieves faster training and inference times compared to CPU-only solutions.

Software Integration:

Runtime Engine(s):
vLLM

Supported Hardware Microarchitecture Compatibility:

  • NVIDIA Blackwell

Preferred Operating System(s):

  • Linux

The integration of foundation and fine-tuned models into AI systems requires additional testing using use-case-specific data to ensure safe and effective deployment. Following the V-model methodology, iterative testing and validation at both unit and system levels are essential to mitigate risks, meet technical and functional requirements, and ensure compliance with safety and ethical standards before deployment.

Model Version(s):

This model is NVFP4 quantized with nvidia-modelopt v0.44.0

Training and Evaluation Datasets:

Calibration Dataset:

Links: Various datasets from NVIDIA's Nemotron Post-Training V3 Collection are used to train this model. Both the prompts and synthetically-generated responses in those datasets are used as-is. The following datasets are used:

Data Modality: Text
Data Collection Method by Dataset: Hybrid: Synthetic, Human, Automated
Labeling Method by Dataset: Hybrid: Synthetic, Human, Automated
Properties: The Nemotron Post-Training V3 Collection datasets are post-training datasets curated by NVIDIA containing multi-turn conversations across diverse topics. Total of ~2.9M samples, majority synthetic, others sourced from commercially-friendly datasets.

Training Dataset

Data Modality: Text, Image, Video
Image Training Data Size: Undisclosed
Text Training Data Size: Undisclosed
Training Data Collection: Undisclosed
Training Labeling: Undisclosed
Training Properties: Undisclosed

Evaluation Dataset:

Datasets: GPQA Diamond, AA-LCR, τ²-Telecom, MMMU-Pro, and SciCode
Data Collection Method by dataset: Hybrid, Automated, Human
Labeling Method by dataset: Hybrid, Automated, Human
Properties: We evaluated the model on reasoning, instruction-following, agentic, multimodal, and coding benchmarks: GPQA Diamond contains 448 graduate-level multiple-choice questions written by domain experts in biology, physics, and chemistry; AA-LCR (Artificial Analysis Long Context Reasoning) tests reasoning and synthesis over long-context inputs spanning multiple documents; τ²-Telecom (tau2-bench) is an agentic tool-use benchmark measuring multi-turn task completion in a telecom customer-service domain; MMMU-Pro is a massive multi-discipline multimodal understanding benchmark with challenging multiple-choice questions requiring image comprehension across diverse academic domains; SciCode evaluates scientific coding capabilities.

Inference:

Engine: vLLM

Test Hardware: NVIDIA Blackwell B200

Post Training Quantization

This model was obtained by quantizing the weights and activations of Minimax-M3 to NVFP4 data type. This optimization reduces the number of bits per parameter from 8 to 4, reducing disk size and GPU memory requirements by approximately 2x.

Usage

To serve this checkpoint with vLLM, you currently need the nightly docker image that includes MiniMax-M3 NVFP4 support from vllm-project/vllm#46380 (not yet in a stable release). Launch the nightly image and run the sample command below:

vllm serve nvidia/MiniMax-M3-NVFP4 \
  --tensor-parallel-size 8 \
  --block-size 128 \
  --tool-call-parser minimax_m3 \
  --reasoning-parser minimax_m3 \
  --enable-auto-tool-choice

Evaluation

NVFP4 Quantization Accuracy (vs. FP8 baseline):

Precision GPQA Diamond AA-LCR τ²-Telecom MMMU-Pro SciCode
FP8 92.53 76.62 92.22 71.97 49.90
NVFP4 91.92 75.60 91.89 71.01 49.70

Baseline: MiniMax-M3 in its native MXFP8 format. Benchmarked with temperature=1.0, top_p=0.95, max num tokens 65536.

Model Limitations:

The base model was trained on data that contains toxic language and societal biases originally crawled from the internet. Therefore, the model may amplify those biases and return toxic responses especially when prompted with toxic prompts. The model may generate answers that may be inaccurate, omit key information, or include irrelevant or redundant text producing socially unacceptable or undesirable text, even if the prompt itself does not include anything explicitly offensive.

Ethical Considerations

NVIDIA believes Trustworthy AI is a shared responsibility and we have established policies and practices to enable development for a wide array of AI applications. Developers should work with their internal model team to ensure this model meets requirements for the relevant industry and use case and addresses unforeseen product misuse.

Please make sure you have proper rights and permissions for all input image and video content; if image or video includes people, personal health information, or intellectual property, the image or video generated will not blur or maintain proportions of image subjects included.

Please report model quality, risk, security vulnerabilities or NVIDIA AI Concerns here.

Downloads last month
185
Safetensors
Model size
247B params
Tensor type
F8_E4M3
·
U8
·
F32
·
BF16
·
Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for nvidia/MiniMax-M3-NVFP4

Quantized
(33)
this model

Collection including nvidia/MiniMax-M3-NVFP4