GGUF

MiniMax-M3-EAGLE3 Draft Models

These are quantized EAGLE3 draft models for speculative decoding with MiniMax-M3.

Note: This requires a special build of llama.cpp with MiniMax M3 + EAGLE3 support. See build instructions below.

Building llama.cpp with MiniMax M3 + EAGLE3 support

Option 1: Using the PR (recommended)

git clone https://github.com/ggml-org/llama.cpp
cd llama.cpp
git fetch origin pull/24925/head:minimax-m3-eagle3
git checkout minimax-m3-eagle3
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j --target llama-cli llama-server

Option 2: Using the fork

git clone https://github.com/nick-tonjum/llama.cpp-minimax-m3-eagle3
cd llama.cpp-minimax-m3-eagle3
cmake -B build -DGGML_CUDA=ON
cmake --build build --config Release -j --target llama-cli llama-server

Usage

llama-server \
  -m /path/to/minimax-m3-base-model.gguf \
  -md /path/to/MiniMax-M3-EAGLE3-Q4_K_M.gguf \
  --spec-type draft-eagle3 \
  --fit on --fit-target 1024 --fit-ctx 131072
Downloads last month
6
GGUF
Model size
3B params
Architecture
eagle3
Hardware compatibility
Log In to add your hardware

2-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. ๐Ÿ™‹ Ask for provider support

Model tree for tonjum/MiniMax-M3-EAGLE3-GGUF

Quantized
(1)
this model