These are mmproj files for running Minimax-M3 with multimodal contexts in llama.cpp. They require this work-in-progress fork: https://github.com/timkhronos/llama.cpp/tree/MSA. They will NOT load with Unsloth's PR.

The linked fork also implements Minimax Sparse Attention (MSA), so Unsloth's GGUFs will NOT load on it. You will need to download GGUFs with the Indexer tensors preserved: https://huggingface.co/avar6/minimax-m3-MSA-gguf

Both Vision and MSA are fully functional. At low context sizes, decoding speed is currently about the same as with Unsloth's PR, but MSA starts pulling ahead at longer context sizes.
What does MSA help with, you may ask?
ELI5: the model was trained so that any single indexer head only ever sees 2048 tokens of context. These tokens are chosen individually for each of the 4 indexer heads in every sparse layer. Deviating from this will cause long-context recall issues. Unsloth's PR instead shows the full context to every single one of these heads, which dilutes attention and pushes the model out of distribution.
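To make the idea concrete, here is a minimal Python sketch of per-head token selection. It is not the fork's actual code: the function name `select_visible_tokens`, the plain-list score layout, and the use of a simple top-k over relevance scores are all assumptions made for illustration. Only the numbers 2048 and 4 come from the description above.

```python
import heapq

TOP_K = 2048         # tokens each indexer head may see (from the description above)
N_INDEXER_HEADS = 4  # indexer heads per sparse layer (from the description above)


def select_visible_tokens(scores, top_k=TOP_K):
    """Pick, independently for each head, the top_k highest-scoring positions.

    scores[h][t] is a hypothetical relevance score of context token t for
    indexer head h. Returns one sorted list of token positions per head.
    """
    visible = []
    for head_scores in scores:
        n_ctx = len(head_scores)
        if n_ctx <= top_k:
            # Short context: nothing to prune, the head sees every token.
            visible.append(list(range(n_ctx)))
        else:
            # Long context: keep only the top_k best-scoring positions.
            best = heapq.nlargest(top_k, range(n_ctx), key=head_scores.__getitem__)
            visible.append(sorted(best))
    return visible


# Toy example with 2 heads, 4 tokens, and a budget of 2 tokens per head.
toy_scores = [
    [0.1, 0.9, 0.5, 0.3],
    [0.8, 0.2, 0.7, 0.1],
]
toy_visible = select_visible_tokens(toy_scores, top_k=2)
```

In this toy run each head ends up with its own pair of positions, which is the point: the selection is per head, not shared. Showing the full context to every head corresponds to skipping the pruning branch entirely, which is the behavior described above as out of distribution.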
Base model for Serpen/Minimax_M3_MMPROJ_GGUF: MiniMaxAI/MiniMax-M3