MOSS-Music-8B-Thinking · MLX 6-bit

A 6-bit MLX quantization of OpenMOSS-Team/MOSS-Music-8B-Thinking for music understanding on Apple Silicon. The build is about 8 GB and stays essentially lossless relative to the fp32 reference.

Community conversion, not an official release. All model credit goes to the OpenMOSS Team.

Other sizes: 8-bit · 4-bit

Usage

MOSS-Music is a custom multimodal (audio + text) model, so it does not load with mlx_lm / mlx_vlm directly. Use the moss_music_mlx backend (code, PR):

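from huggingface_hub import snapshot_download
from moss_music_mlx import load_pretrained, generate
# Imported via the src package, so this presumably needs the backend checkout on the import path.
from src.processing_moss_music import MossMusicProcessor

# Download the quantized weights (or reuse the local cache) and get their path.
path = snapshot_download("mlx-community/MOSS-Music-8B-Thinking-6bit")
model = load_pretrained(path)
proc = MossMusicProcessor.from_pretrained(path, trust_remote_code=True, enable_time_marker=True)

# Ask a question about a local audio file and print the model's answer.
print(generate(model, proc, "Analyze this track: genre, key, BPM, structure.", audio_path="song.mp3"))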

Conversion

Weights are quantized to 6 bits with a group size of 64. The audio encoder is kept at bf16 to preserve audio fidelity; quantization is applied to the Qwen3 layers, the token embeddings and lm_head.
Converted with mlx==0.31.2 and mlx-lm==0.29.1.
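
To make "6-bit, group size 64" concrete, here is a minimal pure-Python sketch of group-wise affine quantization. It illustrates the general technique only and is not the MLX kernel or its bit-packing. The function names are hypothetical, and the storage cost assumes one 16-bit scale and one 16-bit bias per group.

```python
import random

def quantize_group(values, bits=6):
    """Affine-quantize one group of floats to unsigned integer codes."""
    levels = (1 << bits) - 1               # 63 steps for 6-bit codes
    lo, hi = min(values), max(values)
    scale = (hi - lo) / levels or 1.0      # fall back to 1.0 for a constant group
    codes = [min(levels, max(0, round((v - lo) / scale))) for v in values]
    return codes, scale, lo                # lo acts as the per-group bias

def dequantize_group(codes, scale, bias):
    """Reconstruct approximate floats from codes plus per-group scale and bias."""
    return [c * scale + bias for c in codes]

def quantize(weights, group_size=64, bits=6):
    """Split a flat weight list into groups and quantize each independently."""
    return [
        quantize_group(weights[i:i + group_size], bits)
        for i in range(0, len(weights), group_size)
    ]

def bits_per_weight(bits=6, group_size=64, meta_bits=32):
    """Effective storage cost, assuming 16-bit scale + 16-bit bias per group."""
    return bits + meta_bits / group_size

# Toy demo on random weights (illustrative values, not model weights).
random.seed(0)
demo_weights = [random.gauss(0.0, 0.02) for _ in range(128)]
demo_groups = quantize(demo_weights)
codes, scale, bias = demo_groups[0]
restored = dequantize_group(codes, scale, bias)
worst = max(abs(a - b) for a, b in zip(demo_weights[:64], restored))
print(f"groups={len(demo_groups)} worst_error={worst:.2e} half_step={scale / 2:.2e}")
print(f"effective bits per weight: {bits_per_weight()}")
```

Each group stores its own scale and bias, so the rounding error within a group is bounded by half a quantization step. Under the stated assumption, the per-group metadata adds 32 / 64 = 0.5 bits, giving 6.5 effective bits per quantized weight.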

Accuracy

Against the fp32 PyTorch reference, the 6-bit model produces an identical next-token argmax at prefill, and its logits match with a cosine similarity of 0.99989 (versus 0.99999 for the 8-bit build), so the quantization is effectively lossless.
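
The two metrics quoted above can be computed as in the following sketch. It is a hedged illustration, not the script used for this card: the helper names are hypothetical, and the logits are toy values standing in for the final-position logit vectors of the reference and quantized models.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length logit vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def argmax(values):
    """Index of the largest entry, i.e. the greedy next-token id."""
    return max(range(len(values)), key=values.__getitem__)

def compare_logits(reference, quantized):
    """Report greedy-token agreement and cosine similarity of the logits."""
    return {
        "argmax_match": argmax(reference) == argmax(quantized),
        "cosine": cosine_similarity(reference, quantized),
    }

# Toy logits only; real vectors would span the full vocabulary.
reference_logits = [2.0, -1.0, 0.5, 4.0]
quantized_logits = [2.1, -0.9, 0.4, 3.9]
print(compare_logits(reference_logits, quantized_logits))
```

An identical argmax means greedy decoding picks the same first token, while the cosine similarity summarizes how closely the whole logit distribution is preserved.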

License & credit

Apache-2.0, inherited from the base model. This repository provides only the MLX-quantized weights; all credit goes to the OpenMOSS Team.
