Ornith-1.0-397B-mlx-8bit

This is an MLX conversion of deepreinforce-ai/Ornith-1.0-397B, quantized to 8-bit for use on Apple Silicon with mlx-lm.

  • Base model: deepreinforce-ai/Ornith-1.0-397B (Qwen3.5-MoE, Qwen3_5MoeForConditionalGeneration, 397B total / MoE)
  • Format: MLX, 8-bit (affine)
  • Size on disk: ~421 GB
  • Converted with: mlx-lm 0.31.2

Note — text-only. The original Ornith-1.0-397B is multimodal (vision encoder + language model). mlx-lm converts the language model only; the vision tower is not included. This build is for text generation. The tokenizer, chat template, and generation_config are included.

Requirements

This is a large MoE model. You need an Apple Silicon Mac with enough unified memory to hold the weights (~421 GB) plus runtime overhead and the KV cache. A 512 GB M3 Ultra runs this build comfortably.
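The ~421 GB figure can be sanity-checked with a back-of-the-envelope estimate. The sketch below assumes a quantization group size of 64 (believed to be the mlx-lm default) with one 16-bit scale and one 16-bit bias stored per group. This card states neither the group size nor which tensors were left unquantized, so treat the result as a rough estimate. `estimate_weights_gb` is a hypothetical helper name, not part of mlx-lm.

```python
def estimate_weights_gb(n_params, bits=8, group_size=64,
                        scale_bits=16, bias_bits=16):
    """Rough size in decimal GB of affine-quantized weights.

    Assumption: every parameter is quantized, and each group of
    `group_size` weights stores one scale and one bias.
    """
    # Per-weight cost = quantized value + amortized scale/bias overhead.
    bits_per_weight = bits + (scale_bits + bias_bits) / group_size
    return n_params * bits_per_weight / 8 / 1e9


size_gb = estimate_weights_gb(397e9)  # 397B total parameters, per the card
print(f"{size_gb:.1f} GB")  # prints "421.8 GB"
```

Under these assumptions the weights cost 8.5 bits per parameter, which lands close to the reported size on disk. On a 512 GB machine that leaves on the order of 90 GB for the KV cache, activations, and the operating system.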

Usage

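Install mlx-lm and generate from the command line:

pip install -U mlx-lm
mlx_lm.generate --model pipenetwork/Ornith-1.0-397B-mlx-8bit \
  --prompt "Write a haiku about Apple Silicon." --max-tokens 256

Or from Python:

from mlx_lm import load, generate

# Download (if needed) and load the quantized weights and tokenizer.
model, tokenizer = load("pipenetwork/Ornith-1.0-397B-mlx-8bit")
messages = [{"role": "user", "content": "Explain mixture-of-experts in one paragraph."}]
# Wrap the conversation in the model's chat template.
prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
# verbose=True already prints the output as it is generated, so no extra print() is needed.
text = generate(model, tokenizer, prompt=prompt, max_tokens=512, verbose=True)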
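For multi-turn use, the Python example above can be extended roughly as follows. This is a sketch, not part of the original card: `build_messages` and `stream_reply` are hypothetical helper names, and it assumes the bundled chat template accepts a `system` role and that the installed mlx-lm provides `stream_generate`, which yields response objects with a `.text` field. The mlx-lm import is deferred so the message-building helper can be used on its own.

```python
def build_messages(user_prompt, system_prompt=None, history=None):
    """Assemble a chat-format message list for apply_chat_template."""
    messages = []
    if system_prompt:
        # Assumption: the chat template supports a system role.
        messages.append({"role": "system", "content": system_prompt})
    # Prior turns, already in {"role": ..., "content": ...} form.
    messages.extend(history or [])
    messages.append({"role": "user", "content": user_prompt})
    return messages


def stream_reply(model, tokenizer, messages, max_tokens=512):
    """Print the reply as it is generated and return the full text."""
    from mlx_lm import stream_generate  # deferred: needs Apple Silicon + mlx-lm

    prompt = tokenizer.apply_chat_template(messages, add_generation_prompt=True)
    chunks = []
    for response in stream_generate(model, tokenizer, prompt, max_tokens=max_tokens):
        print(response.text, end="", flush=True)
        chunks.append(response.text)
    print()
    return "".join(chunks)
```

With `model` and `tokenizer` loaded as above, a call would look like `stream_reply(model, tokenizer, build_messages("Summarize MoE routing.", system_prompt="Be concise."))`. Appending the returned text to `history` as an `assistant` message carries the conversation into the next turn.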

License

MIT, inherited from the base model.
