anemll-WeiboAI-VibeThinker-1.5B-ctx2048

CoreML conversion of WeiboAI/VibeThinker-1.5B for Apple Neural Engine inference, converted using ANEMLL v0.3.5.

Original Model

VibeThinker-1.5B is a 1.5B-parameter dense language model from WeiboAI, fine-tuned from Qwen2.5-Math-1.5B for competitive mathematics and algorithmic coding. Despite its small size, it matches or beats much larger models on several benchmarks:

| Benchmark | VibeThinker-1.5B | DeepSeek R1 (671B) |
|-----------|------------------|--------------------|
| AIME24    | 80.3             | 79.8               |
| AIME25    | 74.4             | 70.0               |
| HMMT25    | 50.4             | 41.7               |

Best results are obtained with English prompts on competition-style math and coding problems.

Conversion Details

| Parameter | Value |
|-----------|-------|
| Architecture | Qwen 2.5 |
| Context Length | 2048 |
| Batch Size | 64 |
| Chunks | 4 (hybrid) |
| FFN Quantization | LUT6 (Apple Neural Engine), per-channel, group size 4 |
| LM Head Quantization | LUT6 (Apple Neural Engine), per-channel, group size 4 |
| Embeddings | Unquantized |
| Argmax in Model | No |
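LUT6 means the FFN and LM head weights are palettized: each weight is stored as a 6-bit index into a 64-entry lookup table. A toy numpy sketch of the idea (illustrative only — ANEMLL's actual conversion fits the palette per channel group of 4, not uniformly over the whole tensor as here):

```python
import numpy as np

def lut_quantize(w, n_bits=6):
    """Toy LUT (palette) quantization: map each weight to the nearest of
    2**n_bits palette values spread uniformly over the weight range.
    (Simplified: real LUT6 palettes are fit per channel group.)"""
    levels = 2 ** n_bits                       # 64 entries for LUT6
    lut = np.linspace(w.min(), w.max(), levels)
    idx = np.abs(w[..., None] - lut).argmin(axis=-1)
    return lut, idx.astype(np.uint8)

w = np.random.default_rng(0).normal(size=(8, 16)).astype(np.float32)
lut, idx = lut_quantize(w)
w_hat = lut[idx]                               # dequantized weights
max_err = np.abs(w - w_hat).max()
```

Storage drops from 16 bits per weight to 6 bits plus a small shared table, which is why the quantized chunks are a fraction of the FP16 size.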

Hybrid FP32 First Chunk

This model uses a hybrid precision layout. Chunk 1 of 4 (`qwen25_FFN_PF_lut6_chunk_01of04.mlmodelc`) runs the first transformer layer's attention in FP32 (unquantized), while chunks 2-4 use standard LUT6 quantization.

Why: VibeThinker has unusually high-magnitude Q/K projection biases in layer 0, which cause the FP16 attention logits to overflow to +inf before masking. The softmax then saturates and the output diverges catastrophically. Running first-layer attention at full precision eliminates the overflow and recovers near-exact parity with the original Hugging Face model.

Chunk 1 is significantly smaller (~15 MB vs ~310-340 MB for the other chunks) since it only contains the first-layer attention residual path.
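The failure mode is easy to reproduce in numpy. The magnitudes below are hypothetical (chosen only to push the dot products past the float16 ceiling of ~65504, not VibeThinker's actual bias values), but the mechanism is the same:

```python
import numpy as np

def softmax(x, axis=-1):
    m = np.max(x, axis=axis, keepdims=True)
    e = np.exp(x - m)
    return e / e.sum(axis=axis, keepdims=True)

d = 64
q = np.full((1, d), 40.0, dtype=np.float32)  # query with a large bias folded in
k = np.full((4, d), 30.0, dtype=np.float32)  # keys likewise (hypothetical values)

# FP32: ~9600 per logit -- large, but finite and well-behaved.
logits_fp32 = (q @ k.T) / np.sqrt(d)

# FP16: the raw dot product (~76800) exceeds float16's max (~65504),
# so the logits overflow to +inf before any mask or scaling can help.
logits_fp16 = (q.astype(np.float16) @ k.T.astype(np.float16)) / np.float16(np.sqrt(d))

attn_fp32 = softmax(logits_fp32)  # uniform over the 4 keys, as expected
attn_fp16 = softmax(logits_fp16)  # inf - inf -> nan: saturated garbage
```

Keeping just this one attention block in FP32 avoids the overflow while leaving the rest of the model quantized.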

Recommended Sampling

temperature: 0.6
top_p: 0.95
top_k: 0
do_sample: true
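These settings correspond to standard temperature plus nucleus (top-p) sampling, with top_k = 0 disabling the top-k filter. A minimal numpy sketch (the helper name and structure are illustrative, not ANEMLL's implementation):

```python
import numpy as np

def sample_next_token(logits, temperature=0.6, top_p=0.95, top_k=0, rng=None):
    """Temperature + nucleus sampling sketch; top_k=0 disables top-k."""
    if rng is None:
        rng = np.random.default_rng()
    logits = np.asarray(logits, dtype=np.float64) / temperature
    if top_k > 0:
        # drop everything below the top_k-th largest logit
        cutoff = np.sort(logits)[-top_k]
        logits = np.where(logits < cutoff, -np.inf, logits)
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    # keep the smallest set of tokens whose cumulative mass reaches top_p
    order = np.argsort(probs)[::-1]
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]
    p = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=p))
```

With a strongly peaked distribution the 0.95 nucleus often contains a single token, so sampling degenerates gracefully toward greedy decoding.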

Quick Start

```bash
pip install coremltools transformers

# Basic chat
python chat.py --meta ./meta.yaml

# Full conversation mode with history
python chat_full.py --meta ./meta.yaml
```

Note: First load takes time as macOS places the model on the Neural Engine. Subsequent loads are fast.

iOS/macOS App

Try the ANEMLL Chat app on TestFlight:

  1. Install TestFlight
  2. Join beta: TestFlight Link
  3. Add this model via its HuggingFace URL

License

MIT (ANEMLL conversion). The original model is also MIT licensed.
