# anemll-WeiboAI-VibeThinker-1.5B-ctx2048
CoreML conversion of WeiboAI/VibeThinker-1.5B for Apple Neural Engine inference, produced with ANEMLL v0.3.5.
## Original Model
VibeThinker-1.5B is a 1.5B parameter dense language model by WeiboAI, fine-tuned from Qwen2.5-Math-1.5B for competitive math and algorithm coding. Despite its small size, it achieves remarkable results:
| Benchmark | VibeThinker-1.5B | DeepSeek R1 (671B) |
|---|---|---|
| AIME24 | 80.3 | 79.8 |
| AIME25 | 74.4 | 70.0 |
| HMMT25 | 50.4 | 41.7 |
It performs best with English prompts on competitive-style math and coding problems.
## Conversion Details
| Parameter | Value |
|---|---|
| Architecture | Qwen 2.5 |
| Context Length | 2048 |
| Batch Size | 64 |
| Chunks | 4 (hybrid) |
| FFN Quantization | LUT6 (Apple Neural Engine), per-channel group size 4 |
| LM Head Quantization | LUT6 (Apple Neural Engine), per-channel group size 4 |
| Embeddings | Unquantized |
| Argmax in Model | No |
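The LUT6 scheme in the table maps each weight to one of 2^6 = 64 palette entries, with each small group of output channels sharing its own table. The sketch below illustrates the idea with simple uniform levels; the function name and the uniform palette are illustrative assumptions — ANEMLL/coremltools fit the palette differently (e.g. with k-means).

```python
import numpy as np

def lut6_quantize(w: np.ndarray, group_size: int = 4):
    """Toy LUT6 palettization: w is (out_channels, in_features).

    Each group of `group_size` output channels shares one 64-entry
    lookup table; weights are stored as 6-bit indices into it.
    """
    dequant = np.empty_like(w, dtype=np.float32)
    indices = np.empty(w.shape, dtype=np.uint8)
    tables = []
    for g in range(0, w.shape[0], group_size):
        block = w[g:g + group_size]
        # 64 uniformly spaced levels spanning this group's value range
        lut = np.linspace(block.min(), block.max(), 64, dtype=np.float32)
        idx = np.abs(block[..., None] - lut).argmin(axis=-1)  # nearest level
        indices[g:g + group_size] = idx
        dequant[g:g + group_size] = lut[idx]
        tables.append(lut)
    return dequant, indices, tables

w = np.random.default_rng(1).standard_normal((8, 16)).astype(np.float32)
wq, idx, tables = lut6_quantize(w)
print("max abs error:", np.abs(w - wq).max())
```

Storing 6-bit indices plus a small per-group table is what makes the FFN and LM head chunks compact while staying ANE-friendly.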
### Hybrid FP32 First Chunk

This model uses a hybrid precision layout. Chunk 1 of 4 (`qwen25_FFN_PF_lut6_chunk_01of04.mlmodelc`) runs the first transformer layer's attention in FP32 (unquantized), while chunks 2-4 use standard LUT6 quantization.
Why: VibeThinker has unusually high-magnitude Q/K projection biases in layer 0, causing FP16 attention logit overflow (+inf) before masking. This leads to softmax saturation and catastrophic output divergence. Running first-layer attention at full precision eliminates the overflow and recovers near-exact parity with the original Hugging Face model.
Chunk 1 is significantly smaller (~15 MB vs ~310-340 MB for the other chunks) since it only contains the first-layer attention residual path.
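The overflow mechanism can be reproduced in a few lines. The numbers below are synthetic (not the real weights): the point is that bias-inflated Q/K values push a dot-product logit past the FP16 maximum (~65504), and once any logit is +inf, the usual max-subtracted softmax yields NaN everywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64
q = (rng.standard_normal(d) + 300.0).astype(np.float16)  # bias-inflated query
k = (rng.standard_normal(d) + 300.0).astype(np.float16)  # bias-inflated key

# FP16 dot product saturates to +inf; FP32 has plenty of headroom.
score_fp16 = np.dot(q, k) / np.float16(np.sqrt(d))
score_fp32 = np.dot(q.astype(np.float32), k.astype(np.float32)) / np.sqrt(d)
print(score_fp16, score_fp32)

# Stable softmax over logits containing +inf: inf - inf = nan, exp(nan) = nan.
logits = np.array([score_fp16, score_fp16], dtype=np.float16)
shifted = logits - logits.max()
probs = np.exp(shifted) / np.sum(np.exp(shifted))
print(probs)  # all NaN -> the saturation/divergence described above
```

Keeping just that first attention block in FP32 avoids the saturation at the cost of a few extra megabytes, which is why chunk 1 is so much smaller than the others.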
## Recommended Sampling

```
temperature: 0.6
top_p: 0.95
top_k: 0
do_sample: true
```
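For reference, a minimal sketch of this recipe (temperature 0.6, nucleus/top-p 0.95, no top-k cap) applied to a raw logit vector — the `sample` helper is illustrative, not the actual chat.py implementation:

```python
import numpy as np

def sample(logits, temperature=0.6, top_p=0.95, rng=None):
    """Temperature + top-p (nucleus) sampling over a 1-D logit vector."""
    rng = rng or np.random.default_rng()
    scaled = np.asarray(logits, dtype=np.float64) / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    order = np.argsort(probs)[::-1]                  # most likely first
    cum = np.cumsum(probs[order])
    keep = order[: np.searchsorted(cum, top_p) + 1]  # smallest nucleus >= top_p
    p = probs[keep] / probs[keep].sum()
    return int(rng.choice(keep, p=p))

logits = np.array([2.0, 1.0, 0.2, -1.0, -3.0])
token = sample(logits, rng=np.random.default_rng(0))
print(token)
```

With `top_k: 0` the top-k filter is disabled, so only the nucleus cutoff restricts the candidate set.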
## Quick Start

```bash
pip install coremltools transformers

# Basic chat
python chat.py --meta ./meta.yaml

# Full conversation mode with history
python chat_full.py --meta ./meta.yaml
```
**Note:** First load takes time as macOS places the model on the Neural Engine; subsequent loads are fast.
## iOS/macOS App
Try the ANEMLL Chat app on TestFlight:
- Install TestFlight
- Join beta: TestFlight Link
- Add this model via its HuggingFace URL
## License
MIT (ANEMLL conversion). The original model is also MIT licensed.