VibeThinker-3B LiteRT-LM

This repository contains LiteRT-LM conversions of WeiboAI/VibeThinker-3B for local on-device inference.

VibeThinker-3B is a 3B-parameter dense reasoning model built on Qwen2.5-Coder-3B. These artifacts were exported from the Hugging Face safetensors checkpoint with LiteRT Torch and packaged as .litertlm files for the LiteRT-LM runtime.

Files

| File | Context cache (tokens) | Quantization | Backend target | Status |
| --- | --- | --- | --- | --- |
| VibeThinker-3B.litertlm | 4096 | dynamic_wi8_afp32 | CPU/GPU | Exported and template-repaired. |
| VibeThinker-3B-web.litertlm | 2048 | dynamic_wi8_afp32 | CPU/GPU | Exported, template-repaired, and host CPU smoke-tested. |
| chat_template.jinja | n/a | n/a | n/a | Mobile-safe ChatML template. Replaces the source tool-call template that fails in Android LiteRT-LM template evaluation. |
| conversion_manifest.json | n/a | n/a | n/a | Toolchain versions, hashes, and conversion details. |

The CPU/GPU .litertlm files include a compressed Hugging Face tokenizer, LLM metadata, a quantized prefill/decode TFLite model, and a quantized external embedder.

Run With LiteRT-LM

Install the LiteRT-LM CLI:

uv tool install litert-lm

Run the generic artifact (4096 context cache):

litert-lm run \
  --from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
  VibeThinker-3B.litertlm \
  --backend=cpu \
  --prompt="What is 17 * 3? Answer with just the number."

Run the smaller-cache artifact (2048 context cache):

litert-lm run \
  --from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
  VibeThinker-3B-web.litertlm \
  --backend=cpu \
  --prompt="What is 2+2? Answer with just the number."
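
The two commands above differ only in the artifact name and prompt, so a small wrapper can smoke-test both files in one pass. The sketch below uses only the flags shown above. The `smoke_test` function and the `DRY_RUN` variable are hypothetical helpers of this script, not features of litert-lm. With `DRY_RUN=1` (the default here) it only prints each command; the printed line is for display and is not shell-quoted.

```shell
#!/bin/sh
# Smoke-test sketch: run one prompt against both artifacts in this repository.
# Assumption: DRY_RUN and smoke_test are local to this script.
set -eu

REPO="Tdamre/VibeThinker-3B-litert-lm"
PROMPT="What is 17 * 3? Answer with just the number."
DRY_RUN="${DRY_RUN:-1}"

smoke_test() {
  # $1 is the artifact file name inside the repository.
  set -- litert-lm run \
    --from-huggingface-repo "$REPO" \
    "$1" \
    --backend=cpu \
    --prompt="$PROMPT"
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of executing it.
    echo "$*"
  else
    "$@"
  fi
}

for artifact in VibeThinker-3B.litertlm VibeThinker-3B-web.litertlm; do
  smoke_test "$artifact"
done
```

Setting `DRY_RUN=0` executes the commands, which requires the litert-lm CLI installed as described above.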

Conversion Summary

Source model revision:

WeiboAI/VibeThinker-3B@0c7115fdd0957b3da0f2a0829ab1763969d30300
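
To reproduce the conversion, the checkpoint needs to be fetched at exactly this revision. One possible way is the Hugging Face Hub CLI, sketched below. This is an assumption about workflow, not a record of how the checkpoint was actually downloaded. The target directory matches the path used in the conversion command, `fetch_checkpoint` and `DRY_RUN` are hypothetical helpers of this script, and older huggingface_hub releases ship the same subcommand as `huggingface-cli download`.

```shell
#!/bin/sh
# Sketch: download the source checkpoint pinned to the documented revision.
# Assumption: the `hf` CLI from huggingface_hub is installed.
set -eu

REVISION="0c7115fdd0957b3da0f2a0829ab1763969d30300"
LOCAL_DIR="model-cache/WeiboAI-VibeThinker-3B"
DRY_RUN="${DRY_RUN:-1}"

fetch_checkpoint() {
  set -- hf download WeiboAI/VibeThinker-3B \
    --revision "$REVISION" \
    --local-dir "$LOCAL_DIR"
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of executing it.
    echo "$*"
  else
    "$@"
  fi
}

fetch_checkpoint
```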

CPU/GPU conversion command pattern:

litert-torch export_hf \
  model-cache/WeiboAI-VibeThinker-3B \
  <output_dir> \
  --keep_temporary_files=True \
  --prefill_lengths=128,1024 \
  --cache_length=<2048-or-4096> \
  --externalize_embedder=True \
  --quantization_recipe=dynamic_wi8_afp32
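
The pattern above has two placeholders, the output directory and the cache length. A minimal sketch that fills them in for both published variants follows. The flags are copied from the pattern unchanged. The output directory names, the `export_variant` function, and the `DRY_RUN` variable are hypothetical and chosen only for illustration.

```shell
#!/bin/sh
# Sketch: expand the export pattern for the 4096 and 2048 cache lengths.
# Assumption: output directory names are placeholders, not the paths actually used.
set -eu

MODEL_DIR="model-cache/WeiboAI-VibeThinker-3B"
DRY_RUN="${DRY_RUN:-1}"

export_variant() {
  # $1 = cache length (2048 or 4096), $2 = output directory.
  set -- litert-torch export_hf \
    "$MODEL_DIR" \
    "$2" \
    --keep_temporary_files=True \
    --prefill_lengths=128,1024 \
    --cache_length="$1" \
    --externalize_embedder=True \
    --quantization_recipe=dynamic_wi8_afp32
  if [ "$DRY_RUN" = "1" ]; then
    # Print the command instead of executing it.
    echo "$*"
  else
    "$@"
  fi
}

export_variant 4096 out/cache-4096   # corresponds to VibeThinker-3B.litertlm
export_variant 2048 out/cache-2048   # corresponds to VibeThinker-3B-web.litertlm
```

Only `--cache_length` and the output directory change between the two runs; the quantization recipe and prefill lengths are the same for both.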

Toolchain:

Python 3.12.12
litert-torch 0.9.1
litert-lm 0.13.1
ai-edge-litert 2.1.5
ai-edge-quantizer 0.7.0
torch 2.12.0+cu130
transformers 5.9.0

Not Included

  • Qualcomm SM8750 NPU AOT artifacts were not produced in this initial pass.
  • MediaPipe .task bundles were not uploaded because VibeThinker-3B ships an HF tokenizer.json rather than a SentencePiece tokenizer.model.
