VibeThinker-3B LiteRT-LM

This repository contains LiteRT-LM conversions of WeiboAI/VibeThinker-3B for local on-device inference.

VibeThinker-3B is a 3B-parameter dense reasoning model built on Qwen2.5-Coder-3B. These artifacts were exported from the Hugging Face safetensors checkpoint with LiteRT Torch and packaged as .litertlm files for the LiteRT-LM runtime.

Files

File	Context cache	Quantization	Backend target	Status
`VibeThinker-3B.litertlm`	4096	`dynamic_wi8_afp32`	CPU/GPU	Exported and template-repaired.
`VibeThinker-3B-web.litertlm`	2048	`dynamic_wi8_afp32`	CPU/GPU	Exported, template-repaired, and host CPU smoke-tested.
`chat_template.jinja`	n/a	n/a	n/a	Mobile-safe ChatML template. Replaces the source tool-call template that fails in Android LiteRT-LM template evaluation.
`conversion_manifest.json`	n/a	n/a	n/a	Toolchain versions, hashes, and conversion details.

The CPU/GPU .litertlm files include a compressed Hugging Face tokenizer, LLM metadata, a quantized prefill/decode TFLite model, and a quantized external embedder.

Run With LiteRT-LM

Install the LiteRT-LM CLI:

uv tool install litert-lm

Run the generic artifact:

litert-lm run \
  --from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
  VibeThinker-3B.litertlm \
  --backend=cpu \
  --prompt="What is 17 * 3? Answer with just the number."

Run the lower-cache artifact:

litert-lm run \
  --from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
  VibeThinker-3B-web.litertlm \
  --backend=cpu \
  --prompt="What is 2+2? Answer with just the number."

Conversion Summary

Source model revision:

WeiboAI/VibeThinker-3B@0c7115fdd0957b3da0f2a0829ab1763969d30300

CPU/GPU conversion command pattern:

litert-torch export_hf \
  model-cache/WeiboAI-VibeThinker-3B \
  <output_dir> \
  --keep_temporary_files=True \
  --prefill_lengths=128,1024 \
  --cache_length=<2048-or-4096> \
  --externalize_embedder=True \
  --quantization_recipe=dynamic_wi8_afp32

Toolchain:

Python 3.12.12
litert-torch 0.9.1
litert-lm 0.13.1
ai-edge-litert 2.1.5
ai-edge-quantizer 0.7.0
torch 2.12.0+cu130
transformers 5.9.0

Not Included

Qualcomm SM8750 NPU AOT artifacts were not produced in this initial pass.
MediaPipe .task bundles were not uploaded because VibeThinker-3B ships an HF tokenizer.json rather than a SentencePiece tokenizer.model.

Downloads last month: 43

Model tree for Tdamre/VibeThinker-3B-litert-lm

Base model

Qwen/Qwen2.5-3B

Finetuned

Qwen/Qwen2.5-Coder-3B

Finetuned

WeiboAI/VibeThinker-3B

Finetuned

(11)

this model