VibeThinker-3B LiteRT-LM
This repository contains LiteRT-LM conversions of WeiboAI/VibeThinker-3B for local on-device inference.
VibeThinker-3B is a 3B-parameter dense reasoning model built on
Qwen2.5-Coder-3B. These artifacts were exported from the Hugging Face
safetensors checkpoint with LiteRT Torch and packaged as .litertlm files for
the LiteRT-LM runtime.
Files
| File | Context cache | Quantization | Backend target | Status |
|---|---|---|---|---|
| VibeThinker-3B.litertlm | 4096 | dynamic_wi8_afp32 | CPU/GPU | Exported and template-repaired. |
| VibeThinker-3B-web.litertlm | 2048 | dynamic_wi8_afp32 | CPU/GPU | Exported, template-repaired, and host CPU smoke-tested. |
| chat_template.jinja | n/a | n/a | n/a | Mobile-safe ChatML template. Replaces the source tool-call template that fails in Android LiteRT-LM template evaluation. |
| conversion_manifest.json | n/a | n/a | n/a | Toolchain versions, hashes, and conversion details. |
The CPU/GPU .litertlm files include a compressed Hugging Face tokenizer, LLM
metadata, a quantized prefill/decode TFLite model, and a quantized external
embedder.
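Since conversion_manifest.json records hashes, a downloaded artifact can be checked against it. The manifest's schema is not shown here, so the sketch below takes the expected digest as an argument instead of parsing the file. The function name verify_sha256 is hypothetical, and it assumes the recorded hashes are SHA-256 and that sha256sum is available (on macOS, shasum -a 256 is the usual substitute).

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 digest with an expected value.
# Assumption: the manifest's hashes are SHA-256; copy the expected value by hand.
verify_sha256() {
  local file="$1" expected="$2" actual
  # sha256sum prints "<digest>  <path>"; keep only the digest.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" = "$expected" ]; then
    echo "OK: $file"
  else
    echo "MISMATCH: $file (got $actual)" >&2
    return 1
  fi
}
```

Call it as verify_sha256 VibeThinker-3B.litertlm <digest-from-manifest>. A non-zero exit status signals a mismatch, so the function can gate a later step in a script.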
Run With LiteRT-LM
Install the LiteRT-LM CLI:
uv tool install litert-lm
Run the generic artifact:
litert-lm run \
--from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
VibeThinker-3B.litertlm \
--backend=cpu \
--prompt="What is 17 * 3? Answer with just the number."
Run the lower-cache artifact:
litert-lm run \
--from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
VibeThinker-3B-web.litertlm \
--backend=cpu \
--prompt="What is 2+2? Answer with just the number."
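The two commands above differ only in the artifact name, so the choice can be scripted. The sketch below picks a file from the context cache sizes in the Files table. The function name pick_artifact is hypothetical, and it assumes the context cache bounds the prompt and the reply together, which this card does not state explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helper: choose an artifact from the Files table by the
# number of tokens (prompt plus reply) the session is expected to need.
# Assumption: the context cache size is the upper bound on that total.
pick_artifact() {
  local needed_tokens="$1"
  if [ "$needed_tokens" -le 2048 ]; then
    echo "VibeThinker-3B-web.litertlm"   # 2048-token context cache
  elif [ "$needed_tokens" -le 4096 ]; then
    echo "VibeThinker-3B.litertlm"       # 4096-token context cache
  else
    echo "no artifact covers $needed_tokens tokens" >&2
    return 1
  fi
}

# Usage, with flags copied from the commands above:
# litert-lm run \
#   --from-huggingface-repo Tdamre/VibeThinker-3B-litert-lm \
#   "$(pick_artifact 3000)" \
#   --backend=cpu \
#   --prompt="What is 17 * 3? Answer with just the number."
```

Preferring the smaller cache when it suffices is a design choice of this sketch, not a recommendation from the toolchain.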
Conversion Summary
Source model revision:
WeiboAI/VibeThinker-3B@0c7115fdd0957b3da0f2a0829ab1763969d30300
CPU/GPU conversion command pattern:
litert-torch export_hf \
model-cache/WeiboAI-VibeThinker-3B \
<output_dir> \
--keep_temporary_files=True \
--prefill_lengths=128,1024 \
--cache_length=<2048-or-4096> \
--externalize_embedder=True \
--quantization_recipe=dynamic_wi8_afp32
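The pattern above leaves the output directory and cache length as placeholders. As a dry-run sketch, the function below prints the expanded command for each of the two cache lengths without running litert-torch. The function name export_cmd and the out/cache-<N> directory names are made up for illustration; all flags are copied from the pattern.

```shell
#!/usr/bin/env bash
# Print (do not run) the export command for one cache length.
# Assumption: "out/cache-<N>" is an illustrative output directory name.
export_cmd() {
  local cache_length="$1"
  # printf reuses the '%s ' format for every argument, joining them with spaces.
  printf '%s ' litert-torch export_hf \
    model-cache/WeiboAI-VibeThinker-3B \
    "out/cache-${cache_length}" \
    --keep_temporary_files=True \
    --prefill_lengths=128,1024 \
    "--cache_length=${cache_length}" \
    --externalize_embedder=True \
    --quantization_recipe=dynamic_wi8_afp32
  echo
}

# One line per artifact: 2048 for the -web file, 4096 for the generic file.
for n in 2048 4096; do
  export_cmd "$n"
done
```

Printing first makes it easy to review the flags before piping a line to a shell.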
Toolchain:
Python 3.12.12
litert-torch 0.9.1
litert-lm 0.13.1
ai-edge-litert 2.1.5
ai-edge-quantizer 0.7.0
torch 2.12.0+cu130
transformers 5.9.0
Not Included
- Qualcomm SM8750 NPU AOT artifacts were not produced in this initial pass.
- MediaPipe .task bundles were not uploaded because VibeThinker-3B ships an HF tokenizer.json rather than a SentencePiece tokenizer.model.