VibeThinker-3B — LiteRT-LM
LiteRT-LM (.litertlm) conversion of
WeiboAI/VibeThinker-3B (a Qwen2.5-3B–architecture
reasoning model) for on-device / edge inference via Google AI Edge LiteRT-LM.
Files
| file | size | notes |
|---|---|---|
| vibethinker3b_q8_ekv8192_lora16.litertlm | ~3.4 GB | prefill+decode, int8 weights, 8192 ctx, runtime-swappable LoRA (rank 16) |
Conversion details
- Source: `WeiboAI/VibeThinker-3B` (Qwen2.5-3B: 36 layers, hidden 2048, 16 heads / 2 KV groups, vocab 151936)
- Tool: `litert-torch` 0.9.0 generative converter (`examples.qwen.convert_to_tflite`)
- Quantization: `dynamic_int8` (int8 weights / fp32 activations)
- Context / KV cache: 8192 tokens (chosen for long reasoning traces)
- Signatures: `prefill_256`, `decode`, plus LoRA-enabled `prefill_256_lora_r16`, `decode_lora_r16`
- Metadata: model type `qwen2p5`, HF tokenizer, Qwen2.5 chat template embedded, stop tokens `<|im_end|>` (151645) / `<|endoftext|>` (151643)
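As a rough guide to what the 8192-token context costs at runtime, the sketch below estimates the KV cache size from the architecture numbers listed above. It assumes the head dimension is hidden size divided by head count (2048 / 16 = 128) and that the cache stores one key and one value vector per KV group, per layer, per token. The cache element width used by the converted model is not stated here, so the script prints several candidates instead of picking one; `AttentionConfig` and `kv_cache_bytes` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AttentionConfig:
    """Architecture numbers taken from the conversion details above."""
    num_layers: int = 36
    hidden_size: int = 2048
    num_heads: int = 16
    num_kv_heads: int = 2
    context_len: int = 8192

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_heads.
        return self.hidden_size // self.num_heads


def kv_cache_bytes(cfg: AttentionConfig, bytes_per_element: int) -> int:
    """Bytes needed to hold keys and values for a full context window."""
    elements = (
        2  # one key tensor and one value tensor
        * cfg.num_layers
        * cfg.num_kv_heads
        * cfg.head_dim
        * cfg.context_len
    )
    return elements * bytes_per_element


if __name__ == "__main__":
    cfg = AttentionConfig()
    # The actual cache dtype is an assumption; compare a few widths.
    for label, width in (("fp32", 4), ("fp16", 2), ("int8", 1)):
        mib = kv_cache_bytes(cfg, width) / 2**20
        print(f"{label}: {mib:.0f} MiB")
```

Because only 2 KV groups are cached rather than all 16 heads, the cache is one eighth of what a full multi-head layout with the same dimensions would need.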
LoRA note: the rank-16 LoRA signatures target the q/k/v/o projections and let the LiteRT-LM runtime load/swap a fine-tuned adapter at init (`EngineSettings::SetScopedLoraFile`). Exporting these required fixing a grouped-query-attention out-dim bug in `litert-torch`'s `lora.py` (reported upstream: litert-torch#1066).
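To illustrate why grouped-query attention matters for the LoRA output dimensions, the sketch below works out the expected adapter shapes for this architecture. It assumes the common LoRA layout (A of shape rank x in, B of shape out x rank) and a head dimension of 128; the tensor layout actually used inside `litert-torch` may differ, and `lora_shapes` / `lora_param_count` are hypothetical helper names.

```python
def lora_shapes(hidden=2048, num_heads=16, num_kv_heads=2, rank=16):
    """Return per-projection LoRA matrix shapes under grouped-query attention."""
    head_dim = hidden // num_heads        # assumed: 2048 / 16 = 128
    q_out = num_heads * head_dim          # 2048
    kv_out = num_kv_heads * head_dim      # 256, not 2048, because of GQA
    dims = {
        "q": (hidden, q_out),
        "k": (hidden, kv_out),
        "v": (hidden, kv_out),
        "o": (q_out, hidden),
    }
    # Assumed layout: A projects down to the rank, B projects back up.
    return {
        name: {"A": (rank, d_in), "B": (d_out, rank)}
        for name, (d_in, d_out) in dims.items()
    }


def lora_param_count(shapes, num_layers=36):
    """Total adapter parameters across all layers."""
    per_layer = sum(
        rows * cols
        for mats in shapes.values()
        for rows, cols in mats.values()
    )
    return per_layer * num_layers


if __name__ == "__main__":
    shapes = lora_shapes()
    for name, mats in shapes.items():
        print(name, mats)
    print("total adapter parameters:", lora_param_count(shapes))
```

The k and v projections have an output width of 256 rather than 2048, so an exporter that sizes every B matrix from the hidden size would produce mismatched k/v adapters for this model.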
Usage
Run with the LiteRT-LM runtime (`litert_lm_main` / engine API):
litert_lm_main --backend=cpu --model_path=vibethinker3b_q8_ekv8192_lora16.litertlm
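For scripted runs, the invocation above can be wrapped in a small Python helper. This is a minimal sketch that uses only the two flags shown in this README (`--backend` and `--model_path`) and assumes `litert_lm_main` is on `PATH`; `build_command` and `run_model` are hypothetical names introduced for illustration.

```python
import shutil
import subprocess
from pathlib import Path


def build_command(model_path, backend="cpu", binary="litert_lm_main"):
    """Build the argv list using only the flags shown in this README."""
    return [binary, f"--backend={backend}", f"--model_path={model_path}"]


def run_model(model_path, backend="cpu"):
    """Check preconditions, then launch the runtime and return its exit code."""
    if not Path(model_path).is_file():
        raise FileNotFoundError(f"model file not found: {model_path}")
    if shutil.which("litert_lm_main") is None:
        # Assumption: the runtime binary has been built or installed on PATH.
        raise RuntimeError("litert_lm_main is not on PATH")
    completed = subprocess.run(build_command(model_path, backend), check=True)
    return completed.returncode


if __name__ == "__main__":
    run_model("vibethinker3b_q8_ekv8192_lora16.litertlm")
```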
To attach a fine-tuned rank-16 LoRA adapter, convert it with `litert_torch`'s `LoRA.from_safetensors(...).to_tflite()` and load the resulting file via the runtime's scoped-LoRA API.
License
MIT, inherited from the base model WeiboAI/VibeThinker-3B.