How to use PrunaAI/WeiboAI-VibeThinker-3B-HQQ-4bit-smashed with Pruna AI:
from pruna import PrunaModel
model = PrunaModel.from_pretrained("PrunaAI/WeiboAI-VibeThinker-3B-HQQ-4bit-smashed")
"Failed to load the weights ... AssertionError: Model architecture Qwen2ForCausalLM not supported yet."
I tried to load this model on my M1 MacBook Air and got the error "Failed to load the weights" ... "AssertionError: Model architecture Qwen2ForCausalLM not supported yet." Indeed, hqq/engine/hf.py does not yet list that architecture among those it supports.
...
compute_dtype = torch.bfloat16
...
Failed to load the weights
Traceback (most recent call last):
  File "/Users/ds/run_smashed_vibethinker.py", line 21, in <module>
    model = HQQModelForCausalLM.from_quantized("PrunaAI/WeiboAI-VibeThinker-3B-HQQ-4bit-smashed", compute_dtype=compute_dtype)
  File "/Users/ds/.pyenv/versions/3.10.4/lib/python3.10/site-packages/hqq/engine/base.py", line 85, in from_quantized
    cls._check_arch_support(arch_key)
  File "/Users/ds/.pyenv/versions/3.10.4/lib/python3.10/site-packages/hqq/engine/base.py", line 38, in _check_arch_support
    assert arch in cls._HQQ_REGISTRY, (
AssertionError: Model architecture Qwen2ForCausalLM not supported yet.
...
[ ... and then, FWIW, goes on to list two other errors occurring during exception handling ... ]
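For anyone who wants to test for this before attempting a load, here is a minimal sketch of the membership test that the assertion in the traceback appears to perform. The helper name `unsupported_architectures` and the registry contents are hypothetical and for illustration only. The only details taken from the post are the architecture name and the fact that the check is `arch in cls._HQQ_REGISTRY`.

```python
import json
from typing import Iterable, List


def unsupported_architectures(config: dict, registry: Iterable[str]) -> List[str]:
    """Return the architectures named in a Hugging Face-style config
    that are absent from `registry`."""
    supported = set(registry)
    return [arch for arch in config.get("architectures", []) if arch not in supported]


# Hypothetical registry contents, for illustration only. The real set lives in
# the installed hqq package (the traceback refers to it as cls._HQQ_REGISTRY).
example_registry = {"LlamaForCausalLM", "MistralForCausalLM"}

# The "architectures" entry as it would appear in a model's config.json,
# using the architecture name from the error message above.
example_config = json.loads('{"architectures": ["Qwen2ForCausalLM"]}')

missing = unsupported_architectures(example_config, example_registry)
print(missing)  # prints ['Qwen2ForCausalLM']
```

With hqq installed, one could presumably pass `HQQModelForCausalLM._HQQ_REGISTRY` as `registry`. That is an assumption based on the traceback, not something I have checked against a particular hqq version.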