💥 Qweling3.5-0.8B-GGUF

image

📄 Overview

Base Model constructai/QwenGLM3.5-0.8B
Parameters 0.9B

Quant types

Quant type Size
Q2_K 422 MB
Q3_K_S 435 MB
Q3_K_M 466 MB
Q3_K_L 491 MB
IQ4_XS 506 MB
Q4_K_S 505 MB
Q4_K_M 529 MB
Q5_K_S 564 MB
Q5_K_M 578 MB
Q6_K 630 MB
Q8_0 812 MB
F16 1.52 GB

🎯 Intended Use

This model is designed for step‑by‑step reasoning tasks where the answer requires logical decomposition before the final response. It is optimized for:

  • Educational applications — explaining "why" and "how" questions
  • On‑device assistants — runs on mobile, Raspberry Pi, or CPU‑only environments
  • Fast prototyping — small footprint (0.9B parameters), low latency
  • Reasoning distillation research — studying how small models learn from large ones (Ling → Qwen)

Not recommended for: multimodal tasks, non‑reasoning chat (e.g., creative writing), or production systems requiring 100% factual accuracy.


⚠️ Limitations & Intended Use

Intended Use:

  • Educational & Reasoning tasks — explaining step‑by‑step logic (math, science, common sense)

  • On‑device assistants — runs on CPU, Raspberry Pi, mobile (small footprint, fast inference)

  • Research baseline — for studying SFT‑only reasoning without RLHF/DPO

  • Distillation experiments — testing how well small models learn from large (Ling → Qwen)

Limitations:

  • Size matters — 0.9B parameters, so complex or multi‑hop reasoning may still fail

  • No multimodal — text only; images, video, audio are not supported

  • Factual accuracy — may hallucinate or give incorrect answers; always verify critical outputs

  • Domain restricted — trained on 15,000 reasoning examples (2.5 epochs); general chat or creative writing may be suboptimal

  • Training data bias — inherits biases from constructai/Ling-v2.6-Flash-Distilled-15K dataset; not safety‑filtered for harmful content

  • Hardware specific — optimised for T4/consumer GPUs; very slow on CPU without quantisation


🙏 Acknowledgements

This project would not have been possible without the open‑source community and the following resources:

  • Qwen Team (Alibaba Cloud) — for releasing the Qwen3.5-0.8B-Base model under Apache 2.0, a perfect balance of size and intelligence.

  • Unsloth AI — for making fine‑tuning on consumer hardware fast and memory‑efficient.

  • Hugging Face — for the ecosystem (transformers, datasets, PEFT, Hub) that democratises LLM training.

  • Kaggle — for providing free T4 GPU runtime to run this experiment.


📖 Citation

@misc{Qweling3.5-0.8B-GGUF,
  author = {constructai},
  title = {Qwenling3.5-0.8B: Small Reasoning Model via SFT on Ling Traces},
  year = {2026},
  publisher = {Hugging Face},
  howpublished = {https://huggingface.co/constructai/Qweling3.5-0.8B-GGUF},
}
Downloads last month
132
GGUF
Model size
0.8B params
Architecture
qwen35
Hardware compatibility
Log In to add your hardware

2-bit

3-bit

4-bit

5-bit

6-bit

8-bit

16-bit

Inference Providers NEW
This model isn't deployed by any Inference Provider. 🙋 Ask for provider support

Model tree for constructai/Qweling3.5-0.8B-GGUF

Quantized
(2)
this model

Collection including constructai/Qweling3.5-0.8B-GGUF