geoffmunn committed on Oct 12

Commit

04777c0

verified ·

1 Parent(s): 14b36d3

Add quantized models with per-model cards, MODELFILE, CLI examples, and auto-upload

Files changed (22)

.gitattributes +9 -0
MODELFILE +25 -0
Qwen3-32B-Q2_K/README.md +0 -0
Qwen3-32B-Q3_K_M/README.md +0 -0
Qwen3-32B-Q3_K_S/README.md +0 -0
Qwen3-32B-Q4_K_M/README.md +0 -0
Qwen3-32B-Q4_K_S/README.md +0 -0
Qwen3-32B-Q5_K_M/README.md +0 -0
Qwen3-32B-Q5_K_S/README.md +0 -0
Qwen3-32B-Q6_K/README.md +0 -0
Qwen3-32B-Q8_0/README.md +0 -0
Qwen3-32B-f16_Q2_K.gguf +3 -0
Qwen3-32B-f16_Q3_K_M.gguf +3 -0
Qwen3-32B-f16_Q3_K_S.gguf +3 -0
Qwen3-32B-f16_Q4_K_M.gguf +3 -0
Qwen3-32B-f16_Q4_K_S.gguf +3 -0
Qwen3-32B-f16_Q5_K_M.gguf +3 -0
Qwen3-32B-f16_Q5_K_S.gguf +3 -0
Qwen3-32B-f16_Q6_K.gguf +3 -0
Qwen3-32B-f16_Q8_0.gguf +3 -0
README.md +71 -0
SHA256SUMS.txt +9 -0

.gitattributes CHANGED Viewed

@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+Qwen3-32B-f16_Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
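
The added `.gitattributes` lines route each quantized file through Git LFS by exact filename instead of a `*.gguf` wildcard. As a rough illustration, the sketch below parses attribute text and reports whether a given filename would be LFS-tracked. The function names are hypothetical, and `fnmatch` only approximates Git's own pattern-matching rules.

```python
from fnmatch import fnmatch


def lfs_patterns(gitattributes_text):
    """Collect the path patterns whose attributes include filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


def is_lfs_tracked(filename, patterns):
    """Approximate check: fnmatch is close to, but not identical to, Git's matching."""
    return any(fnmatch(filename, p) for p in patterns)


sample = """\
*.zip filter=lfs diff=lfs merge=lfs -text
Qwen3-32B-f16_Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
"""
patterns = lfs_patterns(sample)
print(is_lfs_tracked("Qwen3-32B-f16_Q2_K.gguf", patterns))
print(is_lfs_tracked("README.md", patterns))
```

Because the entries are per-file, a new quantization added later would need its own line (or a wildcard) to be stored in LFS.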

MODELFILE ADDED Viewed

	@@ -0,0 +1,25 @@

+# MODELFILE for Qwen3-32B-GGUF
+# Used by LM Studio, OpenWebUI, GPT4All, etc.
+context_length: 32768
+embedding: false
+f16: cpu
+# Chat template using ChatML (used by Qwen)
+prompt_template: >-
+  <|im_start|>system
+  You are a helpful assistant.<|im_end|>
+  <|im_start|>user
+  {prompt}<|im_end|>
+  <|im_start|>assistant
+# Stop sequences help end generation cleanly
+stop: "<|im_end|>"
+stop: "<|im_start|>"
+# Default sampling
+temperature: 0.6
+top_p: 0.95
+top_k: 20
+min_p: 0.0
+repeat_penalty: 1.1
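
To make the template and stop sequences concrete, here is a minimal sketch of how a client might fill in the ChatML prompt and trim generated text at the first stop sequence. It assumes the conventional ChatML layout with each marker on its own line. `render_chatml` and `truncate_at_stop` are hypothetical helpers, not part of any of the listed tools.

```python
DEFAULT_SYSTEM = "You are a helpful assistant."
STOP_SEQUENCES = ["<|im_end|>", "<|im_start|>"]


def render_chatml(prompt, system=DEFAULT_SYSTEM):
    """Fill the {prompt} slot of the ChatML template from the MODELFILE."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        "<|im_start|>assistant\n"
    )


def truncate_at_stop(text, stops=STOP_SEQUENCES):
    """Cut generated text at the earliest stop sequence, if any occurs."""
    cut = len(text)
    for stop in stops:
        idx = text.find(stop)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]


print(render_chatml("What is GGUF?"))
print(truncate_at_stop("GGUF is a model file format.<|im_end|><|im_start|>user"))
```

Most runtimes apply the stop sequences themselves; the truncation helper only shows the intended effect.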

Qwen3-32B-Q2_K/README.md ADDED Viewed

Binary file (2.64 kB). View file

Qwen3-32B-Q3_K_M/README.md ADDED Viewed

Binary file (2.64 kB). View file

Qwen3-32B-Q3_K_S/README.md ADDED Viewed

Binary file (2.64 kB). View file

Qwen3-32B-Q4_K_M/README.md ADDED Viewed

Binary file (2.67 kB). View file

Qwen3-32B-Q4_K_S/README.md ADDED Viewed

Binary file (2.65 kB). View file

Qwen3-32B-Q5_K_M/README.md ADDED Viewed

Binary file (2.68 kB). View file

Qwen3-32B-Q5_K_S/README.md ADDED Viewed

Binary file (2.66 kB). View file

Qwen3-32B-Q6_K/README.md ADDED Viewed

Binary file (2.66 kB). View file

Qwen3-32B-Q8_0/README.md ADDED Viewed

Binary file (2.67 kB). View file

Qwen3-32B-f16_Q2_K.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:19175fb511c6cf0a387c2a6cc169010d607ddeb146e2ee65295b41386d0fa9f2
+size 12344651648

Qwen3-32B-f16_Q3_K_M.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:ba7174181bc00be4c3e0c5f00f02e49ac463b96240dcf8b6d2a54540c589f4b3
+size 15971777408

Qwen3-32B-f16_Q3_K_S.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:d671420412db80143b7dbca8e93fb9d8030370e8cd3a61eada9c43186c0742ff
+size 14389738368

Qwen3-32B-f16_Q4_K_M.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:24d3d5d2b1b15702177238cc96f9b0d2a64d5feaab5106fa1b7a6eb03e2ce8a7
+size 19762149248

Qwen3-32B-f16_Q4_K_S.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3f144155d9a4633ba7ec15678b2b1134c6ecdcb8b495ba733d5626b02860e035
+size 18771244928

Qwen3-32B-f16_Q5_K_M.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:9fc03e1de953be08a62762c3a4f5ab43a3bd138f9ab2498db42ac834a3a38779
+size 23214831488

Qwen3-32B-f16_Q5_K_S.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:733e599a386a8c6b46faed72c1c38563486f94442373350c988f9390c6174b1c
+size 22635493248

Qwen3-32B-f16_Q6_K.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:a28f654853d111a029da6d6c39941057f84eae31c54fd405a7953479f290145b
+size 26883306368

Qwen3-32B-f16_Q8_0.gguf ADDED Viewed

	@@ -0,0 +1,3 @@

+version https://git-lfs.github.com/spec/v1
+oid sha256:3fbd9a77581c4150f12339e15905fe44123c4b7c7458700d72319348fc724ed6
+size 34817719168
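
Each `.gguf` entry above is a Git LFS pointer, not the weights themselves: three lines giving the spec version, the SHA-256 object ID, and the size in bytes. The sketch below parses one pointer and converts the byte count to decimal GB and binary GiB, which makes it easier to compare against the Size column in the README table. The function names are hypothetical.

```python
def parse_lfs_pointer(text):
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        key, _, value = line.partition(" ")
        fields[key] = value
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "hash_algo": algo,
        "digest": digest,
        "size_bytes": int(fields["size"]),
    }


def to_gb(size_bytes):
    """Decimal gigabytes (10**9 bytes)."""
    return size_bytes / 1e9


def to_gib(size_bytes):
    """Binary gibibytes (2**30 bytes)."""
    return size_bytes / 2**30


pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:19175fb511c6cf0a387c2a6cc169010d607ddeb146e2ee65295b41386d0fa9f2
size 12344651648
"""
info = parse_lfs_pointer(pointer)
print(f"{to_gb(info['size_bytes']):.2f} GB")  # prints: 12.34 GB
```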

README.md ADDED Viewed

	@@ -0,0 +1,71 @@

+---
+license: apache-2.0
+tags:
+  - gguf
+  - qwen
+  - llama.cpp
+  - quantized
+  - text-generation
+  - reasoning
+  - agent
+  - multilingual
+base_model: Qwen/Qwen3-32B
+author: geoffmunn
+pipeline_tag: text-generation
+language:
+  - en
+  - zh
+  - es
+  - fr
+  - de
+  - ru
+  - ar
+  - ja
+  - ko
+  - hi
+---
+# Qwen3-32B-GGUF
+This is a **GGUF-quantized version** of the **[Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)** language model, a **32-billion-parameter** LLM with state-of-the-art reasoning, research capabilities, and enterprise-grade performance. Converted for use with `llama.cpp`, [LM Studio](https://lmstudio.ai), [OpenWebUI](https://openwebui.com), [GPT4All](https://gpt4all.io), and more.
+💡 **Key Features of Qwen3-32B:**
+🤔 **Supports thinking mode** (<think>...</think>) for state-of-the-art math, coding, and logical reasoning.
+🔁 **Dynamically switch** via /think and /no_think in conversation for complex problem solving.
+🧰 **Agent-ready**: integrates seamlessly with tools via Qwen-Agent or MCP for enterprise workflows.
+🌍 **Fluent in 100+ languages** including Chinese, English, Arabic, Japanese, Spanish, and more.
+🏆 **State-of-the-art performance** — enterprise-grade reasoning and research capabilities.
+🧠 **Cutting-edge reasoning** for advanced research, complex mathematics, and scientific applications.
+💼 **Enterprise-ready** for professional and academic use cases requiring maximum accuracy.
+## Available Quantizations (from f16)
+| Level     | Quality       | Speed     | Size      | Recommendation |
+|----------|--------------|----------|-----------|----------------|
+| Q2_K | Minimal | ⚡ Fast | 19.5 GB | Only on severely memory-constrained systems. |
+| Q3_K_S | Low-Medium | ⚡ Fast | 22.2 GB | Minimal viability; avoid unless space-limited. |
+| Q3_K_M | Low-Medium | ⚡ Fast | 23.3 GB | Acceptable for basic interaction. |
+| Q4_K_S | Practical | ⚡ Fast | 27.0 GB | Good balance for mobile/embedded platforms. |
+| Q4_K_M | Practical | ⚡ Fast | 28.1 GB | Best overall choice for most users. |
+| Q5_K_S | Max Reasoning | 🐢 Medium | 31.5 GB | Slight quality gain; good for testing. |
+| Q5_K_M | Max Reasoning | 🐢 Medium | 32.2 GB | Best quality available. Recommended. |
+| Q6_K | Near-FP16 | 🐌 Slow | 36.5 GB | Diminishing returns. Only if RAM allows. |
+| Q8_0 | Lossless* | 🐌 Slow | 48.0 GB | Maximum fidelity. Ideal for archival. |
+> 💡 **Recommendations by Use Case**
+>
+> - 🏢 **Enterprise Workstations (64GB+ RAM)**: Q5_K_M or Q6_K for maximum quality
+> - 🧠 **Advanced Research & Analysis**: Q6_K or Q8_0 for cutting-edge reasoning
+> - 🔬 **Scientific Computing**: Q6_K for complex mathematical and scientific tasks
+> - 💼 **Professional Applications**: Q5_K_M for enterprise-grade accuracy
+> - 🛠️ **Development & Testing**: Test from Q4_K_M up to Q8_0 based on hardware
+> - ⚠️ **Note**: Requires substantial RAM (32GB+ recommended for Q5_K_M+)
+## Usage
+Load this model using:
+- [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
+- [LM Studio](https://lmstudio.ai) – desktop app with GPU support
+- [GPT4All](https://gpt4all.io) – private, offline AI chatbot
+- Or directly via `llama.cpp`
+Each quantized model includes its own `README.md` and shares a common `MODELFILE`.
+## Author
+👤 Geoff Munn (@geoffmunn)
+🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+## Disclaimer
+This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.
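
The README's Usage section mentions running the model directly with `llama.cpp` but gives no command. As a hedged sketch, the snippet below assembles a command line that carries over the context length and sampling defaults from the MODELFILE. It assumes a llama.cpp build that provides a `llama-cli` binary with these flag spellings (confirm with `llama-cli --help`). `llama_cli_args` is a hypothetical helper.

```python
import shlex

# Sampling defaults copied from the MODELFILE in this commit.
SAMPLING = {
    "temp": 0.6,
    "top-p": 0.95,
    "top-k": 20,
    "min-p": 0.0,
    "repeat-penalty": 1.1,
}


def llama_cli_args(model_path, prompt, ctx=32768, sampling=SAMPLING):
    """Build an argument list for llama-cli (assumed binary name and flags)."""
    args = ["llama-cli", "-m", model_path, "-c", str(ctx)]
    for flag, value in sampling.items():
        args += [f"--{flag}", str(value)]
    args += ["-p", prompt]
    return args


# Print a shell-quoted command; pass the list to subprocess.run to execute it.
print(shlex.join(llama_cli_args("Qwen3-32B-f16_Q4_K_M.gguf", "Hello /no_think")))
```

Appending `/no_think` to the prompt follows the README's description of switching off thinking mode within a conversation.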

SHA256SUMS.txt ADDED Viewed

	@@ -0,0 +1,9 @@

+19175fb511c6cf0a387c2a6cc169010d607ddeb146e2ee65295b41386d0fa9f2  Qwen3-32B-f16_Q2_K.gguf
+ba7174181bc00be4c3e0c5f00f02e49ac463b96240dcf8b6d2a54540c589f4b3  Qwen3-32B-f16_Q3_K_M.gguf
+d671420412db80143b7dbca8e93fb9d8030370e8cd3a61eada9c43186c0742ff  Qwen3-32B-f16_Q3_K_S.gguf
+24d3d5d2b1b15702177238cc96f9b0d2a64d5feaab5106fa1b7a6eb03e2ce8a7  Qwen3-32B-f16_Q4_K_M.gguf
+3f144155d9a4633ba7ec15678b2b1134c6ecdcb8b495ba733d5626b02860e035  Qwen3-32B-f16_Q4_K_S.gguf
+9fc03e1de953be08a62762c3a4f5ab43a3bd138f9ab2498db42ac834a3a38779  Qwen3-32B-f16_Q5_K_M.gguf
+733e599a386a8c6b46faed72c1c38563486f94442373350c988f9390c6174b1c  Qwen3-32B-f16_Q5_K_S.gguf
+a28f654853d111a029da6d6c39941057f84eae31c54fd405a7953479f290145b  Qwen3-32B-f16_Q6_K.gguf
+3fbd9a77581c4150f12339e15905fe44123c4b7c7458700d72319348fc724ed6  Qwen3-32B-f16_Q8_0.gguf
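
The digests in `SHA256SUMS.txt` match the `oid sha256:` values in the LFS pointers, so the file can be used to check a download. A minimal sketch of that check follows; it hashes in chunks because the files are tens of gigabytes. The function names are hypothetical, and the standard `sha256sum -c SHA256SUMS.txt` command does the same job where it is available.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text):
    """Map filename -> expected hex digest from sha256sum-style lines."""
    sums = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        digest, name = line.split(maxsplit=1)
        # A leading '*' marks binary mode in sha256sum output; drop it.
        sums[name.strip().lstrip("*")] = digest.lower()
    return sums


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large models never load fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(directory, sums):
    """Return {filename: True/False}; missing files count as failures."""
    results = {}
    for name, expected in sums.items():
        path = Path(directory) / name
        results[name] = path.exists() and sha256_of(path) == expected
    return results
```

Usage would look like `verify(".", parse_sha256sums(Path("SHA256SUMS.txt").read_text()))`, run from the directory that holds the downloaded `.gguf` files.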