geoffmunn committed on
Commit
04777c0
·
verified ·
1 Parent(s): 14b36d3

Add quantized models with per-model cards, MODELFILE, CLI examples, and auto-upload

.gitattributes CHANGED
@@ -33,3 +33,12 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
 *.zip filter=lfs diff=lfs merge=lfs -text
 *.zst filter=lfs diff=lfs merge=lfs -text
 *tfevents* filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q2_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q3_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q3_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q4_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q4_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q5_K_M.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q5_K_S.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q6_K.gguf filter=lfs diff=lfs merge=lfs -text
+ Qwen3-32B-f16_Q8_0.gguf filter=lfs diff=lfs merge=lfs -text
MODELFILE ADDED
@@ -0,0 +1,25 @@
+ # MODELFILE for Qwen3-32B-GGUF
+ # Used by LM Studio, OpenWebUI, GPT4All, etc.
+
+ context_length: 32768
+ embedding: false
+ f16: cpu
+
+ # Chat template using ChatML (used by Qwen)
+ prompt_template: >-
+   <|im_start|>system
+   You are a helpful assistant.<|im_end|>
+   <|im_start|>user
+   {prompt}<|im_end|>
+   <|im_start|>assistant
+
+ # Stop sequences help end generation cleanly
+ stop: "<|im_end|>"
+ stop: "<|im_start|>"
+
+ # Default sampling
+ temperature: 0.6
+ top_p: 0.95
+ top_k: 20
+ min_p: 0.0
+ repeat_penalty: 1.1
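
The ChatML template defined in the MODELFILE above can be reproduced in code when driving the model through an API rather than a frontend. A minimal sketch follows; the helper name `build_chatml` is hypothetical, while the `<|im_start|>`/`<|im_end|>` markers are the real ChatML tokens used by Qwen.

```python
def build_chatml(prompt: str, system: str = "You are a helpful assistant.") -> str:
    """Render a single-turn ChatML prompt matching the MODELFILE template."""
    return (
        f"<|im_start|>system\n{system}<|im_end|>\n"
        f"<|im_start|>user\n{prompt}<|im_end|>\n"
        f"<|im_start|>assistant\n"
    )

if __name__ == "__main__":
    # The result is passed as the raw prompt string to the inference runtime.
    print(build_chatml("What is the capital of France?"))
```

Generation should then be stopped on `<|im_end|>` (and `<|im_start|>`), mirroring the `stop:` entries above.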
Qwen3-32B-Q2_K/README.md ADDED
Binary file (2.64 kB)

Qwen3-32B-Q3_K_M/README.md ADDED
Binary file (2.64 kB)

Qwen3-32B-Q3_K_S/README.md ADDED
Binary file (2.64 kB)

Qwen3-32B-Q4_K_M/README.md ADDED
Binary file (2.67 kB)

Qwen3-32B-Q4_K_S/README.md ADDED
Binary file (2.65 kB)

Qwen3-32B-Q5_K_M/README.md ADDED
Binary file (2.68 kB)

Qwen3-32B-Q5_K_S/README.md ADDED
Binary file (2.66 kB)

Qwen3-32B-Q6_K/README.md ADDED
Binary file (2.66 kB)

Qwen3-32B-Q8_0/README.md ADDED
Binary file (2.67 kB)

Qwen3-32B-f16_Q2_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:19175fb511c6cf0a387c2a6cc169010d607ddeb146e2ee65295b41386d0fa9f2
+ size 12344651648
Qwen3-32B-f16_Q3_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:ba7174181bc00be4c3e0c5f00f02e49ac463b96240dcf8b6d2a54540c589f4b3
+ size 15971777408
Qwen3-32B-f16_Q3_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:d671420412db80143b7dbca8e93fb9d8030370e8cd3a61eada9c43186c0742ff
+ size 14389738368
Qwen3-32B-f16_Q4_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:24d3d5d2b1b15702177238cc96f9b0d2a64d5feaab5106fa1b7a6eb03e2ce8a7
+ size 19762149248
Qwen3-32B-f16_Q4_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3f144155d9a4633ba7ec15678b2b1134c6ecdcb8b495ba733d5626b02860e035
+ size 18771244928
Qwen3-32B-f16_Q5_K_M.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:9fc03e1de953be08a62762c3a4f5ab43a3bd138f9ab2498db42ac834a3a38779
+ size 23214831488
Qwen3-32B-f16_Q5_K_S.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:733e599a386a8c6b46faed72c1c38563486f94442373350c988f9390c6174b1c
+ size 22635493248
Qwen3-32B-f16_Q6_K.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:a28f654853d111a029da6d6c39941057f84eae31c54fd405a7953479f290145b
+ size 26883306368
Qwen3-32B-f16_Q8_0.gguf ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:3fbd9a77581c4150f12339e15905fe44123c4b7c7458700d72319348fc724ed6
+ size 34817719168
README.md ADDED
@@ -0,0 +1,71 @@
+ ---
+ license: apache-2.0
+ tags:
+   - gguf
+   - qwen
+   - llama.cpp
+   - quantized
+   - text-generation
+   - reasoning
+   - agent
+   - multilingual
+ base_model: Qwen/Qwen3-32B
+ author: geoffmunn
+ pipeline_tag: text-generation
+ language:
+   - en
+   - zh
+   - es
+   - fr
+   - de
+   - ru
+   - ar
+   - ja
+   - ko
+   - hi
+ ---
+
+ # Qwen3-32B-GGUF
+
+ This is a **GGUF-quantized version** of the **[Qwen/Qwen3-32B](https://huggingface.co/Qwen/Qwen3-32B)** language model, a **32-billion-parameter** LLM with state-of-the-art reasoning, research capabilities, and enterprise-grade performance. Converted for use with `llama.cpp`, [LM Studio](https://lmstudio.ai), [OpenWebUI](https://openwebui.com), [GPT4All](https://gpt4all.io), and more.
+
+ 💡 **Key Features of Qwen3-32B:**
+
+ - 🤔 **Supports thinking mode** (`<think>...</think>`) for state-of-the-art math, coding, and logical reasoning.
+ - 🔁 **Dynamically switch** via `/think` and `/no_think` in conversation for complex problem solving.
+ - 🧰 **Agent-ready**: integrates seamlessly with tools via Qwen-Agent or MCP for enterprise workflows.
+ - 🌍 **Fluent in 100+ languages** including Chinese, English, Arabic, Japanese, Spanish, and more.
+ - 🧠 **Cutting-edge reasoning** for advanced research, complex mathematics, and scientific applications.
+ - 💼 **Enterprise-ready** for professional and academic use cases requiring maximum accuracy.
+
+ ## Available Quantizations (from f16)
+
+ | Level  | Quality       | Speed     | Size    | Recommendation |
+ |--------|---------------|-----------|---------|----------------|
+ | Q2_K   | Minimal       | ⚡ Fast   | 19.5 GB | Only on severely memory-constrained systems. |
+ | Q3_K_S | Low-Medium    | ⚡ Fast   | 22.2 GB | Minimal viability; avoid unless space-limited. |
+ | Q3_K_M | Low-Medium    | ⚡ Fast   | 23.3 GB | Acceptable for basic interaction. |
+ | Q4_K_S | Practical     | ⚡ Fast   | 27.0 GB | Good balance for mobile/embedded platforms. |
+ | Q4_K_M | Practical     | ⚡ Fast   | 28.1 GB | Best overall choice for most users. |
+ | Q5_K_S | Max Reasoning | 🐢 Medium | 31.5 GB | Slight quality gain; good for testing. |
+ | Q5_K_M | Max Reasoning | 🐢 Medium | 32.2 GB | Best quality available. Recommended. |
+ | Q6_K   | Near-FP16     | 🐌 Slow   | 36.5 GB | Diminishing returns. Only if RAM allows. |
+ | Q8_0   | Lossless*     | 🐌 Slow   | 48.0 GB | Maximum fidelity. Ideal for archival. |
+
+ > 💡 **Recommendations by Use Case**
+ >
+ > - 🏢 **Enterprise Workstations (64 GB+ RAM)**: Q5_K_M or Q6_K for maximum quality
+ > - 🧠 **Advanced Research & Analysis**: Q6_K or Q8_0 for cutting-edge reasoning
+ > - 🔬 **Scientific Computing**: Q6_K for complex mathematical and scientific tasks
+ > - 💼 **Professional Applications**: Q5_K_M for enterprise-grade accuracy
+ > - 🛠️ **Development & Testing**: Test from Q4_K_M up to Q8_0 based on hardware
+ > - ⚠️ **Note**: Substantial RAM is required (more than 32 GB for Q5_K_M and above)
+
+ ## Usage
+
+ Load this model using:
+
+ - [OpenWebUI](https://openwebui.com) – self-hosted AI interface with RAG & tools
+ - [LM Studio](https://lmstudio.ai) – desktop app with GPU support
+ - [GPT4All](https://gpt4all.io) – private, offline AI chatbot
+ - Or directly via `llama.cpp`
+
+ Each quantized model includes its own `README.md` and shares a common `MODELFILE`.
+
+ ## Author
+
+ 👤 Geoff Munn (@geoffmunn)
+ 🔗 [Hugging Face Profile](https://huggingface.co/geoffmunn)
+
+ ## Disclaimer
+
+ This is a community conversion for local inference. Not affiliated with Alibaba Cloud or the Qwen team.
SHA256SUMS.txt ADDED
@@ -0,0 +1,9 @@
+ 19175fb511c6cf0a387c2a6cc169010d607ddeb146e2ee65295b41386d0fa9f2  Qwen3-32B-f16_Q2_K.gguf
+ ba7174181bc00be4c3e0c5f00f02e49ac463b96240dcf8b6d2a54540c589f4b3  Qwen3-32B-f16_Q3_K_M.gguf
+ d671420412db80143b7dbca8e93fb9d8030370e8cd3a61eada9c43186c0742ff  Qwen3-32B-f16_Q3_K_S.gguf
+ 24d3d5d2b1b15702177238cc96f9b0d2a64d5feaab5106fa1b7a6eb03e2ce8a7  Qwen3-32B-f16_Q4_K_M.gguf
+ 3f144155d9a4633ba7ec15678b2b1134c6ecdcb8b495ba733d5626b02860e035  Qwen3-32B-f16_Q4_K_S.gguf
+ 9fc03e1de953be08a62762c3a4f5ab43a3bd138f9ab2498db42ac834a3a38779  Qwen3-32B-f16_Q5_K_M.gguf
+ 733e599a386a8c6b46faed72c1c38563486f94442373350c988f9390c6174b1c  Qwen3-32B-f16_Q5_K_S.gguf
+ a28f654853d111a029da6d6c39941057f84eae31c54fd405a7953479f290145b  Qwen3-32B-f16_Q6_K.gguf
+ 3fbd9a77581c4150f12339e15905fe44123c4b7c7458700d72319348fc724ed6  Qwen3-32B-f16_Q8_0.gguf
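
Downloads can be verified against SHA256SUMS.txt with `sha256sum -c SHA256SUMS.txt` on the command line. An equivalent standard-library sketch in Python, for platforms without coreutils; the helper names are hypothetical, and the file format is the usual `<hex-digest>  <filename>` per line.

```python
import hashlib
from pathlib import Path

def sha256_of(path: Path, chunk: int = 1 << 20) -> str:
    """Hash a (possibly very large) file in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        while block := f.read(chunk):
            h.update(block)
    return h.hexdigest()

def verify(sums_file: Path) -> dict[str, bool]:
    """Check every '<digest>  <name>' line; missing files count as failures."""
    results = {}
    for line in sums_file.read_text().splitlines():
        expected, name = line.split(maxsplit=1)
        target = sums_file.parent / name
        results[name] = target.exists() and sha256_of(target) == expected
    return results
```

Running `verify(Path("SHA256SUMS.txt"))` after downloading should report `True` for every `.gguf` file listed above.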