rogkesavan committed · Commit 8861939 · verified · 1 parent: 067871e

Update README.md

Files changed (1): README.md (+176 -1)

README.md CHANGED

# THRIFT — Targeted Reduction for Inference and Fine-Tuning

A performance-optimized variant of the base model, developed by VibeStud.io, that delivers faster responses and lower memory usage while preserving quality for everyday tasks.

## TL;DR

We, the over-caffeinated researchers at VibeStud.io, wanted to create a 50% pruned version of the SOTA MiniMax M2 best suited for local/air-gapped coding. This release achieves ~25% pruning; a full 50% pruned version is under development, while a not-so-sucky team of ours works on a 50% pruned version of Kimi K2 Thinking. Check back later, cheers!

## Why it’s useful

* **Lower latency:** Snappier responses for interactive apps and chatbots.
* **Smaller memory footprint:** Runs on cheaper GPUs or with fewer resources per replica.
* **Higher throughput:** Serve more concurrent users at the same cost.
* **Deployment-friendly:** Drop-in replacement for the base model in most inference stacks.
* **Adaptable:** Supports light fine-tuning to match your domain and style guidelines.

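The memory-footprint claim is mostly arithmetic: weight memory scales linearly with the surviving parameters. A minimal sketch, with a hypothetical parameter count as a placeholder (not a measured MiniMax-M2 figure):

```python
def weight_memory_gb(n_params: float, bytes_per_param: float = 2.0) -> float:
    """Approximate weight-only memory in GB (BF16 = 2 bytes/param).

    Ignores activations, KV cache, and runtime overhead.
    """
    return n_params * bytes_per_param / 1e9

# Hypothetical dense model with 10B parameters (illustrative only).
base = weight_memory_gb(10e9)              # 20.0 GB in BF16
pruned_25 = weight_memory_gb(10e9 * 0.75)  # 15.0 GB at ~25% pruning
pruned_50 = weight_memory_gb(10e9 * 0.50)  # 10.0 GB at ~50% pruning

print(base, pruned_25, pruned_50)
```

The same linear scaling is what lets a pruned checkpoint fit on a smaller GPU or pack more replicas per node.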
## Intended use

* General chat and coding assistance
* Enterprise assistants with strict latency/VRAM budgets
* Batch or realtime serving in cloud and on-prem environments
* Edge or cost-sensitive deployments where efficiency matters

## When to use it

* You’re constrained by GPU memory or need shorter response times
* You want to increase QPS without scaling infrastructure
* You need a model that is “good enough” for most tasks at a better cost profile

---

# Model Comparison Report

**Models Under Evaluation**

| Model | Type |
| :---- | :---- |
| ModelCloud/MiniMax-M2-BF16 | Base Model |
| VibeStudio/MiniMax-M2-THRIFT | Compressed/Optimized |

**Evaluation Date:** November 7, 2025

## 📊 Results Comparison

### 1) Multiple Choice Q&A (lm-eval)

**Overall MMLU Performance**

| Model | MMLU Overall | Humanities | STEM | Social Sciences | Other |
| :---- | ----: | ----: | ----: | ----: | ----: |
| MiniMax-M2-BF16 | 83.16% | 77.45% | 80.91% | 90.02% | 87.29% |
| MiniMax-M2-THRIFT | 77.72% | 70.14% | 77.61% | 86.84% | 80.27% |
| **Δ (Difference)** | **-5.44%** | **-7.31%** | **-3.30%** | **-3.18%** | **-7.02%** |

**Individual Task Performance**

| Task | BF16 (Base) | THRIFT-BF16 | Difference |
| :---- | ----: | ----: | ----: |
| arc_challenge | 73.21% | 61.01% | -12.20% ⬇️ |
| arc_easy | 88.30% | 83.08% | -5.22% ⬇️ |
| boolq | 87.95% | 84.95% | -3.00% ⬇️ |
| hellaswag | 83.00% | 77.09% | -5.91% ⬇️ |
| mmlu | 83.16% | 77.72% | -5.44% ⬇️ |
| openbookqa | 48.60% | 43.00% | -5.60% ⬇️ |
| rte | 75.45% | 80.14% | **+4.69% ⬆️** |
| winogrande | 76.48% | 74.90% | -1.58% ⬇️ |

**Average Accuracy Drop: -4.28%**

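The reported average is the unweighted mean of the per-task differences in the table, which can be reproduced directly:

```python
# Per-task accuracy differences (THRIFT minus base), in percentage points,
# copied from the Individual Task Performance table.
diffs = {
    "arc_challenge": -12.20,
    "arc_easy": -5.22,
    "boolq": -3.00,
    "hellaswag": -5.91,
    "mmlu": -5.44,
    "openbookqa": -5.60,
    "rte": +4.69,
    "winogrande": -1.58,
}

avg_drop = sum(diffs.values()) / len(diffs)
print(f"{avg_drop:.2f}")  # -4.28
```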
### 2) Code Generation (EvalPlus)

**MBPP Results**

| Model | MBPP (base) | MBPP+ (extended) |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | 73.8% | 64.0% |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 Coming Soon |

**HumanEval Results**

| Model | HumanEval (base) | HumanEval+ (extended) |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | ✅ Complete | ✅ Complete |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 Coming Soon |

### 3) Math Benchmarks

**GSM8K Results**

| Model | Accuracy | Problems |
| :---- | ----: | ----: |
| MiniMax-M2-BF16 | 92.72% | 1,319 |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 1,319 |

**MATH-500 Results**

| Model | Overall | Level 1 | Level 2 | Level 3 | Level 4 | Level 5 |
| :---- | ----: | ----: | ----: | ----: | ----: | ----: |
| MiniMax-M2-BF16 | 87.2% | 90.7% | 95.56% | 82.86% | 85.16% | 85.82% |
| MiniMax-M2-THRIFT | 🔄 Coming Soon | 🔄 | 🔄 | 🔄 | 🔄 | 🔄 |

### 4) LiveCodeBench (Live Coding Problems)

| Model | pass@1 | Problems | Status |
| :---- | ----: | ----: | :---- |
| **MiniMax-M2-BF16** | **35.71%** | 182 | ✅ Complete |
| **MiniMax-M2-THRIFT** | 🔄 Coming Soon | 182 | ⏳ Not Started Yet |

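For reference, pass@1 is the k=1 case of the unbiased pass@k estimator from the HumanEval paper (Chen et al., 2021). A minimal sketch; how many samples per problem the harness actually drew is not stated here, so the counts below are illustrative:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: 1 - C(n-c, k) / C(n, k),
    where n samples were drawn per problem and c of them passed."""
    if n - c < k:
        return 1.0  # fewer failures than k draws: at least one pass guaranteed
    return 1.0 - comb(n - c, k) / comb(n, k)

# With a single sample per problem, pass@1 reduces to the plain solve rate;
# e.g. the 35.71% above corresponds to roughly 65 of 182 problems solved.
print(round(65 / 182, 4))  # 0.3571
```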
---

## 📈 Analysis (Preliminary)

### Key Findings

**MMLU Performance Drop**

* THRIFT-BF16 shows a **-5.44%** overall MMLU drop
* Largest drop: **arc_challenge (-12.20%)**
* Smallest drop: **winogrande (-1.58%)**
* **RTE improved by +4.69%** 🎉

**Subject-Specific Performance**

* Best preservation: **Social Sciences (-3.18%)**
* Most degraded: **Other (-7.02%)**
* STEM: **moderate drop (-3.30%)**

**Compression Trade-off**

* THRIFT-BF16 (compressed) vs. BF16 (base)
* Average accuracy loss: **~4–5%**
* Expected for compressed/quantized models

**MMLU Category Breakdown**

| Category | BF16 (Base) | THRIFT-BF16 | Difference | Status |
| :---- | ----: | ----: | ----: | :---- |
| High School Government | 97.93% | 94.82% | -3.11% | ✅ Still Excellent |
| High School Psychology | 95.41% | 93.58% | -1.83% | ✅ Well Preserved |
| Marketing | 95.73% | 91.88% | -3.85% | ✅ Good |
| Professional Medicine | 92.28% | 79.78% | -12.50% | ⚠️ Notable Drop |
| Clinical Knowledge | 92.83% | 85.66% | -7.17% | ⚠️ Moderate Drop |

---

## Benchmarks

Coming soon.

## Research paper

Coming soon.

---

## License

This model is derived from MiniMax-M2 and is distributed under the MIT License: [http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE](http://github.com/MiniMax-AI/MiniMax-M2/blob/main/LICENSE).

---

## Credits

Model conversion and HF Transformers code by @Qubitum at ModelCloud.

Positive references to related work:

* Alibaba Cloud Computing — [https://arxiv.org/html/2511.01354v1](https://arxiv.org/html/2511.01354v1)
* Cerebras — [https://arxiv.org/abs/2510.13999](https://arxiv.org/abs/2510.13999)
* QLoRA — [https://arxiv.org/abs/2307.02973](https://arxiv.org/abs/2307.02973)
* SparseGPT — [https://arxiv.org/abs/2301.00774](https://arxiv.org/abs/2301.00774)
* Wanda — [https://arxiv.org/abs/2306.11695](https://arxiv.org/abs/2306.11695)
* LLM-Pruner — [https://arxiv.org/abs/2305.11627](https://arxiv.org/abs/2305.11627)
* Sheared-LLaMA — [https://arxiv.org/abs/2310.06694](https://arxiv.org/abs/2310.06694)
* Wanda++ (2025) — [https://arxiv.org/abs/2503.04992](https://arxiv.org/abs/2503.04992)
* Týr-the-Pruner — [https://arxiv.org/abs/2503.09657](https://arxiv.org/abs/2503.09657)