Upload complete model
README.md CHANGED

@@ -6,8 +6,6 @@ base_model: MiniMaxAI/MiniMax-M2
 tags:
 - mlx
 ---
-*UPLOADING*
-
 **See MiniMax-M2 6.5bit MLX in action - [demonstration video](https://youtu.be/DCVKP_o2HU0)**
 
 *q6.5bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.*
@@ -25,7 +23,7 @@ tags:
|
|
| 25 |
* Tested on a MacBook Pro connecting to a M3 Ultra 512GB RAM over the internet using [Inferencer app v1.5.4](https://inferencer.com)
|
| 26 |
* Memory usage: ~175 GB
|
| 27 |
* Expect 42 tokens/s for small contexts (200 tokens) down to 12 token/s for large (6800 tokens)
|
| 28 |
-
**
|
| 29 |
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
|
| 30 |
* For more details see [demonstration video](https://youtu.be/DCVKP_o2HU0) or visit [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).
|
| 31 |
|
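A quick way to sanity-check a perplexity figure like the 1.128 quoted above is a short mlx-lm script. This is a minimal sketch, assuming the quant loads through mlx-lm's standard `load()` API; the model path and `eval.txt` file are hypothetical, and the exact harness behind the quoted number is not published here.

```python
# Minimal perplexity sketch (assumes stock mlx-lm; not the card's harness).
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("MiniMax-M2-6.5bit-MLX")  # hypothetical path / repo id

text = open("eval.txt").read()          # hypothetical evaluation text
tokens = tokenizer.encode(text)[:2048]  # clip to a manageable context

inputs = mx.array(tokens[:-1])[None]    # predict token t+1 from tokens <= t
targets = mx.array(tokens[1:])[None]

logits = model(inputs)                  # (1, T, vocab) next-token logits
loss = nn.losses.cross_entropy(logits, targets).mean()
print(f"perplexity: {mx.exp(loss).item():.3f}")
```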
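The throughput figures above come from the Inferencer app, but a rough local check is possible with stock mlx-lm, whose `verbose=True` flag prints generation speed in tokens/s. A sketch, assuming the path below (hypothetical) points at this quant:

```python
# Smoke-test generation; verbose=True prints prompt and generation tokens/s,
# comparable in spirit to the 42 -> 12 tokens/s figures quoted above.
from mlx_lm import load, generate

model, tokenizer = load("MiniMax-M2-6.5bit-MLX")  # hypothetical path / repo id

prompt = "Summarize what a mixture-of-experts model is in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
print(text)
```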
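The quant itself was made with a modified MLX 0.28, which is not in stock mlx-lm; the fractional 6.5 bits per weight suggests mixed precision across layers. As a rough analogue only, stock `mlx_lm.convert` can produce a uniform 6-bit quant from the base weights. This is an assumption-laden sketch, not the author's actual pipeline.

```python
# Rough analogue with stock mlx-lm: uniform 6-bit quantization of the base
# model. The card's 6.5 bpw comes from a modified MLX 0.28 and cannot be
# reproduced exactly with a single uniform q_bits setting.
from mlx_lm import convert

convert(
    "MiniMaxAI/MiniMax-M2",    # base model on the Hub (from the card)
    mlx_path="minimax-m2-q6",  # output directory (hypothetical name)
    quantize=True,
    q_bits=6,                  # nearest stock setting to 6.5 bpw
    q_group_size=64,           # mlx-lm default group size
)
```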