Upload complete model
README.md CHANGED

@@ -6,8 +6,6 @@ base_model: MiniMaxAI/MiniMax-M2
 tags:
 - mlx
 ---
-*UPLOADING*
-
 **See MiniMax-M2 6.5bit MLX in action - [demonstration video](https://youtu.be/DCVKP_o2HU0)**
 
 *q6.5bit quant typically achieves 1.128 perplexity in our testing, which is equivalent to q8.*
@@ -25,7 +23,7 @@ tags:
|
|
| 25 |
* Tested on a MacBook Pro connecting to a M3 Ultra 512GB RAM over the internet using [Inferencer app v1.5.4](https://inferencer.com)
|
| 26 |
* Memory usage: ~175 GB
|
| 27 |
* Expect 42 tokens/s for small contexts (200 tokens) down to 12 token/s for large (6800 tokens)
|
| 28 |
-
**
|
| 29 |
* Quantized with a modified version of [MLX](https://github.com/ml-explore/mlx) 0.28
|
| 30 |
* For more details see [demonstration video](https://youtu.be/DCVKP_o2HU0) or visit [MiniMax-M2](https://huggingface.co/MiniMaxAI/MiniMax-M2).
|
| 31 |
|
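A quick way to sanity-check a perplexity figure like the 1.128 quoted above is a short mlx-lm script. This is a minimal sketch, assuming the quant loads through mlx-lm's standard `load()` API; the model path and `eval.txt` file are hypothetical, and the exact harness behind the quoted number is not published here.

```python
# Minimal perplexity sketch (assumes stock mlx-lm; not the card's harness).
import mlx.core as mx
import mlx.nn as nn
from mlx_lm import load

model, tokenizer = load("MiniMax-M2-6.5bit-MLX")  # hypothetical path / repo id

text = open("eval.txt").read()          # hypothetical evaluation text
tokens = tokenizer.encode(text)[:2048]  # clip to a manageable context

inputs = mx.array(tokens[:-1])[None]    # predict token t+1 from tokens <= t
targets = mx.array(tokens[1:])[None]

logits = model(inputs)                  # (1, T, vocab) next-token logits
loss = nn.losses.cross_entropy(logits, targets).mean()
print(f"perplexity: {mx.exp(loss).item():.3f}")
```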
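The throughput figures above come from the Inferencer app, but a rough local check is possible with stock mlx-lm, whose `verbose=True` flag prints generation speed in tokens/s. A sketch, assuming the path below (hypothetical) points at this quant:

```python
# Smoke-test generation; verbose=True prints prompt and generation tokens/s,
# comparable in spirit to the 42 -> 12 tokens/s figures quoted above.
from mlx_lm import load, generate

model, tokenizer = load("MiniMax-M2-6.5bit-MLX")  # hypothetical path / repo id

prompt = "Summarize what a mixture-of-experts model is in two sentences."
text = generate(model, tokenizer, prompt=prompt, max_tokens=200, verbose=True)
print(text)
```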
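The quant itself was made with a modified MLX 0.28, which is not in stock mlx-lm; the fractional 6.5 bits per weight suggests mixed precision across layers. As a rough analogue only, stock `mlx_lm.convert` can produce a uniform 6-bit quant from the base weights. This is an assumption-laden sketch, not the author's actual pipeline.

```python
# Rough analogue with stock mlx-lm: uniform 6-bit quantization of the base
# model. The card's 6.5 bpw comes from a modified MLX 0.28 and cannot be
# reproduced exactly with a single uniform q_bits setting.
from mlx_lm import convert

convert(
    "MiniMaxAI/MiniMax-M2",    # base model on the Hub (from the card)
    mlx_path="minimax-m2-q6",  # output directory (hypothetical name)
    quantize=True,
    q_bits=6,                  # nearest stock setting to 6.5 bpw
    q_group_size=64,           # mlx-lm default group size
)
```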