Mozilla/Mixtral-8x22B-Instruct-v0.1-llamafile


jartine committed on Apr 25, 2024

Commit 5ad53be · verified · 1 parent: d648638

Update README.md

Files changed (1)

README.md +2 -2


@@ -100,8 +100,8 @@ the emergent capabilities LLMs exhibit.
 Good quants for reading (evaluation speed) are BF16, F16, Q8_0, and
 Q4_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
-computation speed (flops) which means performance can be improved by
-software engineering, e.g. BLAS algorithms, in which case quantization
+flop count, which means perf can be improved through software
+engineering alone, e.g. BLAS algorithms, in which case quantization
 starts hurting more than it helps, since it competes for CPU resources
 and makes it harder for the compiler to parallelize instructions. You
 want to ideally use the simplest smallest floating point format that's