Update README.md
README.md
CHANGED
@@ -100,8 +100,8 @@ the emergent capabilities LLMs exhibit.
 
 Good quants for reading (evaluation speed) are BF16, F16, Q8\_0, and
 Q4\_0 (ordered from fastest to slowest). Prompt evaluation is bounded by
-
-
+flop count, which means perf can be improved through software
+engineering alone, e.g. BLAS algorithms, in which case quantization
 starts hurting more than it helps, since it competes for CPU resources
 and makes it harder for the compiler to parallelize instructions. You
 want to ideally use the simplest smallest floating point format that's
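The updated text says quantization competes for CPU resources once prompt evaluation is flop-bound. The naive sketch below illustrates that extra decode work. It assumes a Q8_0-like layout: blocks of 32 int8 values that share one half-precision scale. The helper names (`quantize_q8_0`, `dot_q8_0`, `dot_f32`) are hypothetical. Real llamafile/ggml kernels are SIMD-vectorized and organized differently. The multiply counts here only show that the quantized path decodes each weight, which a plain floating-point loop does not have to do.

```python
import struct

BLOCK_SIZE = 32  # assumption: Q8_0-style blocks of 32 weights


def to_fp16(x):
    """Round a float to half precision, mimicking an fp16 per-block scale."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def quantize_q8_0(xs):
    """Split xs into (scale, int8 values) blocks, Q8_0-style (sketch).

    Assumes len(xs) is a multiple of BLOCK_SIZE.
    """
    blocks = []
    for start in range(0, len(xs), BLOCK_SIZE):
        chunk = xs[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in chunk)
        d = amax / 127.0                      # largest magnitude maps to 127
        inv = 1.0 / d if d else 0.0
        qs = [round(v * inv) for v in chunk]  # stays within [-127, 127]
        blocks.append((to_fp16(d), qs))
    return blocks


def dot_f32(a, b):
    """Plain float dot product; returns (result, multiply count)."""
    total, muls = 0.0, 0
    for x, y in zip(a, b):
        total += x * y
        muls += 1
    return total, muls


def dot_q8_0(blocks, b):
    """Naive dot product that dequantizes each weight before using it."""
    total, muls = 0.0, 0
    for bi, (d, qs) in enumerate(blocks):
        base = bi * BLOCK_SIZE
        for j, q in enumerate(qs):
            w = d * q                 # decode step the float path never does
            total += w * b[base + j]
            muls += 2                 # one decode multiply + one product
    return total, muls
```

For example, with 64 weights `xs = [0.5 * ((i % 7) - 3) for i in range(64)]` and `ones = [1.0] * 64`, `dot_q8_0(quantize_q8_0(xs), ones)` performs 128 multiplies, while `dot_f32(xs, ones)` performs 64. The quantized result is also only an approximation of the exact sum.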