Add GGML models: f16 runs at 41.68 ms per token and q8_0 at 23.76 ms per token, both giving good results
- ggml-model-f16.bin +3 -0
- ggml-model-q8_0.bin +3 -0
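To make the timings in the commit title easier to compare, the sketch below converts them to tokens per second. Only the two latency figures come from the commit; the names `MS_PER_TOKEN` and `tokens_per_second` are hypothetical, introduced for illustration.

```python
# Per-token latencies quoted in the commit title (milliseconds per token).
MS_PER_TOKEN = {"f16": 41.68, "q8_0": 23.76}


def tokens_per_second(ms_per_token: float) -> float:
    """Convert a per-token latency in milliseconds to a throughput."""
    return 1000.0 / ms_per_token


# Throughput for each model file, keyed by format name.
rates = {name: tokens_per_second(ms) for name, ms in MS_PER_TOKEN.items()}

# How many times faster q8_0 is than f16, per token.
speedup = MS_PER_TOKEN["f16"] / MS_PER_TOKEN["q8_0"]

for name, rate in rates.items():
    print(f"{name}: {rate:.1f} tokens/s")
print(f"q8_0 speedup over f16: {speedup:.2f}x")
```

With the quoted figures this comes to roughly 24.0 tokens/s for f16 and 42.1 tokens/s for q8_0, a speedup of about 1.75x. The commit does not state the hardware or settings behind these timings.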
ggml-model-f16.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:b12534933201810c5cc5b6eb033f07ff6232e03f1b3cc4820b0e18566d113f3e
+size 2623816724
ggml-model-q8_0.bin
ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:4af6467cf42a8e5341c471fe0b370e5d038e521877b4612284c3ce8abbd26f4a
+size 1394525204
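The two added files are Git LFS pointers: each records the spec version, a SHA-256 object ID, and the size in bytes of the real model file. A minimal sketch of how a downloaded file could be checked against such a pointer is shown here. The helper names `parse_lfs_pointer` and `matches_pointer` are hypothetical, not part of any Git LFS tooling, and the local path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse the 'key value' lines of a Git LFS pointer into a dict."""
    fields = dict(line.split(" ", 1) for line in text.strip().splitlines())
    algo, digest = fields["oid"].split(":", 1)  # e.g. "sha256", "<hex digest>"
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def matches_pointer(path: Path, pointer: dict, chunk_size: int = 1 << 20) -> bool:
    """Return True if the file at `path` has the pointer's size and digest."""
    # Cheap check first: a size mismatch avoids hashing a multi-gigabyte file.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as f:
        # Read in chunks so the whole model never has to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]


# Pointer content copied from the ggml-model-q8_0.bin diff in this commit.
Q8_POINTER = """\
version https://git-lfs.github.com/spec/v1
oid sha256:4af6467cf42a8e5341c471fe0b370e5d038e521877b4612284c3ce8abbd26f4a
size 1394525204
"""

pointer = parse_lfs_pointer(Q8_POINTER)
print(pointer["algo"], pointer["size"])
```

Assuming the model was downloaded to the current directory, `matches_pointer(Path("ggml-model-q8_0.bin"), pointer)` would report whether the local copy is intact. The size is compared first because it is nearly free, while hashing a file of this size takes noticeably longer.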