Add GGML models: f16 runs at 41.68 ms per token and q8_0 at 23.76 ms per token, both giving good results

Files changed (2)

ggml-model-f16.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:b12534933201810c5cc5b6eb033f07ff6232e03f1b3cc4820b0e18566d113f3e
+size 2623816724

ggml-model-q8_0.bin ADDED

+version https://git-lfs.github.com/spec/v1
+oid sha256:4af6467cf42a8e5341c471fe0b370e5d038e521877b4612284c3ce8abbd26f4a
+size 1394525204
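
The two entries above are Git LFS pointer files: each records a spec version, a SHA-256 object ID, and the size in bytes of the real model file. As a minimal sketch of how a downloaded file might be checked against its pointer, the Python below parses the pointer text and compares size and digest. The helper names `parse_lfs_pointer` and `verify_download` are hypothetical and not part of Git LFS or this repository, and the local file path in the usage note is an assumption.

```python
import hashlib
from pathlib import Path


def parse_lfs_pointer(text: str) -> dict:
    """Parse Git LFS pointer lines (optionally prefixed with a diff '+')."""
    fields = {}
    for raw in text.splitlines():
        line = raw.strip().lstrip("+").strip()
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid field has the form "<algorithm>:<hex digest>", e.g. "sha256:4af6..."
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "algo": algo,
        "digest": digest,
        "size": int(fields["size"]),
    }


def verify_download(path, pointer: dict) -> bool:
    """Return True if the file at `path` matches the pointer's size and digest."""
    path = Path(path)
    # Cheap check first: a size mismatch means there is no need to hash.
    if path.stat().st_size != pointer["size"]:
        return False
    hasher = hashlib.new(pointer["algo"])
    with path.open("rb") as handle:
        # Read in 1 MiB chunks so multi-gigabyte models are not loaded into memory.
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            hasher.update(chunk)
    return hasher.hexdigest() == pointer["digest"]
```

For example, passing the three pointer lines listed under ggml-model-q8_0.bin to `parse_lfs_pointer` and then calling `verify_download("ggml-model-q8_0.bin", pointer)` would confirm that a local copy is the 1394525204-byte file this commit refers to, assuming the file was saved under that name in the current directory.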