GGML models: f16 runs at 41.68 ms per token and q8 at 23.76 ms per token, with good results
56d7c99
Kabumbus
committed on