32 3 134

Csaba Kecskemeti PRO

csabakecskemeti

https://devquasar.com/

csabakecskemeti

AI & ML interests

None yet

Recent Activity

updated a model 5 minutes ago

DevQuasar/Delta-Vector.Rei-V2-12B-GGUF

published a model about 2 hours ago

DevQuasar/Delta-Vector.Rei-V2-12B-GGUF

updated a model about 2 hours ago

DevQuasar/all-hands.openhands-lm-32b-v0.1-ep3-GGUF

View all activity

Organizations

Posts 24

Post

3334

I'm collecting llama-bench results for inference with a llama 3.1 8B q4 and q8 reference models on varoius GPUs. The results are average of 5 executions.
The system varies (different motherboard and CPU ... but that probably that has little effect on the inference performance).

https://devquasar.com/gpu-gguf-inference-comparison/
the exact models user are in the page

I'd welcome results from other GPUs is you have access do anything else you've need in the post. Hopefully this is useful information everyone.

Post

2362

Managed to get my hands on a 5090FE, it's beefy

| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | pp512 | 12207.44 ± 481.67 |
| llama 8B Q8_0 | 7.95 GiB | 8.03 B | CUDA | 99 | tg128 | 143.18 ± 0.18 |

Comparison with others GPUs
http://devquasar.com/gpu-gguf-inference-comparison/

View all Posts

Collections 1

models 1

csabakecskemeti/bert-base-case-yelp5-tuned-experiment

Text Classification • Updated Apr 5, 2024 • 5

datasets

None public yet