---
inference: false
license: other
---
# LLaMa 7B GGML
This repo contains GGML format model files for the original LLaMa 7B model.
These files are for CPU inference (optionally with CUDA acceleration) using [llama.cpp](https://github.com/ggerganov/llama.cpp).
I've uploaded them mostly for my own convenience, allowing me to easily grab them if and when I need them for future testing and comparisons.
## Provided files
The following formats are included:
* float16
* q4_0 - 4-bit
* q4_1 - 4-bit
* q5_0 - 5-bit
* q5_1 - 5-bit
* q8_0 - 8-bit
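
As a rough guide to how the formats above compare in size, here is a minimal sketch that computes a lower bound on file size from the nominal bit width alone. The parameter count (about 6.7 billion) and the helper name `min_size_gib` are assumptions for illustration. Real files will be larger, because each quantised block also stores scale data and some tensors are kept at higher precision.

```python
# Nominal bits per weight for each format in the list above.
# Per-block scale/offset data is deliberately ignored, so results are lower bounds.
NOMINAL_BITS = {
    "float16": 16,
    "q4_0": 4,
    "q4_1": 4,
    "q5_0": 5,
    "q5_1": 5,
    "q8_0": 8,
}

# Assumed (approximate) parameter count for LLaMA 7B.
N_PARAMS = 6.7e9


def min_size_gib(fmt, n_params=N_PARAMS):
    """Lower-bound size in GiB for `n_params` weights stored in format `fmt`."""
    total_bytes = n_params * NOMINAL_BITS[fmt] / 8
    return total_bytes / 2**30


if __name__ == "__main__":
    for fmt in NOMINAL_BITS:
        print(f"{fmt:8s} at least {min_size_gib(fmt):.1f} GiB")
```

The `_0` and `_1` variants share a nominal bit width, so this estimate cannot distinguish them; it only helps with the coarse choice between 4-bit, 5-bit, 8-bit and float16.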
## THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
llama.cpp recently made a breaking change to its quantisation methods.
I quantised the GGML files in this repo after that change, so you need llama.cpp built from commit `b9fd7ee` (May 12th 2023) or later to use them.
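
If you are unsure which format generation a downloaded file belongs to, its header can be inspected directly. The sketch below is a minimal example, not part of llama.cpp: the helper name `read_ggml_header` is hypothetical, and it assumes the file starts with a little-endian 32-bit magic followed (for versioned formats) by a 32-bit version number. My understanding is that files produced after the May 12th change report magic `ggjt` with version 2, but treat that as an assumption and check it against the llama.cpp source.

```python
import struct

# Magic values as read by llama.cpp (little-endian uint32 at offset 0).
GGML_MAGICS = {
    0x67676D6C: "ggml",  # oldest format, no version field
    0x67676D66: "ggmf",
    0x67676A74: "ggjt",
}


def read_ggml_header(path):
    """Return (magic_name, version) for a GGML file; version is None if absent."""
    with open(path, "rb") as f:
        raw = f.read(8)
    if len(raw) < 4:
        raise ValueError("file too short to contain a GGML magic")
    (magic,) = struct.unpack("<I", raw[:4])
    name = GGML_MAGICS.get(magic)
    if name is None:
        raise ValueError(f"unrecognised magic 0x{magic:08x}")
    if name == "ggml":
        return name, None
    if len(raw) < 8:
        raise ValueError("file too short to contain a version field")
    (version,) = struct.unpack("<I", raw[4:8])
    return name, version
```

Calling `read_ggml_header("path/to/model.bin")` (the path is a placeholder) returns a tuple such as `("ggjt", 2)`, which you can compare with what your llama.cpp build expects before attempting to load the file.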
I will not be providing GGML formats for the older llama.cpp code. They're already uploaded all over HF if you really need them!
## Want to support my work?
I've had a lot of people ask if they can contribute. I love providing models and helping people, but it is starting to rack up pretty big cloud computing bills.
So if you're able and willing to contribute, it would be most gratefully received and would help me keep providing models and working on various AI projects.
Donors will get priority support on any and all AI/LLM/model questions, and I'll gladly quantise any model you'd like to try.
* Patreon: coming soon! (just awaiting approval)
* Ko-Fi: https://ko-fi.com/TheBlokeAI
* Discord: https://discord.gg/UBgz4VXf