---
inference: false
license: other
---
# LLaMa 7B GGML

This repo contains GGML format model files for the original LLaMa.

These files are for CPU (+ CUDA) inference using llama.cpp.

I've uploaded them mostly for my own convenience, allowing me to easily grab them if and when I need them for future testing and comparisons.

## Provided files

The following formats are included:

* float16
* q4_0 - 4-bit
* q4_1 - 4-bit
* q5_0 - 5-bit
* q5_1 - 5-bit
* q8_0 - 8-bit

## THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods.

I have quantised the GGML files in this repo with the latest version. You will therefore need llama.cpp built from commit b9fd7ee (May 12th 2023) or later to use them.
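Assuming the standard llama.cpp workflow, building and running one of these files looks roughly like this. The model filename below is illustrative - substitute whichever quantised file you downloaded from this repo:

```shell
# Build llama.cpp at commit b9fd7ee or later (required for these files)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run CPU inference; -m points at the downloaded GGML file,
# -p is the prompt, -n the number of tokens to generate
./main -m ./models/llama-7b.ggml.q4_0.bin \
       -n 128 -p "The first man on the moon was"
```

For CUDA acceleration, llama.cpp must be compiled with its cuBLAS option enabled instead of the plain `make` shown above.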

I will not be providing GGML formats for the older llama.cpp code. They're already uploaded all over HF if you really need them!

## Want to support my work?

I've had a lot of people ask if they can contribute. I love providing models and helping people, but it is starting to rack up pretty big cloud computing bills.

So if you're able and willing to contribute, it'd be most gratefully received and will help me to keep providing models, and work on various AI projects.

Donators will get priority support on any and all AI/LLM/model questions, and I'll gladly quantise any model you'd like to try.