---
inference: false
license: other
---
# LLaMa 7B GGML

This repo contains GGML format model files for the original LLaMa.

These files are for CPU (+ CUDA) inference using llama.cpp.

I've uploaded them mostly for my own convenience, allowing me to easily grab them if and when I need them for future testing and comparisons.

## Provided files

The following formats are included:

* float16
* q4_0 - 4-bit
* q4_1 - 4-bit
* q5_0 - 5-bit
* q5_1 - 5-bit
* q8_0 - 8-bit

## THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods.

I have quantised the GGML files in this repo with the latest version. You will therefore need llama.cpp built from commit b9fd7ee (May 12th 2023) or later to use them.
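Assuming the standard llama.cpp workflow, building and running one of these files looks roughly like this. The model filename below is illustrative - substitute whichever quantised file you downloaded from this repo:

```shell
# Build llama.cpp at commit b9fd7ee or later (required for these files)
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
make

# Run CPU inference; -m points at the downloaded GGML file,
# -p is the prompt, -n the number of tokens to generate
./main -m ./models/llama-7b.ggml.q4_0.bin \
       -n 128 -p "The first man on the moon was"
```

For CUDA acceleration, llama.cpp must be compiled with its cuBLAS option enabled instead of the plain `make` shown above.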

I will not be providing GGML formats for the older llama.cpp code. They're already uploaded all over HF if you really need them!

## Want to support my work?

I've had a lot of people ask if they can contribute. I love providing models and helping people, but it is starting to rack up pretty big cloud computing bills.

So if you're able and willing to contribute, it'd be most gratefully received and will help me to keep providing models, and work on various AI projects.

Donators will get priority support on any and all AI/LLM/model questions, and I'll gladly quantise any model you'd like to try.