inference: false
license: other

LLaMa 7B GGML

This repo contains GGML format model files for Meta's original LLaMa 7B model.

These files are for CPU inference using llama.cpp, with optional CUDA acceleration.

I've uploaded them mostly for my own convenience, so that I can easily grab them whenever I need them for future testing and comparisons.

Provided files

The following formats are included:

  • float16 - 16-bit, unquantised
  • q4_0 - 4-bit
  • q4_1 - 4-bit
  • q5_0 - 5-bit
  • q5_1 - 5-bit
  • q8_0 - 8-bit
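The quantised formats differ in how many bits each weight is stored with and in what per-block metadata accompanies it. As a rough illustration only, the sketch below shows the two basic ideas behind the 4-bit types, assuming weights are grouped into blocks of 32: q4_0 keeps one scale per block, while q4_1 additionally keeps a per-block minimum (offset). The `bits` parameter hints at how the 5-bit and 8-bit variants extend the same pattern. This is not llama.cpp's exact rounding rule or bit packing, and all function names here are hypothetical.

```python
from typing import List, Tuple

# Assumed block size for this sketch (llama.cpp groups quantised weights in blocks of 32).
BLOCK = 32


def quantise_symmetric(block: List[float], bits: int = 4) -> Tuple[float, List[int]]:
    """q4_0-style idea: one scale per block, signed integer codes, no offset."""
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit codes in [-8, 7]
    amax = max(abs(x) for x in block)
    scale = amax / qmax if amax > 0 else 1.0
    # Round each weight to the nearest code and clamp into the signed range.
    codes = [max(-qmax - 1, min(qmax, round(x / scale))) for x in block]
    return scale, codes


def dequantise_symmetric(scale: float, codes: List[int]) -> List[float]:
    return [scale * q for q in codes]


def quantise_affine(block: List[float], bits: int = 4) -> Tuple[float, float, List[int]]:
    """q4_1-style idea: a scale plus a per-block minimum, unsigned integer codes."""
    levels = 2 ** bits - 1  # 15 for 4-bit codes in [0, 15]
    lo, hi = min(block), max(block)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [max(0, min(levels, round((x - lo) / scale))) for x in block]
    return scale, lo, codes


def dequantise_affine(scale: float, lo: float, codes: List[int]) -> List[float]:
    return [lo + scale * q for q in codes]


if __name__ == "__main__":
    import random

    random.seed(0)
    block = [random.gauss(0.0, 0.02) for _ in range(BLOCK)]

    s0, q0 = quantise_symmetric(block)
    err0 = max(abs(a - b) for a, b in zip(block, dequantise_symmetric(s0, q0)))

    s1, lo, q1 = quantise_affine(block)
    err1 = max(abs(a - b) for a, b in zip(block, dequantise_affine(s1, lo, q1)))

    print(f"q4_0-style max error: {err0:.6f}")
    print(f"q4_1-style max error: {err1:.6f}")
```

Running it prints the largest reconstruction error of each scheme on one random block. The per-block minimum is what q4_1 spends its extra bytes on, which is why its files are larger than q4_0's.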

THESE FILES REQUIRE THE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

llama.cpp recently made a breaking change to its quantisation methods, so GGML files quantised before and after this change are not compatible with each other.

I have quantised the GGML files in this repo with the latest version, so you will need llama.cpp built from May 12th 2023 or later (commit b9fd7ee or later) to use them.
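To confirm which format a downloaded file uses before loading it, you can read its header. The sketch below is a minimal Python example that assumes the llama.cpp file layout of this period: a little-endian uint32 magic number (0x67676a74, the bytes of "ggjt") followed by a uint32 version, where version 2 is assumed to correspond to the May 12th 2023 quantisation change. The function names are hypothetical and not part of llama.cpp.

```python
import struct
import sys
from typing import Optional, Tuple

# Assumed llama.cpp magic numbers, stored as a little-endian uint32 at offset 0.
MAGICS = {
    0x67676D6C: "ggml",  # original unversioned format (no version field)
    0x67676D66: "ggmf",  # versioned format
    0x67676A74: "ggjt",  # mmap-able format; version 2 assumed = May 12th 2023 change
}


def read_ggml_header(path: str) -> Tuple[str, Optional[int]]:
    """Return (format name, version) from the first bytes of a GGML model file."""
    with open(path, "rb") as f:
        header = f.read(8)
    if len(header) < 4:
        raise ValueError(f"{path}: file too short to be a GGML model")
    (magic,) = struct.unpack("<I", header[:4])
    name = MAGICS.get(magic)
    if name is None:
        raise ValueError(f"{path}: unknown magic 0x{magic:08x}")
    if name == "ggml":
        return name, None  # unversioned files carry no version field
    if len(header) < 8:
        raise ValueError(f"{path}: truncated header")
    (version,) = struct.unpack("<I", header[4:8])
    return name, version


def needs_new_llama_cpp(path: str) -> bool:
    """True if the file uses the ggjt v2 (or later) layout, under the assumptions above."""
    name, version = read_ggml_header(path)
    return name == "ggjt" and version is not None and version >= 2


if __name__ == "__main__":
    for p in sys.argv[1:]:
        name, version = read_ggml_header(p)
        print(p, name, version, "needs new llama.cpp:", needs_new_llama_cpp(p))
```

For example, saving this as `check_ggml.py` (a hypothetical script name) and passing it a model file prints the format name, the version, and whether that file needs a llama.cpp build from May 12th 2023 or later.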

I will not be providing GGML files in the old format for older llama.cpp code. They're already uploaded all over HF if you really need them!