---
inference: false
license: other
---
# LLaMa 7B GGML
This repo contains GGML format model files for the original LLaMa.
These files are for CPU (+ CUDA) inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).
I've uploaded them mostly for my own convenience, so I can easily grab them if and when I need them for future testing and comparisons.
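Once you have a suitable llama.cpp build (see the note below), inference is a single command. A rough sketch, with an illustrative filename - substitute the file you actually downloaded:

```bash
# Run CPU inference with one of the 4-bit files from this repo
# (the model filename here is illustrative)
./main -m llama-7b.ggml.q4_0.bin -p "Building a website can be done in 10 simple steps:" -n 128
```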
## Provided files
The following formats are included; a sketch of how files like these are produced follows the list:
* float16 - 16-bit, unquantised
* q4_0 - 4-bit (original quant method)
* q4_1 - 4-bit (higher accuracy than q4_0, slightly larger files)
* q5_0 - 5-bit (higher accuracy again, slower inference)
* q5_1 - 5-bit (even higher accuracy, slower inference)
* q8_0 - 8-bit (almost indistinguishable from float16, large files)
## THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!
llama.cpp recently made a breaking change to its quantisation methods.
I have quantised the GGML files in this repo with the latest version. Therefore you will require llama.cpp compiled on May 12th or later (commit `b9fd7ee` or later) to use them.
I will not be providing files in the older GGML format for pre-May-12th versions of llama.cpp. They're already uploaded all over HF if you really need them!
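If you need to update, a minimal build sketch (the commit hash is the one mentioned above):

```bash
# Clone and build llama.cpp at or after the required commit
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b9fd7ee   # or simply stay on a later master commit
make
```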