Latest GGML v2 format for LLaMa-30B
- README.md +29 -0
- llama-30b.ggml.q4_0.bin +3 -0
- llama-30b.ggml.q4_1.bin +3 -0
- llama-30b.ggml.q5_0.bin +3 -0
- llama-30b.ggml.q5_1.bin +3 -0
- llama-30b.ggml.q8_0.bin +3 -0
README.md
ADDED
@@ -0,0 +1,29 @@
---
inference: false
license: other
---
# LLaMa 30B GGML

This repo contains GGML format model files for the original LLaMa.

These files are for CPU (+ CUDA) inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).

I've uploaded them mostly for my own convenience, so I can easily grab them if and when I need them for future testing and comparisons.

## Provided files

The following formats are included:

* float16
* q4_0 - 4-bit
* q4_1 - 4-bit
* q5_0 - 5-bit
* q5_1 - 5-bit
* q8_0 - 8-bit

## THESE FILES REQUIRE THE LATEST LLAMA.CPP (May 12th 2023 - commit `b9fd7ee`)!

llama.cpp recently made a breaking change to its quantisation methods.

I have quantised the GGML files in this repo with the latest version. You will therefore need llama.cpp compiled on May 12th 2023 or later (commit `b9fd7ee` or later) to use them.

I will not be providing GGML files for the older llama.cpp code. They're already uploaded all over HF if you really need them!
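As a rough sketch of how these files would be used (assuming a Unix-like system with `git` and `make`; the model path and prompt below are placeholders, not part of this repo):

```shell
# Build llama.cpp at or after the required commit, then run one of the
# quantised files with the `main` example binary.
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
git checkout b9fd7ee          # commit from May 12th 2023, or any later commit
make
# -m: model file, -p: prompt, -n: number of tokens to generate
./main -m /path/to/llama-30b.ggml.q5_1.bin \
       -p "Building a website can be done in 10 steps:" -n 128
```

Checking out an older commit would load these files incorrectly or fail outright, since the quantisation format change was breaking.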
llama-30b.ggml.q4_0.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d33def6ffd067e82d1d26e4ea46c237545fe939173ab0c533ce726d7c740c3a4
size 20333775232
llama-30b.ggml.q4_1.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:d53e148ca4ae33de08b8d989fb93f8e4e81b0de05382d1a9474cbc0ad150cb5b
size 24399792512
llama-30b.ggml.q5_0.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:968b21b29a0e2ff9737ab39e74cc467fc4f91ced5c77cbc85b307d3c9e0ce4e2
size 22366783872
llama-30b.ggml.q5_1.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:eda7d8684c82f8bb244ecc78b08f116fa7751d22e20e36ca5fadb5106d3f5ec9
size 24399792512
llama-30b.ggml.q8_0.bin
ADDED
@@ -0,0 +1,3 @@
version https://git-lfs.github.com/spec/v1
oid sha256:2308ac1d630bcc1b4dc37ae9189db56ad5f8b269dc217ee26c2994fbbf76e5ee
size 36597844352
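The `.bin` entries above are Git LFS pointer files: three `key value` lines identifying the real blob by SHA-256 and size. A minimal sketch of reading that format (my own helper for illustration, not part of this repo or of git-lfs), using the q8_0 pointer above as input:

```python
def parse_lfs_pointer(text: str) -> dict:
    """Parse a Git LFS v1 pointer file into its version/oid/size fields."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    # The oid line is "oid <algo>:<hex digest>"; split off the hash algorithm.
    algo, _, digest = fields["oid"].partition(":")
    return {
        "version": fields["version"],
        "oid_algo": algo,
        "oid": digest,
        "size": int(fields["size"]),
    }

pointer = """\
version https://git-lfs.github.com/spec/v1
oid sha256:2308ac1d630bcc1b4dc37ae9189db56ad5f8b269dc217ee26c2994fbbf76e5ee
size 36597844352
"""
info = parse_lfs_pointer(pointer)
print(info["oid_algo"], info["size"])  # sha256 36597844352
```

This is useful for sanity-checking a download: compare the `size` and the SHA-256 of the fetched file against the pointer before loading a 20-36 GB model into llama.cpp.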