TheBloke/LLaMa-13B-GGML

Latest GGML v2 format for LLaMa-13B

74ec0c5 about 1 year ago

	---
	inference: false
	license: other
	---
	# LLaMa 13B GGML

	This repo contains GGML format model files for the original LLaMa.

	These files are for CPU (+ CUDA) inference using [llama.cpp](https://github.com/ggerganov/llama.cpp).

	I've uploaded them mostly for my own convenience, allowing me to easily grab them if and when I need them for future testing and comparisons.

	## Provided files

	The following formats are included:
	* float16
	* q4_0 - 4-bit
	* q4_1 - 4-bit
	* q5_0 - 5-bit
	* q5_1 - 5-bit
	* q8_0 - 8-bit
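
As a rough guide to how the formats above compare, here is a minimal sketch that turns the nominal bit widths into lower-bound file sizes. It assumes roughly 13 billion parameters, a figure inferred from the model name rather than stated in this repo. It also ignores the per-block scale data that the quantised formats store, so the real files will be somewhat larger.

```python
# Rough lower-bound size estimate per format.
# Assumptions (not from this README): ~13e9 parameters, and the nominal bit
# width taken from each format's name. Real quantised files are larger
# because they also store per-block scale data.
PARAM_COUNT = 13_000_000_000  # assumed, approximate

NOMINAL_BITS = {
    "float16": 16,
    "q4_0": 4,
    "q4_1": 4,
    "q5_0": 5,
    "q5_1": 5,
    "q8_0": 8,
}


def nominal_size_gb(fmt: str, params: int = PARAM_COUNT) -> float:
    """Return the nominal weight payload in gigabytes (10**9 bytes)."""
    return params * NOMINAL_BITS[fmt] / 8 / 1e9


if __name__ == "__main__":
    for fmt in NOMINAL_BITS:
        print(f"{fmt:8s} >= {nominal_size_gb(fmt):5.1f} GB")
```

The estimate is mainly useful for checking whether a given format is likely to fit in available RAM before downloading it.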

	## THESE FILES REQUIRE LATEST LLAMA.CPP (May 12th 2023 - commit b9fd7ee)!

	On May 12th 2023, llama.cpp made a breaking change to its quantisation methods (commit `b9fd7ee`).

	I quantised the GGML files in this repo with that version, so you need llama.cpp built from commit `b9fd7ee` or later to use them. Older builds will not load these files.
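
One way to check whether a local llama.cpp checkout is new enough is to ask git whether commit `b9fd7ee` is an ancestor of the current `HEAD`. The sketch below does this with `git merge-base --is-ancestor`. The function name and the `run` parameter are illustrative, not part of llama.cpp, and the check assumes a full clone: in a shallow clone the commit may be missing, which git reports as an error.

```python
import subprocess

REQUIRED_COMMIT = "b9fd7ee"  # commit named in this README


def checkout_includes(repo_dir: str,
                      commit: str = REQUIRED_COMMIT,
                      run=subprocess.run) -> bool:
    """Return True if `commit` is HEAD or an ancestor of HEAD in `repo_dir`.

    `run` is injectable so the function can be exercised without a real
    repository; by default it is subprocess.run.
    """
    result = run(
        ["git", "-C", repo_dir, "merge-base", "--is-ancestor", commit, "HEAD"],
        capture_output=True,
    )
    if result.returncode == 0:   # commit is contained in the checkout
        return True
    if result.returncode == 1:   # commit exists but is not an ancestor
        return False
    # Any other status means git itself failed (e.g. unknown commit, not a repo).
    raise RuntimeError(
        f"git failed with status {result.returncode}: "
        f"{result.stderr.decode(errors='replace').strip()}"
    )
```

For example, `checkout_includes("llama.cpp")` would check a clone in a directory named `llama.cpp` (a hypothetical path). If it returns `False`, update the checkout and rebuild before loading these files.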

	I will not be providing GGML formats for the older llama.cpp code. They're already uploaded all over HF if you really need them!