	# ExLlamaV2 Llama-3-8B-Instruct-ortho-v2

	Using <a href="https://github.com/turboderp/exllamav2/releases/tag/v0.0.21">turboderp's ExLlamaV2 v0.0.21</a> for quantization.

	<b>The "main" branch contains only measurement.json; download one of the other branches for the model.</b>

	Each branch contains the model quantized at a single bits-per-weight setting. The main branch contains only measurement.json, which can be reused for further conversions.
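As a rough sketch of fetching a single branch, the snippet below uses `snapshot_download` from the `huggingface_hub` package with `revision` set to the branch name. The helper names `branch_for_bits` and `download_branch` are illustrative and not part of this repository. The branch naming pattern is inferred from the `8_0` and `6_5` branches listed under Available sizes; other bit rates are assumed to follow the same pattern.

```python
from typing import Optional

REPO_ID = "Apel-sin/llama-3-8B-ortho-v2-exl2"


def branch_for_bits(bits: float) -> str:
    """Map a bits-per-weight value to a branch name, e.g. 6.5 -> "6_5".

    Assumption: every branch follows the "<int>_<decimal>" pattern seen
    in the 8_0 and 6_5 branches.
    """
    return f"{bits:.1f}".replace(".", "_")


def download_branch(bits: float, local_dir: Optional[str] = None) -> str:
    """Download one quantization branch and return the local path."""
    # Imported lazily so branch_for_bits works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(
        repo_id=REPO_ID,
        revision=branch_for_bits(bits),  # branch name, not "main"
        local_dir=local_dir,
    )
```

For example, `download_branch(6.5, local_dir="llama-3-8B-ortho-v2-exl2-6_5")` would fetch the 6.5 bits-per-weight files; this needs network access and the `huggingface_hub` package.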

	Original model by <a href="https://huggingface.co/hjhj3168">hjhj3168</a><br>
	Calibration dataset: <a href="https://huggingface.co/datasets/cosmicvalor/toxic-qna">toxic-qna</a>

	## Available sizes

	| Branch | Bits | lm_head bits | VRAM (4k) | VRAM (8k) | VRAM (16k) | VRAM (32k) | Description |
	| ----- | ---- | ------- | ------ | ------ | ------ | ------ | ------------ |
	| [8_0](https://huggingface.co/Apel-sin/llama-3-8B-ortho-v2-exl2/tree/8_0) | 8.0 | 8.0 | 10.1 GB | 10.5 GB | 11.5 GB | 13.6 GB | Maximum quality that ExLlamaV2 can produce, near unquantized performance. |
	| [6_5](https://huggingface.co/Apel-sin/llama-3-8B-ortho-v2-exl2/tree/6_5) | 6.5 | 8.0 | 8.9 GB | 9.3 GB | 10.3 GB | 12.4 GB | Very similar to 8.0, good tradeoff of size vs performance, recommended. |