cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ

	---
	datasets: LeoLM/wikitext-en-de
	license: other
	license_link: https://llama.meta.com/llama3/license/
	---
	This is a quantized version of [Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOSolutions/Llama-3-SauerkrautLM-70b-Instruct), produced with GPTQ, a quantization method developed by [IST Austria](https://ist.ac.at/en/research/alistarh-group/).
	It uses the following configuration:
	- 4-bit
	- Act order: True
	- Group size: 128
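
As a rough illustration of what this configuration means for storage, the sketch below estimates the effective number of bits per weight. It assumes a common GPTQ layout in which every group of 128 weights shares one 16-bit scale and one 4-bit zero point. Those overhead sizes, the round figure of 70e9 parameters, and the helper names are assumptions made for illustration, not values read from this checkpoint. Embeddings, non-quantized layers, and the KV cache are ignored.

```python
def effective_bits_per_weight(bits=4, group_size=128, scale_bits=16, zero_bits=4):
    """Bits per weight including per-group metadata (assumed layout)."""
    return bits + (scale_bits + zero_bits) / group_size


def approx_weight_gib(n_params, bits_per_weight):
    """Approximate size of the quantized weights in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


bpw = effective_bits_per_weight()
print(f"effective bits per weight: {bpw}")
# 70e9 is an assumed round parameter count, not an exact figure.
print(f"approx. weight size: {approx_weight_gib(70e9, bpw):.1f} GiB")
```

Under these assumptions the group metadata adds only a small fraction of a bit per weight, which is why a group size of 128 is a common trade-off between accuracy and size.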

	## Usage
	Install vLLM and run the [OpenAI-compatible server](https://docs.vllm.ai/en/latest/serving/openai_compatible_server.html#openai-compatible-server):

	```
	python -m vllm.entrypoints.openai.api_server --model cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ
	```
	Access the model:
	```
	curl http://localhost:8000/v1/completions -H "Content-Type: application/json" -d ' {
	"model": "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ",
	"prompt": "San Francisco is a"
	} '
	```
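
The same request can be sent from Python. The following is a minimal sketch using only the standard library; it assumes the server from the previous step is listening on `localhost:8000`. The helper names `build_completion_request` and `complete` and the `max_tokens` default are illustrative choices, not part of the original instructions.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # assumes the default vLLM port
MODEL = "cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ"


def build_completion_request(prompt, max_tokens=64):
    """Build the POST request for the /v1/completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt, max_tokens=64, timeout=120):
    """Send the prompt to the running server and return the generated text."""
    req = build_completion_request(prompt, max_tokens)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]


# Requires the server to be running:
# print(complete("San Francisco is a"))
```

Building the request separately from sending it makes the payload easy to inspect without a running server.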

	## Evaluations
	| __English__ | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
	|:--------------|:--------------|:--------------|:--------------|
	| Avg. | 78.17 | 78.1 | 76.72 |
	| ARC | 74.5 | 74.4 | 73.0 |
	| Hellaswag | 79.2 | 79.2 | 78.0 |
	| MMLU | 80.8 | 80.7 | 79.15 |
	| | | | |
	| __German__ | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
	| Avg. | 70.83 | 70.47 | 69.13 |
	| ARC_de | 66.7 | 66.2 | 65.9 |
	| Hellaswag_de | 70.8 | 71.0 | 68.8 |
	| MMLU_de | 75.0 | 74.2 | 72.7 |
	| | | | |
	| __Safety__ | __[Llama-3-SauerkrautLM-70b-Instruct](https://huggingface.co/VAGOsolutions/Llama-3-SauerkrautLM-70b-Instruct)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ-8b)__ | __[Llama-3-SauerkrautLM-70b-Instruct-GPTQ](https://huggingface.co/cortecs/Llama-3-SauerkrautLM-70b-Instruct-GPTQ)__ |
	| Avg. | 65.86 | 65.94 | 65.94 |
	| RealToxicityPrompts | 97.6 | 97.8 | 98.4 |
	| TruthfulQA | 67.07 | 66.92 | 65.56 |
	| CrowS | 32.92 | 33.09 | 33.87 |

	We did not check for data contamination.
	Evaluations were run with the [LM Evaluation Harness](https://github.com/EleutherAI/lm-evaluation-harness) with `limit=1000`.
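
The "Avg." rows appear to be the plain arithmetic mean of the three benchmarks in each group. The sketch below recomputes them from the reported scores and shows how far the 4-bit model is from the unquantized baseline. The dictionary layout and function names are illustrative assumptions; the numbers are copied from the tables above.

```python
from statistics import mean

# Columns: (base model, GPTQ-8b, GPTQ 4-bit), copied from the tables above.
SCORES = {
    "English": {
        "ARC": (74.5, 74.4, 73.0),
        "Hellaswag": (79.2, 79.2, 78.0),
        "MMLU": (80.8, 80.7, 79.15),
    },
    "German": {
        "ARC_de": (66.7, 66.2, 65.9),
        "Hellaswag_de": (70.8, 71.0, 68.8),
        "MMLU_de": (75.0, 74.2, 72.7),
    },
    "Safety": {
        "RealToxicityPrompts": (97.6, 97.8, 98.4),
        "TruthfulQA": (67.07, 66.92, 65.56),
        "CrowS": (32.92, 33.09, 33.87),
    },
}


def column_averages(group):
    """Mean score per model column, rounded to two decimals."""
    columns = zip(*SCORES[group].values())
    return tuple(round(mean(col), 2) for col in columns)


def four_bit_delta(group):
    """Difference in average score: 4-bit GPTQ minus base model."""
    base, _, four_bit = (mean(col) for col in zip(*SCORES[group].values()))
    return four_bit - base


for group in SCORES:
    print(group, column_averages(group), f"{four_bit_delta(group):+.2f}")
```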

	## Performance
	| | requests/s | tokens/s |
	|:--------------|-------------:|-----------:|
	| 2x NVIDIA L40S | 2.19 | 1044.76 |

	Performance was measured on [cortecs inference](https://cortecs.ai).
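
The two throughput figures can be combined into a rough average number of tokens per request. This is a back-of-the-envelope sketch that assumes both numbers come from the same benchmark run; whether `tokens/s` counts prompt tokens, generated tokens, or both is not stated here, and the even per-GPU split is likewise an assumption.

```python
requests_per_s = 2.19   # from the table above
tokens_per_s = 1044.76  # from the table above
num_gpus = 2            # two L40S cards

# Average tokens handled per request, assuming both rates describe the same run.
tokens_per_request = tokens_per_s / requests_per_s

# Naive per-GPU share, assuming the load divides evenly across both cards.
tokens_per_s_per_gpu = tokens_per_s / num_gpus

print(f"~{tokens_per_request:.0f} tokens per request")
print(f"~{tokens_per_s_per_gpu:.2f} tokens/s per GPU")
```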