smpanaro/pythia-160m-AutoGPTQ-4bit-128g


---
license: mit
datasets:
- wikitext
---

[pythia-160m](https://huggingface.co/EleutherAI/pythia-160m) quantized to 4-bit precision with [AutoGPTQ](https://github.com/AutoGPTQ/AutoGPTQ), using a group size of 128 (the `128g` in the model name).

To use, first install AutoGPTQ:

```shell
pip install auto-gptq
```

Then load the model from the hub:

```python
from auto_gptq import AutoGPTQForCausalLM

model_name = "smpanaro/pythia-160m-AutoGPTQ-4bit-128g"
# Download the 4-bit GPTQ checkpoint from the hub and load it.
model = AutoGPTQForCausalLM.from_quantized(model_name)
```
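
Once the model is loaded, generating text might look like the sketch below. It is not part of the original card: the helper names `load_model_and_tokenizer` and `generate_text` are hypothetical, and it assumes the tokenizer is taken from the base `EleutherAI/pythia-160m` repository, since quantization does not change the vocabulary.

```python
def load_model_and_tokenizer(
    model_name="smpanaro/pythia-160m-AutoGPTQ-4bit-128g",
    tokenizer_name="EleutherAI/pythia-160m",  # assumption: reuse the base model's tokenizer
):
    """Load the quantized model and a matching tokenizer (hypothetical helper)."""
    # Imported here so the helpers below can be defined without the GPTQ stack installed.
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    model = AutoGPTQForCausalLM.from_quantized(model_name)
    tokenizer = AutoTokenizer.from_pretrained(tokenizer_name)
    return model, tokenizer


def generate_text(model, tokenizer, prompt, max_new_tokens=32):
    """Tokenize a prompt, generate a continuation, and decode it to a string."""
    # Move the input tensors to whichever device the model was loaded on.
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    # AutoGPTQ's generate() accepts keyword arguments only, hence the ** unpacking.
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Calling `generate_text(*load_model_and_tokenizer(), "The quick brown fox")` would download both repositories on first use. Keeping loading and generation in separate functions lets the model be loaded once and reused across prompts.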


| Model | 4-Bit Perplexity | 16-Bit Perplexity | Delta |
|--|--|--|--|
| smpanaro/pythia-160m-AutoGPTQ-4bit-128g | 33.4375 | 23.3024 | 10.1351 |

<sub>Wikitext perplexity measured as in the [Hugging Face docs](https://huggingface.co/docs/transformers/en/perplexity); lower is better.</sub>
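
For readers unfamiliar with the metric, the following is a rough, framework-free sketch of the arithmetic behind the table. It assumes the strided sliding-window scheme described in the linked docs. The function names are hypothetical, the window and stride values used for the actual measurement are not stated in this card, and the sketch ignores the one-token label shift a real implementation needs.

```python
import math


def sliding_windows(seq_len, max_length, stride):
    """Yield (begin, end, n_scored) for each strided evaluation window."""
    prev_end = 0
    for begin in range(0, seq_len, stride):
        end = min(begin + max_length, seq_len)
        # Only tokens not covered by an earlier window are scored;
        # the rest of the window serves as context.
        n_scored = end - prev_end
        yield begin, end, n_scored
        prev_end = end
        if end == seq_len:
            break


def perplexity(total_nll, n_tokens):
    """Perplexity is the exponential of the mean negative log-likelihood per token."""
    return math.exp(total_nll / n_tokens)


# The Delta column is the absolute gap between the two perplexities in the table.
ppl_4bit, ppl_16bit = 33.4375, 23.3024
delta = ppl_4bit - ppl_16bit
relative_increase = delta / ppl_16bit

print(f"delta = {delta:.4f}, relative increase = {relative_increase:.1%}")
```

In a full evaluation, each window would be passed through the model, the negative log-likelihood of its `n_scored` tokens summed, and `perplexity` applied to the totals.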