---
license: other
license_name: llama3
tags:
- llama-3
- conversational
---
# OxxoCodes/Meta-Llama-3-70B-Instruct-GPTQ
*Built with Meta Llama 3*
Meta Llama 3 is licensed under the Meta Llama 3 Community License, Copyright © Meta Platforms, Inc. All Rights Reserved.
# Model Description
This is a 4-bit GPTQ quantized version of [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
This model was quantized using the following quantization config:
```python
from auto_gptq import BaseQuantizeConfig

quantize_config = BaseQuantizeConfig(
    bits=4,            # quantize weights to 4 bits
    group_size=128,    # share quantization parameters per group of 128 weights
    desc_act=False,    # do not reorder columns by activation magnitude
    damp_percent=0.1,  # dampening factor used during quantization
)
```
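For context, a quantization run with this config would follow AutoGPTQ's usual `from_pretrained` → `quantize` → `save_quantized` flow, roughly as sketched below. The calibration `examples` here are a hypothetical placeholder; an actual run would use a representative text corpus and substantial GPU memory.

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM, BaseQuantizeConfig

base_model = "meta-llama/Meta-Llama-3-70B-Instruct"

quantize_config = BaseQuantizeConfig(
    bits=4,
    group_size=128,
    desc_act=False,
    damp_percent=0.1,
)

tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoGPTQForCausalLM.from_pretrained(base_model, quantize_config)

# Hypothetical calibration data; a real run uses many representative samples.
examples = [tokenizer("AutoGPTQ calibrates on sample text.", return_tensors="pt")]
model.quantize(examples)

model.save_quantized("Meta-Llama-3-70B-Instruct-GPTQ")
```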
To use this model, you need to install AutoGPTQ.
For detailed installation instructions, please refer to the [AutoGPTQ GitHub repository](https://github.com/AutoGPTQ/AutoGPTQ).
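In most environments this is a single pip command (the PyPI package name is `auto-gptq`; see the repository linked above for CUDA-specific build options):

```shell
pip install auto-gptq
```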
# Example Usage
```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-70B-Instruct")
model = AutoGPTQForCausalLM.from_quantized("OxxoCodes/Meta-Llama-3-70B-Instruct-GPTQ")

inputs = tokenizer("The capital of France is", return_tensors="pt").to(model.device)
output = model.generate(**inputs)[0]
print(tokenizer.decode(output))
```