ruslanmv/Medical-Llama3-8B-GPTQ


ruslanmv committed on Apr 24

Commit 6e20780 • 1 Parent(s): 2c9560d

Update README.md

Files changed (1)

README.md +1 -1


@@ -57,7 +57,7 @@ repo_id = "ruslanmv/Medical-Llama3-8B-GPTQ"
 # download quantized model from Hugging Face Hub and load to the first GPU
 model = AutoGPTQForCausalLM.from_quantized(repo_id,
-                                          device="cuda:0",
+                                          device=device,
                                            use_safetensors=True,
                                            use_triton=False)
 tokenizer = AutoTokenizer.from_pretrained(quantized_model_dir)
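
The change swaps the hard-coded `"cuda:0"` for a `device` variable, but the hunk does not show where `device` is defined. As a minimal sketch, assuming the README sets it earlier from PyTorch's CUDA availability check, it could look like the following. The helper name `pick_device` is hypothetical and is not part of the README or of auto-gptq.

```python
def pick_device(preferred="cuda:0"):
    """Return `preferred` if PyTorch can see a CUDA GPU, otherwise "cpu".

    Hypothetical helper for illustration; the README may define `device`
    differently.
    """
    try:
        import torch  # imported lazily so the sketch also runs without PyTorch
    except ImportError:
        return "cpu"
    return preferred if torch.cuda.is_available() else "cpu"


# Assumed definition of the `device` variable that the new line refers to.
device = pick_device()
print(device)
```

The resulting string can then be passed as `device=device` in the `from_quantized` call above. Whether the quantized model actually runs with the `"cpu"` fallback depends on the installed auto-gptq build, so treat that branch as a guard rather than a supported path.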