robertgshaw2
committed on
Update README.md
Browse files
README.md
CHANGED
@@ -32,7 +32,7 @@ Convert with the `convert.py` script in this repo:
|
|
32 |
|
33 |
```bash
|
34 |
python3 convert.py --model-id "TheBloke/Llama-2-7B-Chat-GPTQ" --save-path "./marlin-model" --do-generation
|
35 |
-
```
|
36 |
|
37 |
### Run Model
|
38 |
|
@@ -47,8 +47,7 @@ model_path = "./marlin-model"
|
|
47 |
model = load_model(model_path).to("cuda")
|
48 |
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
49 |
|
50 |
-
|
51 |
-
# Run inference to confirm it is working.
|
52 |
inputs = tokenizer("My favorite song is", return_tensors="pt")
|
53 |
inputs = {k: v.to("cuda") for k, v in inputs.items()}
|
54 |
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
|
|
|
32 |
|
33 |
```bash
|
34 |
python3 convert.py --model-id "TheBloke/Llama-2-7B-Chat-GPTQ" --save-path "./marlin-model" --do-generation
|
35 |
+
```
|
36 |
|
37 |
### Run Model
|
38 |
|
|
|
47 |
model = load_model(model_path).to("cuda")
|
48 |
tokenizer = AutoTokenizer.from_pretrained(model_path)
|
49 |
|
50 |
+
# Generate text.
|
|
|
51 |
inputs = tokenizer("My favorite song is", return_tensors="pt")
|
52 |
inputs = {k: v.to("cuda") for k, v in inputs.items()}
|
53 |
outputs = model.generate(**inputs, max_new_tokens=50, do_sample=False)
|