mgoin committed
Commit 03f1419
1 Parent(s): c24d8b4

Update README.md

Files changed (1): README.md (+3 −3)
README.md CHANGED
@@ -27,7 +27,7 @@ By leveraging a pre-sparsified model's structure, you can efficiently fine-tune
 
 ### Running the model
 
-This model has not been fine-tuned for instruction-following but may be run with the transformers library. For accelerated inference with sparsity, deploy with [nm-vllm](https://github.com/neuralmagic/nm-vllm) or [deepsparse](https://github.com/neuralmagic/deepsparse).
+This model may be run with the transformers library. For accelerated inference with sparsity, deploy with [nm-vllm](https://github.com/neuralmagic/nm-vllm) or [deepsparse](https://github.com/neuralmagic/deepsparse).
 
 ```python
 # pip install transformers accelerate
@@ -37,7 +37,7 @@ tokenizer = AutoTokenizer.from_pretrained("neuralmagic/Llama-2-7b-pruned50-retra
 model = AutoModelForCausalLM.from_pretrained("neuralmagic/Llama-2-7b-pruned50-retrained-ultrachat", device_map="auto")
 
 input_text = "Write me a poem about Machine Learning."
-input_ids = tokenizer(input_text, return_tensors="pt").to("cuda")
+input_ids = tokenizer.apply_chat_template(input_text, add_generation_prompt=True, return_tensors="pt").to("cuda")
 
 outputs = model.generate(**input_ids)
 print(tokenizer.decode(outputs[0]))
@@ -47,7 +47,7 @@ print(tokenizer.decode(outputs[0]))
 
 Model evaluation metrics and results.
 
-| Benchmark | Metric | Llama-2-7b | Llama-2-7b-pruned50-retrained-ultrachat |
+| Benchmark | Metric | Llama-2-7b-ultrachat | Llama-2-7b-pruned50-retrained-ultrachat |
 |------------------------------------------------|---------------|-------------|-------------------------------|
 | [MMLU](https://arxiv.org/abs/2009.03300) | 5-shot, top-1 | xxxx | xxxx |
 | [HellaSwag](https://arxiv.org/abs/1905.07830) | 0-shot | xxxx | xxxx |
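
One note on the `apply_chat_template` line this commit introduces: in the transformers API, that method expects a conversation given as a list of `{"role": ..., "content": ...}` message dicts rather than a raw string, and with `return_tensors="pt"` it returns the input-ID tensor directly, so the result is normally passed to `generate` positionally rather than unpacked with `**`. A minimal sketch of that pattern follows; the `generate_reply` helper is hypothetical and not part of the commit, and the model/tokenizer are assumed to be a loaded transformers causal LM and its chat-capable tokenizer (e.g. the checkpoint named in the diff).

```python
# Hypothetical helper (not from the commit) sketching the usual
# apply_chat_template usage with a transformers chat model.

def generate_reply(model, tokenizer, user_text, max_new_tokens=256):
    # apply_chat_template expects a list of role/content message dicts,
    # not a bare string.
    messages = [{"role": "user", "content": user_text}]
    input_ids = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True, return_tensors="pt"
    ).to(model.device)
    # The return value is already a tensor of token IDs, so pass it
    # positionally instead of unpacking with **.
    outputs = model.generate(input_ids, max_new_tokens=max_new_tokens)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)
```

This mirrors the diff's intent of routing the prompt through the UltraChat chat template; only the message-list construction and the positional `generate` call differ from the snippet as committed.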