yuchenglu committed
Commit 4d80166
1 Parent(s): e7f027c

Update README.md

Files changed (1)
README.md +7 -4
README.md CHANGED
@@ -44,6 +44,7 @@ pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory
  ```
  You can load the model directly from the Hugging Face model hub using
  ```python
+ import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
  tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
@@ -51,7 +52,7 @@ model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-In
  trust_remote_code=True, torch_dtype=torch.float16)
  input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n", return_tensors="pt")
  output = model.generate(input_ids, max_length=128,
- temperature=0.7, repetition_panelty=1.1, top_p=0.7, top_k=50)
+ temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
  output_text = tokenizer.decode(output[0], skip_special_tokens=True)
  ```
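
For readability, here is the usage snippet as it reads after this change, assembled into one self-contained sketch. Everything comes from the diff above except the final `print` call and `do_sample=True`, which recent `transformers` releases need for the `temperature`/`top_p`/`top_k` settings to take effect; device placement (e.g. moving the model to a GPU) is left to the reader.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is needed because the repository ships a custom
# model implementation (the README also asks for flash-attention to be installed).
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Llama-2-7B-32K-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

# The README's example prompt uses the [INST] ... [/INST] instruction format.
prompt = "[INST]\nWrite a poem about cats\n[/INST]\n\n"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    max_length=128,
    do_sample=True,  # not in the README snippet; see note above
    temperature=0.7,
    repetition_penalty=1.1,
    top_p=0.7,
    top_k=50,
)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```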
@@ -103,7 +104,9 @@ This poem captures the essence of cats, highlighting their beauty, independence,
  We evaluate the model from three aspects: 1) [Alpaca Eval](https://tatsu-lab.github.io/alpaca_eval/);
  2) [Rouge score over BookSum](https://together.ai/blog/Llama-2-7B-32K); and
  3) [Accuracy over Multi-document Question Answering (MQA)](https://together.ai/blog/Llama-2-7B-32K).
- We compare with models including [https://huggingface.co/meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
+ We compare with models including
+ [GPT-3.5-Turbo-16K](https://platform.openai.com/docs/models/gpt-3-5),
+ [https://huggingface.co/meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
  [Longchat-7b-16k](https://huggingface.co/lmsys/longchat-7b-16k)
  and [Longchat-7b-v1.5-32k](https://huggingface.co/lmsys/longchat-7b-v1.5-32k).
  We summarize the results below:
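
As a rough illustration of the second metric, the sketch below scores generated summaries against BookSum references with ROUGE via the Hugging Face `evaluate` package. This is not the script behind the table in the next hunk (that script is not part of this commit); the package choice and the placeholder predictions/references are assumptions made for illustration only.

```python
import evaluate

# Load the ROUGE metric (requires: pip install evaluate rouge_score).
rouge = evaluate.load("rouge")

# Placeholders: in a real run, predictions would be the model's chapter
# summaries and references the ground-truth BookSum summaries.
predictions = ["A generated summary of a BookSum chapter ..."]
references = ["The reference summary of the same chapter ..."]

# Returns a dict with keys such as "rouge1", "rouge2", and "rougeL".
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```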
@@ -126,6 +129,7 @@ We summarize the results below:
  | Llama-2-7B-Chat-hf | 0.055 | 0.008 | 0.046 |
  | Longchat-7b-16k | 0.303 | 0.055 | 0.160 |
  | Longchat-7b-v1.5-32k | 0.308 | 0.057 | 0.163 |
+ | GPT-3.5-Turbo-16K | 0.324 | 0.066 | 0.178 |
  | Llama-2-7B-32K-Instruct (ours) | 0.336 | 0.076 | 0.184 |
 
  * Accuracy over MQA
@@ -134,10 +138,9 @@ We summarize the results below:
  | Llama-2-7B-Chat-hf | 0.384 | 0.375 | 0.313 |
  | Longchat-7b-16k | 0.510 | 0.473 | 0.428 |
  | Longchat-7b-v1.5-32k | 0.534 | 0.516 | 0.479 |
+ | GPT-3.5-Turbo-16K | 0.622 | 0.609 | 0.577 |
  | Llama-2-7B-32K-Instruct (ours) | 0.622 | 0.604 | 0.589 |
 
- We observe that our finetuned Llama-2-7B-32K-Instruct consistently outperforms other baseline models including Llama-2-7b-chat, Longchat-7b-16k and Longchat-7b-v1.5-32k.
-
  ## Limitations and Bias
 
  As with all language models, Llama-2-7B-32K-Instruct may generate incorrect or biased content. It's important to keep this in mind when using the model.