yuchenglu committed
Commit 4d80166
1 Parent(s): e7f027c

Update README.md

Files changed (1)
README.md +7 -4
README.md CHANGED
@@ -44,6 +44,7 @@ pip install git+https://github.com/HazyResearch/flash-attention.git#subdirectory
  ```
  You can load the model directly from the Hugging Face model hub using
  ```python
+ import torch
  from transformers import AutoTokenizer, AutoModelForCausalLM
 
  tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
@@ -51,7 +52,7 @@ model = AutoModelForCausalLM.from_pretrained("togethercomputer/Llama-2-7B-32K-In
  trust_remote_code=True, torch_dtype=torch.float16)
  input_ids = tokenizer.encode("[INST]\nWrite a poem about cats\n[/INST]\n\n", return_tensors="pt")
  output = model.generate(input_ids, max_length=128,
- temperature=0.7, repetition_panelty=1.1, top_p=0.7, top_k=50)
+ temperature=0.7, repetition_penalty=1.1, top_p=0.7, top_k=50)
  output_text = tokenizer.decode(output[0], skip_special_tokens=True)
  ```
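
For readability, here is the usage snippet as it reads after this change, assembled into one self-contained sketch. Everything comes from the diff above except the final `print` call and `do_sample=True`, which recent `transformers` releases need for the `temperature`/`top_p`/`top_k` settings to take effect; device placement (e.g. moving the model to a GPU) is left to the reader.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

# trust_remote_code=True is needed because the repository ships a custom
# model implementation (the README also asks for flash-attention to be installed).
tokenizer = AutoTokenizer.from_pretrained("togethercomputer/Llama-2-7B-32K-Instruct")
model = AutoModelForCausalLM.from_pretrained(
    "togethercomputer/Llama-2-7B-32K-Instruct",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)

# The README's example prompt uses the [INST] ... [/INST] instruction format.
prompt = "[INST]\nWrite a poem about cats\n[/INST]\n\n"
input_ids = tokenizer.encode(prompt, return_tensors="pt")

output = model.generate(
    input_ids,
    max_length=128,
    do_sample=True,  # not in the README snippet; see note above
    temperature=0.7,
    repetition_penalty=1.1,
    top_p=0.7,
    top_k=50,
)
output_text = tokenizer.decode(output[0], skip_special_tokens=True)
print(output_text)
```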
@@ -103,7 +104,9 @@ This poem captures the essence of cats, highlighting their beauty, independence,
  We evaluate the model from three aspects: 1) [Alpaca Eval](https://tatsu-lab.github.io/alpaca_eval/);
  2) [Rouge score over BookSum](https://together.ai/blog/Llama-2-7B-32K); and
  3) [Accuracy over Multi-document Question Answering (MQA)](https://together.ai/blog/Llama-2-7B-32K).
- We compare with models including [https://huggingface.co/meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
+ We compare with models including
+ [GPT-3.5-Turbo-16K](https://platform.openai.com/docs/models/gpt-3-5),
+ [https://huggingface.co/meta-llama/Llama-2-7b-chat-hf](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf),
  [Longchat-7b-16k](https://huggingface.co/lmsys/longchat-7b-16k)
  and [Longchat-7b-v1.5-32k](https://huggingface.co/lmsys/longchat-7b-v1.5-32k).
  We summarize the results below:
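
As a rough illustration of the second metric, the sketch below scores generated summaries against BookSum references with ROUGE via the Hugging Face `evaluate` package. This is not the script behind the table in the next hunk (that script is not part of this commit); the package choice and the placeholder predictions/references are assumptions made for illustration only.

```python
import evaluate

# Load the ROUGE metric (requires: pip install evaluate rouge_score).
rouge = evaluate.load("rouge")

# Placeholders: in a real run, predictions would be the model's chapter
# summaries and references the ground-truth BookSum summaries.
predictions = ["A generated summary of a BookSum chapter ..."]
references = ["The reference summary of the same chapter ..."]

# Returns a dict with keys such as "rouge1", "rouge2", and "rougeL".
scores = rouge.compute(predictions=predictions, references=references)
print(scores)
```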
@@ -126,6 +129,7 @@ We summarize the results below:
  | Llama-2-7B-Chat-hf | 0.055 | 0.008 | 0.046 |
  | Longchat-7b-16k | 0.303 | 0.055 | 0.160 |
  | Longchat-7b-v1.5-32k | 0.308 | 0.057 | 0.163 |
+ | GPT-3.5-Turbo-16K | 0.324 | 0.066 | 0.178 |
  | Llama-2-7B-32K-Instruct (ours) | 0.336 | 0.076 | 0.184 |
 
  * Accuracy over MQA
@@ -134,10 +138,9 @@ We summarize the results below:
  | Llama-2-7B-Chat-hf | 0.384 | 0.375 | 0.313 |
  | Longchat-7b-16k | 0.510 | 0.473 | 0.428 |
  | Longchat-7b-v1.5-32k | 0.534 | 0.516 | 0.479 |
+ | GPT-3.5-Turbo-16K | 0.622 | 0.609 | 0.577 |
  | Llama-2-7B-32K-Instruct (ours) | 0.622 | 0.604 | 0.589 |
 
- We observe that our finetuned Llama-2-7B-32K-Instruct consistently outperforms other baseline models including Llama-2-7b-chat, Longchat-7b-16k and Longchat-7b-v1.5-32k.
-
  ## Limitations and Bias
 
  As with all language models, Llama-2-7B-32K-Instruct may generate incorrect or biased content. It's important to keep this in mind when using the model.