File size: 4,648 Bytes
cf633e2 |
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 |
---
library_name: transformers
license: llama3
datasets:
- aqua_rat
- microsoft/orca-math-word-problems-200k
- m-a-p/CodeFeedback-Filtered-Instruction
---
# Smaug-Llama-3-70B-Instruct
### Built with Meta Llama 3
![image/png](https://cdn-uploads.huggingface.co/production/uploads/64c14f6b02e1f8f67c73bd05/ZxYuHKmU_AtuEJbGtuEBC.png)
This model was built using a new Smaug recipe for improving performance on real world multi-turn conversations applied to
[meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
The model outperforms Llama-3-70B-Instruct substantially, and is on par with GPT-4-Turbo, on MT-Bench (see below).
EDIT: Smaug-Llama-3-70B-Instruct is the top open source model on Arena-Hard currently! It is also nearly on par with Claude Opus - see below.
We are conducting additional benchmark evaluations and will add those when available.
### Model Description
- **Developed by:** [Abacus.AI](https://abacus.ai)
- **License:** https://llama.meta.com/llama3/license/
- **Finetuned from model:** [meta-llama/Meta-Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
## How to use
The prompt format is unchanged from Llama 3 70B Instruct.
### Use with transformers
See the snippet below for usage with Transformers:
```python
import transformers
import torch
model_id = "abacusai/Smaug-Llama-3-70B-Instruct"
pipeline = transformers.pipeline(
"text-generation",
model=model_id,
model_kwargs={"torch_dtype": torch.bfloat16},
device_map="auto",
)
messages = [
{"role": "system", "content": "You are a pirate chatbot who always responds in pirate speak!"},
{"role": "user", "content": "Who are you?"},
]
prompt = pipeline.tokenizer.apply_chat_template(
messages,
tokenize=False,
add_generation_prompt=True
)
terminators = [
pipeline.tokenizer.eos_token_id,
pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
outputs = pipeline(
prompt,
max_new_tokens=256,
eos_token_id=terminators,
do_sample=True,
temperature=0.6,
top_p=0.9,
)
print(outputs[0]["generated_text"][len(prompt):])
```
## Evaluation
### Arena-Hard
Score vs selected others (sourced from: (https://lmsys.org/blog/2024-04-19-arena-hard/#full-leaderboard-with-gpt-4-turbo-as-judge)). GPT-4o and Gemini-1.5-pro-latest were missing from the original blob post, and we produced those numbers from a local run using the same methodology.
| Model | Score | 95% Confidence Interval | Average Tokens |
| :---- | ---------: | ----------: | ------: |
| GPT-4-Turbo-2024-04-09 | 82.6 | (-1.8, 1.6) | 662 |
| GPT-4o | 78.3 | (-2.4, 2.1) | 685 |
| Gemini-1.5-pro-latest | 72.1 | (-2.3, 2.2) | 630 |
| Claude-3-Opus-20240229 | 60.4 | (-3.3, 2.4) | 541 |
| **Smaug-Llama-3-70B-Instruct** | 56.7 | (-2.2, 2.6) | 661 |
| GPT-4-0314 | 50.0 | (-0.0, 0.0) | 423 |
| Claude-3-Sonnet-20240229 | 46.8 | (-2.1, 2.2) | 552 |
| Llama-3-70B-Instruct | 41.1 | (-2.5, 2.4) | 583 |
| GPT-4-0613 | 37.9 | (-2.2, 2.0) | 354 |
| Mistral-Large-2402 | 37.7 | (-1.9, 2.6) | 400 |
| Mixtral-8x22B-Instruct-v0.1 | 36.4 | (-2.7, 2.9) | 430 |
| Qwen1.5-72B-Chat | 36.1 | (-2.5, 2.2) | 474 |
| Command-R-Plus | 33.1 | (-2.1, 2.2) | 541 |
| Mistral-Medium | 31.9 | (-2.3, 2.4) | 485 |
| GPT-3.5-Turbo-0613 | 24.8 | (-1.6, 2.0) | 401 |
### MT-Bench
```
########## First turn ##########
score
model turn
Smaug-Llama-3-70B-Instruct 1 9.40000
GPT-4-Turbo 1 9.37500
Meta-Llama-3-70B-Instruct 1 9.21250
########## Second turn ##########
score
model turn
Smaug-Llama-3-70B-Instruct 2 9.0125
GPT-4-Turbo 2 9.0000
Meta-Llama-3-70B-Instruct 2 8.8000
########## Average ##########
score
model
Smaug-Llama-3-70B-Instruct 9.206250
GPT-4-Turbo 9.187500
Meta-Llama-3-70B-Instruct 9.006250
```
| Model | First turn | Second Turn | Average |
| :---- | ---------: | ----------: | ------: |
| **Smaug-Llama-3-70B-Instruct** | 9.40 | 9.01 | 9.21 |
| GPT-4-Turbo | 9.38 | 9.00 | 9.19 |
| Meta-Llama-3-70B-Instruct | 9.21 | 8.80 | 9.01 |
This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228. |