bullerwins committed on
Commit
2377db7
1 Parent(s): bcd8fd1

Create README.md

Files changed (1)
  1. README.md +87 -0
README.md ADDED
@@ -0,0 +1,87 @@
+ ---
+ license: cc-by-nc-4.0
+ language:
+ - en
+ library_name: transformers
+ tags:
+ - RLHF
+ - Nexusflow
+ - Athene
+ - Chat Model
+ ---
+ Quantized GGUF version of Athene-70B, created with llama.cpp.
+
+ Original model: [Nexusflow/Athene-70B](https://huggingface.co/Nexusflow/Athene-70B)
+
+ # Athene-Llama3-70B
+
+ We introduce Athene-Llama3-70B, an open-weights LLM trained with RLHF on top of Llama-3-70B-Instruct. Athene-70B scores 77.8% on Arena-Hard-Auto, a proxy benchmark for Chatbot Arena.
+
+ - **Developed by:** The Nexusflow Team (Evan Frick\*, Peter Jin\*, Tianle Li\*, Karthik Ganesan, Jian Zhang, Jiantao Jiao and Banghua Zhu).
+ - **Model type:** Chat Model
+ - **Finetuned from model:** [Llama-3-70B-Instruct](https://huggingface.co/meta-llama/Meta-Llama-3-70B-Instruct).
+
+ Blog: https://nexusflow.ai/blogs/athene
+
+ | Model                           | Arena-Hard |
+ |---------------------------------|------------|
+ | Claude-3.5-Sonnet (Proprietary) | 79.3%      |
+ | GPT-4o (Proprietary)            | 79.2%      |
+ | **Athene-70B (Open)**           | 77.8%      |
+ | Gemini-Pro-1.5 (Proprietary)    | 72.0%      |
+ | Gemma-2-27B (Open)              | 57.0%      |
+ | Llama-3-70B (Open)              | 46.6%      |
+
+ ## Usage
+
+ Athene-70B uses the same chat template as Llama-3-70B-Instruct. Below is a simple usage example with the Transformers library.
+
+ ```python
+ import transformers
+ import torch
+
+ model_id = "Nexusflow/Athene-70B"
+
+ # Load the model in bfloat16 and place it automatically across available devices.
+ pipeline = transformers.pipeline(
+     "text-generation",
+     model=model_id,
+     model_kwargs={"torch_dtype": torch.bfloat16},
+     device_map="auto",
+ )
+
+ messages = [
+     {"role": "system", "content": "You are an Athene Noctua, you can only speak with owl sounds. Whoooo whooo."},
+     {"role": "user", "content": "Whooo are you?"},
+ ]
+
+ # Stop generation on either the tokenizer's EOS token or <|end_of_text|>.
+ terminators = [
+     pipeline.tokenizer.eos_token_id,
+     pipeline.tokenizer.convert_tokens_to_ids("<|end_of_text|>"),
+ ]
+
+ outputs = pipeline(
+     messages,
+     max_new_tokens=256,
+     eos_token_id=terminators,
+     do_sample=True,
+     temperature=0.6,
+     top_p=0.9,
+ )
+ # The last entry of "generated_text" is the assistant's reply message.
+ print(outputs[0]["generated_text"][-1])
+ ```
+
+ ## Acknowledgment
+
+ We would like to thank the [LMSYS Organization](https://lmsys.org/) for supporting the online demo and private testing. We would also like to thank Meta AI and the open-source community for providing the datasets and base models.
+
+ ## Citation
+
+ ```
+ @misc{Athene2024,
+     title = {Athene-70B: Redefining the Boundaries of Post-Training for Open Models},
+     url = {https://nexusflow.ai/blogs/athene},
+     author = {Frick, Evan and Jin, Peter and Li, Tianle and Ganesan, Karthik and Zhang, Jian and Jiao, Jiantao and Zhu, Banghua},
+     month = {July},
+     year = {2024}
+ }
+ ```
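
The Usage section in this README says Athene-70B shares the Llama-3-70B-Instruct chat template. When running the GGUF files outside Transformers, it can help to see what that template expands to. The sketch below builds the prompt string by hand, assuming the standard Llama 3 Instruct special tokens (`<|begin_of_text|>`, `<|start_header_id|>`, `<|end_header_id|>`, `<|eot_id|>`). The function name `format_llama3_prompt` is hypothetical and not part of any library. The chat template embedded in the tokenizer or GGUF metadata remains the authoritative source.

```python
from typing import Dict, List


def format_llama3_prompt(
    messages: List[Dict[str, str]],
    add_generation_prompt: bool = True,
) -> str:
    """Render chat messages in the Llama 3 Instruct prompt layout.

    Assumption: the model follows the standard Llama 3 Instruct format,
    as the README states for Athene-70B. This is an illustrative helper,
    not an official API.
    """
    prompt = "<|begin_of_text|>"
    for message in messages:
        # Each turn is a role header, a blank line, the trimmed content,
        # and an end-of-turn token.
        prompt += (
            f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n"
            f"{message['content'].strip()}<|eot_id|>"
        )
    if add_generation_prompt:
        # Open an assistant turn so the model continues with its reply.
        prompt += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return prompt


if __name__ == "__main__":
    example = [
        {"role": "system", "content": "You can only speak with owl sounds."},
        {"role": "user", "content": "Whooo are you?"},
    ]
    print(format_llama3_prompt(example))
```

In most setups this formatting is applied automatically from the stored chat template, so a helper like this is mainly useful for debugging or for passing a raw prompt string to a runtime.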