---
license: other
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
library_name: transformers
tags:
- RLHF
- Nexusflow
- Athene
- Chat Model
base_model:
- Qwen/Qwen2.5-72B-Instruct
---
> [!NOTE]
> EXL2 4.65bpw-h6 quantized version of [Nexusflow/Athene-V2-Chat](https://huggingface.co/Nexusflow/Athene-V2-Chat). Supports 32K context with Q4 cache on systems with 48 GB VRAM.

# Athene-V2-Chat-72B: Rivaling GPT-4o across Benchmarks

<p align="center">
  <a href="https://huggingface.co/Nexusflow" target="_blank">Nexusflow HF</a> - <a href="https://discord.gg/HDSVmNAs3y" target="_blank">Nexusflow Discord</a> - <a href="https://nexusflow.ai/blogs/athene-v2" target="_blank">Athene-V2 Blogpost</a>
</p>

We introduce Athene-V2-Chat-72B, an open-weights LLM that is on par with GPT-4o across benchmarks. It is trained through RLHF with Qwen2.5-72B-Instruct as the base model.
Athene-V2-Chat-72B excels in chat, math, and coding. Its sister model, [Athene-V2-Agent-72B](https://huggingface.co/Nexusflow/Athene-V2-Agent), surpasses GPT-4o in complex function calling and agentic applications.

<p align="center" width="100%">
  <a><img src="benchmark.png" alt="Benchmark" style="width: 100%; min-width: 300px; display: block; margin: auto;"></a>
</p>

- **Developed by:** The Nexusflow Team
- **Model type:** Chat Model
- **Finetuned from model:** [Qwen 2.5 72B-Instruct](https://huggingface.co/Qwen/Qwen2.5-72B-Instruct)
- **License:** [Nexusflow Research License](https://huggingface.co/Nexusflow/Athene-V2-Chat/blob/main/Nexusflow_Research_License_.pdf)
- **Blog:** https://nexusflow.ai/blogs/athene-v2

## Usage
Athene-V2-Chat uses the same chat template as Qwen2.5-72B-Instruct. Below is a simple usage example with the Transformers library.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Nexusflow/Athene-V2-Chat"

# Load the weights in their native dtype and shard them across available GPUs.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "Write a Python function to return the nth Fibonacci number in log n runtime."

messages = [
    {"role": "user", "content": prompt}
]

# Render the Qwen2.5 chat template and append the assistant-turn prefix.
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=2048
)

# Drop the prompt tokens so only the newly generated completion remains.
generated_ids = [
    output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
]

response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]
```
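
The example prompt asks for O(log n) runtime; the standard approach is fast doubling, which uses the identities F(2k) = F(k)(2F(k+1) - F(k)) and F(2k+1) = F(k)^2 + F(k+1)^2. For reference, a minimal sketch of that kind of solution (illustrative, not output from the model):

```python
def fib(n: int) -> int:
    """Return the nth Fibonacci number in O(log n) time via fast doubling."""
    def double(k: int) -> tuple[int, int]:
        # Returns the pair (F(k), F(k+1)).
        if k == 0:
            return (0, 1)
        a, b = double(k >> 1)
        c = a * (2 * b - a)   # F(2m)   = F(m) * (2*F(m+1) - F(m))
        d = a * a + b * b     # F(2m+1) = F(m)^2 + F(m+1)^2
        return (d, c + d) if k & 1 else (c, d)
    return double(n)[0]
```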

Note that adding a system prompt that encourages the model to think step by step can further improve its performance on difficult math queries and on problems like counting the `r`s in "strawberry". For fairness, we **do not** include such a system prompt during chat evaluation.
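
A minimal example of such a system message (the exact wording here is illustrative, not a prompt used in any evaluation):

```python
messages = [
    {"role": "system", "content": "Think step by step and verify each step before giving your final answer."},
    {"role": "user", "content": "How many times does the letter r appear in the word strawberry?"},
]
```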
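
For the EXL2 quantization in this repository (see the note at the top), the weights are loaded with the exllamav2 library rather than Transformers. A minimal sketch, assuming exllamav2's dynamic-generator example API; the local path is hypothetical, and the Q4 KV cache is what keeps the 32K context within 48 GB of VRAM:

```python
from exllamav2 import ExLlamaV2, ExLlamaV2Config, ExLlamaV2Cache_Q4, ExLlamaV2Tokenizer
from exllamav2.generator import ExLlamaV2DynamicGenerator

model_dir = "/models/Athene-V2-Chat-exl2-4.65bpw-h6"  # hypothetical download location

config = ExLlamaV2Config(model_dir)
config.max_seq_len = 32768                   # 32K context, per the note above

model = ExLlamaV2(config)
cache = ExLlamaV2Cache_Q4(model, lazy=True)  # Q4 KV cache to stay within 48 GB VRAM
model.load_autosplit(cache)                  # shard layers across available GPUs

tokenizer = ExLlamaV2Tokenizer(config)
generator = ExLlamaV2DynamicGenerator(model=model, cache=cache, tokenizer=tokenizer)

# Prompts must still carry the Qwen2.5 (ChatML) template; hand-rolled here for brevity.
prompt = (
    "<|im_start|>user\n"
    "Write a Python function to return the nth Fibonacci number in log n runtime.<|im_end|>\n"
    "<|im_start|>assistant\n"
)
output = generator.generate(prompt=prompt, max_new_tokens=512)
print(output)
```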

## Acknowledgment
We would like to thank the [LMSYS Organization](https://lmsys.org/) for their support in testing the model. We also thank the Qwen Team and the open-source community for their efforts in providing the datasets and base models.