This model was converted to GGUF format from [`allenai/Llama-3.1-Tulu-3-8B`](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) using llama.cpp via ggml.ai's [GGUF-my-repo](https://huggingface.co/spaces/ggml-org/gguf-my-repo) space.
Refer to the [original model card](https://huggingface.co/allenai/Llama-3.1-Tulu-3-8B) for more details on the model.
---

## Model details

Tülu 3 is a leading instruction-following model family, offering fully open-source data, code, and recipes designed to serve as a comprehensive guide for modern post-training techniques. Tülu 3 is designed for state-of-the-art performance on a diverse set of tasks in addition to chat, such as MATH, GSM8K, and IFEval.
### Model description

- **Model type:** A model trained on a mix of publicly available, synthetic, and human-created datasets.
- **Language(s) (NLP):** Primarily English
- **License:** Llama 3.1 Community License Agreement
- **Finetuned from model:** allenai/Llama-3.1-Tulu-3-8B-DPO

### Model sources

- **Training repository:** https://github.com/allenai/open-instruct
- **Eval repository:** https://github.com/allenai/olmes
- **Paper:** https://arxiv.org/abs/2411.15124
- **Demo:** https://playground.allenai.org/
## Using the model

### Loading with HuggingFace

To load the model with HuggingFace, use the following snippet:

```python
from transformers import AutoModelForCausalLM

tulu_model = AutoModelForCausalLM.from_pretrained("allenai/Llama-3.1-Tulu-3-8B")
```
### vLLM

As a Llama base model, the model can be easily served with:

```shell
vllm serve allenai/Llama-3.1-Tulu-3-8B
```

Note that given the long chat template of Llama, you may want to use `--max_model_len=8192`.
### Chat template

The chat template for our models is formatted as:

```
<|user|>\nHow are you doing?\n<|assistant|>\nI'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

Or with new lines expanded:

```
<|user|>
How are you doing?
<|assistant|>
I'm just a computer program, so I don't have feelings, but I'm functioning as expected. How can I assist you today?<|endoftext|>
```

It is also embedded within the tokenizer, for `tokenizer.apply_chat_template`.
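To make the template format concrete, here is a minimal plain-Python sketch of how the turns concatenate (the authoritative formatting is the tokenizer's own chat template via `tokenizer.apply_chat_template`; the helper below is illustrative only and avoids downloading the model):

```python
# Illustrative re-implementation of the documented template format;
# not the tokenizer's actual chat-template code.
def render_tulu_chat(messages, eos="<|endoftext|>"):
    """Render a list of {"role", "content"} dicts in the Tulu 3 format."""
    parts = []
    for m in messages:
        parts.append(f"<|{m['role']}|>\n{m['content']}\n")
    # The final turn is closed directly with the EOS token, no trailing newline.
    return "".join(parts).rstrip("\n") + eos

prompt = render_tulu_chat([
    {"role": "user", "content": "How are you doing?"},
    {"role": "assistant", "content": "I'm just a computer program."},
])
```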
### System prompt

In Ai2 demos, we use this system prompt by default:

```
You are Tulu 3, a helpful and harmless AI Assistant built by the Allen Institute for AI.
```

The model has not been trained with a specific system prompt in mind.
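In the standard HuggingFace chat format, a system prompt is simply the first message in the list; a minimal sketch (the `with_system_prompt` helper is hypothetical, not part of any Tulu tooling):

```python
# Hypothetical helper: prepend the default Ai2 system prompt to a chat.
SYSTEM_PROMPT = ("You are Tulu 3, a helpful and harmless AI Assistant "
                 "built by the Allen Institute for AI.")

def with_system_prompt(messages):
    """Return a new message list with the system prompt prepended."""
    return [{"role": "system", "content": SYSTEM_PROMPT}] + list(messages)

chat = with_system_prompt([{"role": "user", "content": "How are you doing?"}])
```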
### Bias, Risks, and Limitations

The Tülu 3 models have limited safety training and are not deployed with automatic in-the-loop filtering of responses like ChatGPT, so the model can produce problematic outputs (especially when prompted to do so). It is also unknown what the size and composition of the corpus used to train the base Llama 3.1 models was; however, it likely included a mix of web data and technical sources like books and code. See the Falcon 180B model card for an example of this.
### Hyperparameters

PPO settings for RLVR:

- Learning rate: 3 × 10⁻⁷
- Discount factor (gamma): 1.0
- Generalized Advantage Estimation (lambda): 0.95
- Mini-batches (N_mb): 1
- PPO update iterations (K): 4
- PPO clipping coefficient (epsilon): 0.2
- Value function coefficient (c1): 0.1
- Gradient norm threshold: 1.0
- Learning rate schedule: linear
- Generation temperature: 1.0
- Batch size (effective): 512
- Max token length: 2,048
- Max prompt token length: 2,048
- Penalty reward value for responses without an EOS token: -10.0
- Response length: 1,024 (but 2,048 for MATH)
- Total episodes: 100,000
- KL penalty coefficient (beta): [0.1, 0.05, 0.03, 0.01]
- Warm-up ratio (omega): 0.0
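As an illustration of two of the settings above, here is a minimal sketch of Generalized Advantage Estimation with the listed gamma = 1.0 and lambda = 0.95 (illustrative only; this is not the open-instruct training code):

```python
# Illustrative GAE computation with the listed settings
# (gamma = 1.0, lambda = 0.95); not the actual open-instruct code.
def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Compute GAE advantages for one episode.

    `values` has len(rewards) + 1 entries: per-step value estimates plus a
    bootstrap value for the final state (0.0 when the episode terminates).
    """
    advantages = [0.0] * len(rewards)
    gae = 0.0
    for t in reversed(range(len(rewards))):
        # TD residual for step t
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of residuals
        gae = delta + gamma * lam * gae
        advantages[t] = gae
    return advantages

# Toy episode: sparse reward of 1.0 at the end (RLVR-style verifiable reward).
adv = gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.3, 0.0])
```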
### License and use

All Llama 3.1 Tülu 3 models are released under Meta's Llama 3.1 Community License Agreement. Llama 3.1 is licensed under the Llama 3.1 Community License, Copyright © Meta Platforms, Inc. Tülu 3 is intended for research and educational use. For more information, please see our Responsible Use Guidelines.

The models have been fine-tuned using a dataset mix with outputs generated from third-party models and are subject to additional terms: the Gemma Terms of Use and the Qwen License Agreement (models were improved using Qwen 2.5).
### Citation

If Tülu 3 or any of the related materials were helpful to your work, please cite:

```bibtex
@article{lambert2024tulu3,
  title  = {Tülu 3: Pushing Frontiers in Open Language Model Post-Training},
  author = {
    Nathan Lambert and Jacob Morrison and Valentina Pyatkin and
    Shengyi Huang and Hamish Ivison and Faeze Brahman and
    Lester James V. Miranda and Alisa Liu and Nouha Dziri and
    Shane Lyu and Yuling Gu and Saumya Malik and Victoria Graf and
    Jena D. Hwang and Jiangjiang Yang and Ronan Le Bras and
    Oyvind Tafjord and Chris Wilhelm and Luca Soldaini and
    Noah A. Smith and Yizhong Wang and Pradeep Dasigi and
    Hannaneh Hajishirzi
  },
  year  = {2024},
  email = {tulu@allenai.org}
}
```

---
## Use with llama.cpp
Install llama.cpp through brew (works on Mac and Linux)