---
license: apache-2.0
datasets:
- Anthropic/hh-rlhf
- OpenAssistant/oasst1
- databricks/databricks-dolly-15k
language:
- en
- fr
- de
- es
- it
---
# Model Card for Alfred-40B-0723

![a witty and elegant butler with a falcon on his shoulder, smile, flat illustration, simple shapes, colorful, lo-fi aesthetics](https://i.ibb.co/B4hds3G/alfred-mini.png)

`Alfred-40B-0723` is a finetuned version of [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b), obtained with Reinforcement Learning from Human Feedback (RLHF).
Finetuning was performed in July 2023. It is the first in a series of RLHF models based on Falcon-40B that will be released regularly. It is made available under the Apache 2.0 License.

## Model Details

### Model Description

- **Developed by:** [LightOn](https://www.lighton.ai/) - [Axel Marmet](https://huggingface.co/WeightsnWizardry) (lead), [Oskar Hallstrom](https://huggingface.co/ohallstrom) (reward models), [Clement Thiriet](https://huggingface.co/cthiriet) (data infrastructure), [Julien Seailles](https://huggingface.co/Jseailleslighton), [Othman Hicheur](https://huggingface.co/othmanlighton), [Amelie Chatelain](https://huggingface.co/ameliechatelain) (data collection);
- **Model type:** Causal decoder-only;
- **Language(s) (NLP):** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish);
- **License:** Apache 2.0;
- **Finetuned from model:** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b);
- **Training date:** July 2023 (`0723`).

## Uses

### Direct Use

`Alfred-40B-0723` can be used as an instruct or chat model. We also encourage its use for research on large language models finetuned with RLHF.

The prefix to use Alfred in chat mode is:

```
Alfred is a large language model trained by LightOn. Knowledge cutoff: November 2022. Current date: 31 July, 2023

User: {user query}
Alfred:
```

The stop word `User:` should be used.

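As a sketch of how the prompt format and stop word above fit together (the helper names here are illustrative, not part of the model's API):

```python
# Illustrative helpers (assumptions, not shipped with the model): build the
# documented chat prefix and trim a raw completion at the `User:` stop word.
PREFIX = (
    "Alfred is a large language model trained by LightOn. "
    "Knowledge cutoff: November 2022. Current date: 31 July, 2023\n\n"
)

def build_prompt(user_query: str) -> str:
    """Format a single-turn chat prompt in the documented layout."""
    return f"{PREFIX}User: {user_query}\nAlfred:"

def trim_at_stop_word(completion: str, stop_word: str = "User:") -> str:
    """Cut the generated text where the model starts a new `User:` turn."""
    return completion.split(stop_word, 1)[0].rstrip()

print(build_prompt("What is RLHF?"))
print(trim_at_stop_word("RLHF is reinforcement learning from human feedback.\nUser: next question"))
```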
### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.

## Bias, Risks, and Limitations

`Alfred-40B-0723` is a finetune of Falcon-40B. As such, it is trained mostly on English, German, Spanish, and French, with limited capabilities also in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It will not generalize appropriately to other languages. Furthermore, as it is trained on a large-scale corpus representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of `Alfred-40B-0723` implement appropriate guardrails and precautions in any production use.

### Observed failure modes

From internal testing, the following failure modes have been observed:
* The model has a tendency to respond in Spanish to very short prompts in English, such as short greetings (e.g. "Hello", "Hi");
* At times, the model encloses its response in quotes;
* At times, the model adds a sentiment in brackets to its output (e.g. "[sadly] *model response*").

These are mainly due to certain patterns prevalent in the open-source datasets used, and will be addressed in future iterations of Alfred.

If you encounter any other recurring failure modes, please open a community discussion or contact us.

## How to Get Started with the Model

Use the code below to get started with the model.

```python
import transformers
import torch
from transformers import AutoTokenizer

model = "lightonai/alfred-40b-0723"
tokenizer = AutoTokenizer.from_pretrained(model)

pipeline = transformers.pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

sequences = pipeline(
    "Write a short text to announce that the new transformer model Alfred is available in open-source on Huggingface, include emojis.",
    max_length=200,
    do_sample=True,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
)
for seq in sequences:
    print(f"Result: {seq['generated_text']}")
```

## Training Details

### Training Data

`Alfred-40B-0723` was trained on a mixture of publicly available and in-house curated datasets.

| **Data source** |
|--------------------|
| [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
| [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) |
| [dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) |
| [NatInstV2](https://github.com/allenai/natural-instructions) |
| momentum-internal |

`momentum-internal` is a collection of prompts rated as gold quality by the staff of LightOn in their daily workflow.

### Training Procedure

`Alfred-40B-0723` was trained on 128 A100 40GB GPUs, using a 3D parallelism strategy (TP=8, PP=4, DP=4) combined with ZeRO. The value model is initialized from the reward model and does not share any parameters with the policy network.

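As a quick sanity check, the three parallel dimensions multiply out to the stated GPU count:

```python
# Sketch of the 3D parallelism layout described above: the world size is the
# product of the three parallel dimensions.
tensor_parallel = 8    # TP: shards each layer's weights across GPUs
pipeline_parallel = 4  # PP: splits the layer stack into sequential stages
data_parallel = 4      # DP: replicates each (TP x PP) group over data shards

world_size = tensor_parallel * pipeline_parallel * data_parallel
print(world_size)  # 128, matching the A100 count above
```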
#### Preprocessing

Samples from each of the datasets were programmatically formatted into chat, instruction, and few-shot prompts.

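A minimal sketch of what such formatting could look like; the field names and templates below are assumptions for illustration, not LightOn's actual preprocessing pipeline:

```python
# Hypothetical formatters (assumed field names: "instruction", "response")
# turning a raw sample into the three prompt styles mentioned above.
def to_chat(sample: dict) -> str:
    return f"User: {sample['instruction']}\nAlfred: {sample['response']}"

def to_instruction(sample: dict) -> str:
    return f"Instruction: {sample['instruction']}\nAnswer: {sample['response']}"

def to_few_shot(samples: list) -> str:
    # Concatenate several formatted samples so earlier ones act as in-context examples.
    return "\n\n".join(to_instruction(s) for s in samples)

sample = {"instruction": "Name a falcon species.", "response": "Peregrine falcon."}
print(to_chat(sample))
```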
#### Training Hyperparameters

##### Policy and Value Optimizer Config

| **Hyperparameter** | **Value** | **Comment** |
|--------------------|------------|-------------------------------------------|
| Precision | `bfloat16` | |
| Optimizer | AdamW | |
| Learning rate | 1.85e-6 | 10 warm-up steps, cosine decay over 100 steps to 1.85e-7 |

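The schedule in the table can be sketched as follows, assuming a linear warm-up before the cosine decay (the exact warm-up shape is not specified here):

```python
# Sketch of the documented learning-rate schedule: 10 warm-up steps to the
# peak, then cosine decay over 100 steps down to the final value.
import math

PEAK_LR, FINAL_LR = 1.85e-6, 1.85e-7
WARMUP_STEPS, DECAY_STEPS = 10, 100

def learning_rate(step: int) -> float:
    if step < WARMUP_STEPS:
        # Assumed linear warm-up from 0 to the peak learning rate.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    progress = min((step - WARMUP_STEPS) / DECAY_STEPS, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return FINAL_LR + (PEAK_LR - FINAL_LR) * cosine

print(learning_rate(9))    # end of warm-up: peak LR
print(learning_rate(110))  # after full decay: final LR
```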
##### Trainer config

| **Hyperparameter** | **Value** |
|--------------------|------------|
| Num Rollouts | 1024 |
| Policy Epochs | 1 |
| Value Epochs | 1 |
| KL Coef | 0.01 |
| Gamma | 1.0 |
| GAE Lambda | 0.95 |
| Clip Range Policy | 0.2 |
| Clip Range Value | 0.2 |
| Whiten Advantages | `true` |
| Whiten Rewards | `false` |
| Score on EOD | `true` |
| Max Steps | 200 |
| PPO steps/epoch | 1 |
| Value steps/epoch | 8 |

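To show how the Gamma, GAE Lambda, and Whiten Advantages settings interact, here is a sketch of standard PPO advantage estimation with those values; this is textbook GAE, not LightOn's actual trainer code:

```python
# Standard Generalized Advantage Estimation with the table's settings
# (Gamma = 1.0, GAE Lambda = 0.95, Whiten Advantages = true).
GAMMA, LAM = 1.0, 0.95

def gae_advantages(rewards, values):
    """Compute GAE advantages over one trajectory, iterating backwards."""
    advantages, last = [0.0] * len(rewards), 0.0
    for t in reversed(range(len(rewards))):
        next_value = values[t + 1] if t + 1 < len(rewards) else 0.0
        delta = rewards[t] + GAMMA * next_value - values[t]
        last = delta + GAMMA * LAM * last
        advantages[t] = last
    return advantages

def whiten(xs):
    """Normalize to zero mean and unit variance (Whiten Advantages = true)."""
    mean = sum(xs) / len(xs)
    var = sum((x - mean) ** 2 for x in xs) / len(xs)
    return [(x - mean) / (var ** 0.5 + 1e-8) for x in xs]

# Toy trajectory: sparse reward at the end, as with scoring on EOD.
adv = whiten(gae_advantages([0.0, 0.0, 1.0], [0.1, 0.2, 0.5]))
print(adv)
```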
##### Trajectory data config

| **Hyperparameter** | **Value** |
|----------------------|------------|
| Continuation Max Len | 1024 |
| Continuation Min Len | 0 |
| Top P | 1.0 |
| Temperature | 1.0 |

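For reference, a sketch of how the Top P and Temperature settings act on a next-token distribution; with both at 1.0 (as in the table), sampling is left untouched, while lower values would sharpen or truncate it. This is a generic illustration, not the actual sampling code:

```python
# Generic temperature + nucleus (top-p) filtering over a probability vector.
def apply_temperature_top_p(probs, temperature=1.0, top_p=1.0):
    # Temperature rescales the distribution (equivalent to dividing logits by T).
    scaled = [p ** (1.0 / temperature) for p in probs]
    total = sum(scaled)
    scaled = [p / total for p in scaled]
    # Top-p keeps the smallest set of tokens whose cumulative probability
    # reaches top_p, then renormalizes over that set.
    kept, cum = [], 0.0
    for p in sorted(scaled, reverse=True):
        kept.append(p)
        cum += p
        if cum >= top_p:
            break
    total = sum(kept)
    return [p / total for p in kept]

print(apply_temperature_top_p([0.5, 0.3, 0.2]))  # defaults: distribution unchanged
```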
##### Of interest to the community

The following hyperparameters have not been extensively explored and should not be taken as a gold standard:
- learning rate
- number of rollouts
- number of epochs
- steps per epoch

## Evaluation

![aggregated evaluation of RAW vs SFT vs PPO - including random baseline - PPO suffers in arithmetic due to effects on calibration](https://i.ibb.co/9yQFJ40/aggregated.png "aggregated evaluation of RAW vs SFT vs PPO - including random baseline")

Initial evaluation results derived from the EleutherAI evaluation harness are as follows:

- Arithmetic capabilities exhibit a significant decline.
- Common Sense, Paraphrase, Reasoning, and Reading Comprehension remain relatively stable.
- Natural Language Inference (NLI) demonstrates improvement, while Question Answering (QA) shows deterioration.

These outcomes align with expectations from the existing literature. It is worth noting that benchmark metrics do not necessarily align with human preferences. Moreover, all of these metrics employ a Select methodology, which penalizes RLHF models due to their sub-standard calibration compared to raw LLMs.

**Human evaluation is currently underway.**

### Compute Infrastructure

#### Hardware

`Alfred-40B-0723` was trained on AWS SageMaker, on 128 A100 40GB GPUs in P4d instances.

#### Software

`Alfred-40B-0723` was trained with a custom RLHF codebase. Training leverages a 3D parallelism approach combined with ZeRO, as well as high-performance kernels such as FlashAttention.

## Model Card Contact

contact@lighton.ai