Text Generation
Transformers
PyTorch
RefinedWeb
falcon-40b
rlhf
falcon
custom_code
text-generation-inference
Inference Endpoints
iacolippo committed on
Commit
fb8bfcb
1 Parent(s): c62f089

Create README.md

Files changed (1)
  1. README.md +175 -0
README.md ADDED
@@ -0,0 +1,175 @@
1
+ ---
2
+ license: apache-2.0
3
+ datasets:
4
+ - Anthropic/hh-rlhf
5
+ - OpenAssistant/oasst1
6
+ - databricks/databricks-dolly-15k
7
+ language:
8
+ - en
9
+ - fr
10
+ - de
11
+ - es
12
+ - it
13
+ ---
14
+ # Model Card for Alfred-40B-0723
15
+
16
+ `Alfred-40B-0723` is a finetuned version of [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b), obtained with Reinforcement Learning from Human Feedback (RLHF).
17
+ It is the first of a series of RLHF models based on Falcon-40B that will be regularly released. It is made available under the Apache 2.0 License.
18
+
19
+ ## Model Details
20
+
21
+ ### Model Description
22
+
23
+ - **Developed by:** [LightOn](https://www.lighton.ai/)
24
+ - **Model type:** Causal decoder-only;
25
+ - **Language(s) (NLP):** English, German, Spanish, French (and limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, Swedish);
26
+ - **License:** Apache 2.0 license.
27
+ - **Finetuned from model:** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b)
28
+
29
+ ## Uses
30
+
31
+ ### Direct Use
32
+
33
+ `Alfred-40B-0723` can be used as an instruct or chat model. We encourage its usage for research on large language models finetuned with RLHF as well.
34
+
35
+ ### Out-of-Scope Use
36
+
37
+ Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful.
38
+
39
+ ## Bias, Risks, and Limitations
40
+
41
+ `Alfred-40B-0723` is a finetune of Falcon-40B. As such, it is trained mostly on English, German, Spanish, and French, with limited capabilities in Italian, Portuguese, Polish, Dutch, Romanian, Czech, and Swedish. It will not generalize appropriately to other languages. Furthermore, as it is trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.
42
+
43
+ ### Recommendations
44
+
45
+ We recommend that users of `Alfred-40B-0723` implement appropriate guardrails and precautions in any production use.
46
+
47
+ ## How to Get Started with the Model
48
+
49
+ Use the code below to get started with the model.
50
+
51
+ ```
52
+ from transformers import AutoTokenizer, AutoModelForCausalLM
53
+ import transformers
54
+ import torch
55
+
56
+ model = "lightonai/alfred-40b-0723"
57
+ tokenizer = AutoTokenizer.from_pretrained(model)
58
+
59
+ pipeline = transformers.pipeline(
60
+     "text-generation",
61
+     model=model,
62
+     tokenizer=tokenizer,
63
+     torch_dtype=torch.bfloat16,
64
+     trust_remote_code=True,
65
+     device_map="auto",
66
+ )
67
+
68
+
69
+ sequences = pipeline(
70
+     "Write a short text to announce that the new transformer model Alfred is available in open-source on Huggingface, include emojis.",
71
+     max_length=200,
72
+     do_sample=True,
73
+     top_k=10,
74
+     num_return_sequences=1,
75
+     eos_token_id=tokenizer.eos_token_id,
76
+ )
77
+ for seq in sequences:
78
+     print(f"Result: {seq['generated_text']}")
79
+ ```
80
+
81
+ ## Training Details
82
+
83
+ ### Training Data
84
+
85
+ Alfred-40B-0723 was trained on a mixture of publicly available and in-house curated datasets.
86
+
87
+ | **Data source** |
88
+ |--------------------|
89
+ | [oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) |
90
+ | [hh-rlhf](https://huggingface.co/datasets/Anthropic/hh-rlhf) |
91
+ | [dolly](https://huggingface.co/datasets/databricks/databricks-dolly-15k) |
92
+ | [NatInstV2](https://github.com/allenai/natural-instructions) |
93
+ | momentum-internal |
94
+
95
+ `momentum-internal` is a collection of prompts rated as gold quality by LightOn staff in their daily workflow.
96
+
97
+ ### Training Procedure
98
+
99
+ `Alfred-40B-0723` was trained on 128 A100 40GB GPUs, using a 3D parallelism strategy (TP=8, PP=4, DP=4) combined with ZeRO.
100
+
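As a quick sanity check on the figures above, the three parallelism degrees multiply out to the stated GPU count. The sketch below also shows one possible way to map a global rank to its (TP, PP, DP) coordinates. The ordering (tensor-parallel rank varying fastest) and the helper name `rank_to_coords` are assumptions for illustration; the card does not describe the actual rank layout.

```python
# Parallelism degrees as stated in the model card.
TP, PP, DP = 8, 4, 4

# Each GPU holds one (tensor, pipeline, data) shard, so the world size
# is the product of the three degrees.
world_size = TP * PP * DP


def rank_to_coords(rank, tp=TP, pp=PP):
    """Map a global rank to (tp_rank, pp_rank, dp_rank).

    Hypothetical layout: TP varies fastest, then PP, then DP.
    The real training codebase may order ranks differently.
    """
    tp_rank = rank % tp
    pp_rank = (rank // tp) % pp
    dp_rank = rank // (tp * pp)
    return tp_rank, pp_rank, dp_rank


if __name__ == "__main__":
    print("world size:", world_size)
    for rank in (0, 37, 127):
        print(rank, rank_to_coords(rank))
```

Keeping the tensor-parallel group contiguous is a common choice because those ranks communicate most often, but that is a general design heuristic rather than something stated in this card.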
101
+ #### Preprocessing
102
+
103
+ Samples from each of the datasets were programmatically formatted as chat, instruction, and few-shot prompts.
104
+
105
+ #### Training Hyperparameters
106
+
107
+ ##### Policy and Value Optimizer Config
108
+
109
+ | **Hyperparameter** | **Value** | **Comment** |
110
+ |--------------------|------------|-------------------------------------------|
111
+ | Precision | `bfloat16` | |
112
+ | Optimizer | AdamW | |
113
+ | Learning rate      | 1.85e-6    | 10 warm-up steps, cosine decay over 100 steps to 1.85e-7 |
114
+
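The learning-rate comment in the table above can be read as the schedule sketched below. Several details are assumptions, since the card does not specify them: the warm-up is taken to be linear, the cosine decay is taken to start right after warm-up, and the rate is held at the floor once the decay finishes. The function name `learning_rate` is illustrative only.

```python
import math

# Values from the optimizer table.
PEAK_LR = 1.85e-6
MIN_LR = 1.85e-7
WARMUP_STEPS = 10
DECAY_STEPS = 100


def learning_rate(step):
    """Learning rate at a zero-indexed optimizer step.

    Assumptions (not stated in the card): linear warm-up, cosine decay
    beginning immediately after warm-up, constant floor afterwards.
    """
    if step < WARMUP_STEPS:
        # Linear ramp that reaches the peak on the last warm-up step.
        return PEAK_LR * (step + 1) / WARMUP_STEPS
    # Fraction of the decay phase completed, capped at 1.0.
    progress = min((step - WARMUP_STEPS) / DECAY_STEPS, 1.0)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return MIN_LR + (PEAK_LR - MIN_LR) * cosine


if __name__ == "__main__":
    for step in (0, 9, 10, 60, 110, 199):
        print(step, f"{learning_rate(step):.3e}")
```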
115
+ ##### Trainer config
116
+ | **Hyperparameter** | **Value** |
117
+ |--------------------|------------|
118
+ | Num Rollouts | 1024 |
119
+ | PPO Epochs | 1 |
120
+ | Value Epochs | 1 |
121
+ | Constant KL Coef | `true` |
122
+ | Init KL Coef | 0.01 |
123
+ | Target KL | 6.0 |
124
+ | K Beta | 0.1 |
125
+ | Gamma | 1.0 |
126
+ | GAE Lambda | 0.95 |
127
+ | Clip Range | 0.2 |
128
+ | Clip Range Value | 0.2 |
129
+ | Whiten Advantages | `true` |
130
+ | Whiten Rewards | `false` |
131
+ | Loss on EPD | `true` |
132
+ | Max Steps | 200 |
133
+ | microbatch_size | 1 |
134
+ | PPO steps/epoch | 1 |
135
+ | Value steps/epoch | 8 |
136
+
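To make the roles of `Gamma`, `GAE Lambda`, and `Clip Range` concrete, here is a minimal pure-Python sketch of generalized advantage estimation and the clipped PPO surrogate in their textbook form. The training code for this model is a custom codebase that is not shown in the card, so this illustrates the standard formulation under the listed values and is not the authors' implementation. The function names are hypothetical.

```python
# Values from the trainer config table.
GAMMA = 1.0
GAE_LAMBDA = 0.95
CLIP_RANGE = 0.2


def gae_advantages(rewards, values, gamma=GAMMA, lam=GAE_LAMBDA):
    """Generalized advantage estimation over one trajectory.

    `values` must have len(rewards) + 1 entries; the last entry is the
    bootstrap value for the state after the final step.
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference error.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # Exponentially weighted sum of future TD errors.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages


def clipped_surrogate(ratio, advantage, clip=CLIP_RANGE):
    """PPO clipped objective for a single sample (to be maximized)."""
    clipped_ratio = max(1.0 - clip, min(ratio, 1.0 + clip))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Toy trajectory with a single terminal reward and zero value estimates.
    print(gae_advantages([0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0]))
    print(clipped_surrogate(1.5, 1.0), clipped_surrogate(0.5, -1.0))
```

With `Gamma` at 1.0, rewards are not discounted over the length of a continuation, so `GAE Lambda` alone controls how quickly credit decays across tokens.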
137
+ ##### Trajectory data config
138
+ | **Hyperparameter** | **Value** |
139
+ |----------------------|------------|
140
+ | Continuation Max Len | 1024 |
141
+ | Continuation Min Len | 0 |
142
+ | Top P | 1.0 |
143
+ | Temperature | 1.0 |
144
+ | # Cached Batches | 128 |
145
+ | Microbatch size | 1 |
146
+
147
+
148
+ ## Evaluation
149
+
150
+ ![aggregated evaluation of RAW vs SFT vs PPO - including random baseline - PPO suffers in arithmetic due to effects on calibration](https://i.ibb.co/9yQFJ40/aggregated.png "aggregated evaluation of RAW vs SFT vs PPO - including random baseline")
151
+
152
+ First evaluation results aggregated from the EleutherAI harness:
153
+ - Arithmetic capabilities become much worse
154
+ - Common Sense, Paraphrase, Reasoning, Reading Comprehension stay at about the same level
155
+ - NLI becomes better and QA gets worse
156
+
157
+ Overall, these results were expected from the literature: benchmarks don't really correlate with human preference.
158
+ All these metrics use a Select methodology, and since RLHF models are far less calibrated than raw LLMs, they are penalized in these evaluations.
159
+
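One reading of the "Select methodology" mentioned above is that each candidate answer is scored by the log-likelihood the model assigns to it, and the highest-scoring candidate is taken as the prediction. The sketch below illustrates that idea only. The log-likelihood values are made up for the example and are not model outputs, and the helper names are hypothetical.

```python
def select_answer(option_logprobs):
    """Return the index of the option with the highest log-likelihood."""
    return max(range(len(option_logprobs)), key=lambda i: option_logprobs[i])


def accuracy(examples):
    """Fraction of examples where the selected option matches the gold index."""
    correct = sum(select_answer(logprobs) == gold for logprobs, gold in examples)
    return correct / len(examples)


# Made-up per-option log-likelihoods paired with a gold answer index.
examples = [
    ([-2.1, -0.4, -3.0], 1),  # clear margin, selected correctly
    ([-1.2, -1.1, -0.9], 0),  # near-tie: small shifts flip the selection
    ([-0.3, -2.5, -2.7], 0),  # clear margin, selected correctly
]

if __name__ == "__main__":
    print(f"accuracy: {accuracy(examples):.3f}")
```

Because the prediction depends only on the relative likelihoods of the options, a model whose probabilities have shifted after RLHF can lose accuracy on near-tie items even when its free-form answers would be acceptable.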
160
+ Human evaluation is currently ongoing.
161
+
162
+ ### Compute Infrastructure
163
+
164
+ #### Hardware
165
+
166
+ Alfred-40B-0723 was trained on AWS SageMaker, on 128 A100 40GB GPUs in P4d instances.
167
+
168
+ #### Software
169
+
170
+ Alfred-40B-0723 was trained with a custom RLHF codebase. Training leverages a 3D parallelism approach combined with ZeRO, as well as high-performance kernels such as FlashAttention.
171
+
172
+ ## Model Card Contact
173
+
174
+ contact@lighton.ai
175
+