---
license: apache-2.0
library_name: peft
datasets:
- OpenAssistant/oasst1
pipeline_tag: text-generation
base_model: tiiuae/falcon-40b
inference: false
model-index:
- name: falcon-40b-openassistant-peft
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 62.63
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 85.59
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 57.77
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 51.02
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 81.45
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 13.34
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=dfurman/falcon-40b-openassistant-peft
      name: Open LLM Leaderboard
---


<div align="center">

<img src="./falcon.webp" width="150px">

</div>

# Falcon-40B-Chat-v0.1

Falcon-40B-Chat-v0.1 is a chatbot model for dialogue generation. It was built by fine-tuning [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) on the [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) dataset. This repo only includes the LoRA adapters from fine-tuning with 🤗's [peft](https://github.com/huggingface/peft) package. 

## Model Summary

- **Model Type:** Causal language model (clm)
- **Language(s):** English
- **Base Model:** [Falcon-40B](https://huggingface.co/tiiuae/falcon-40b) (License: [Apache 2.0](https://huggingface.co/tiiuae/falcon-40b#license))
- **Dataset:** [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) (License: [Apache 2.0](https://huggingface.co/datasets/OpenAssistant/oasst1/blob/main/LICENSE))
- **License:** Apache 2.0 inherited from "Base Model" and "Dataset"

The model was fine-tuned in 4-bit precision using `peft` adapters, `transformers`, and `bitsandbytes`. Training used Low-Rank Adaptation ([LoRA](https://arxiv.org/pdf/2106.09685.pdf)), specifically the [QLoRA](https://arxiv.org/abs/2305.14314) variant. The run took approximately 10 hours on a workstation with a single NVIDIA A100-SXM GPU with 37 GB of available memory. See the attached [Colab Notebook](https://huggingface.co/dfurman/Falcon-40B-Chat-v0.1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code and hyperparameters used to train the model.
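To make the low-rank idea concrete, here is a minimal arithmetic sketch of how many parameters a LoRA adapter trains for a single weight matrix. The 8192 x 8192 matrix shape and the rank of 16 are illustrative assumptions, not values taken from this card; the attached notebook holds the actual hyperparameters.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in one LoRA adapter: B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def full_params(d_out: int, d_in: int) -> int:
    """Parameters in the frozen dense weight matrix the adapter is attached to."""
    return d_out * d_in


# Hypothetical shapes for illustration only (assumed, not read from this repo).
d_out, d_in, rank = 8192, 8192, 16

adapter = lora_trainable_params(d_out, d_in, rank)
dense = full_params(d_out, d_in)
print(f"adapter params: {adapter:,}")
print(f"dense params:   {dense:,}")
print(f"trainable fraction: {adapter / dense:.4%}")
```

Because only the two small factor matrices receive gradients, the base weights can stay frozen in 4-bit form, which is what lets a run like this fit on a single GPU.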

## Model Date

May 30, 2023

## Open LLM Leaderboard Evaluation Results
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__falcon-40b-openassistant-peft).

| Metric              | Value |
|---------------------|-------|
| Avg.                | 51.17 |
| ARC (25-shot)       | 62.63 |
| HellaSwag (10-shot) | 85.59 |
| MMLU (5-shot)       | 57.77 |
| TruthfulQA (0-shot) | 51.02 |
| Winogrande (5-shot) | 81.45 |
| GSM8K (5-shot)      | 13.34 |
| DROP (3-shot)       | 6.36  |

## Quick Start

To prompt the chat model, use the following format:

```
<human>: [Instruction]
<bot>:
```
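The template above can be produced with a small helper. This is a minimal sketch for a single-turn prompt; `build_prompt` is a hypothetical name introduced here for illustration and is not part of this repo.

```python
def build_prompt(instruction: str) -> str:
    """Wrap a single user instruction in the <human>/<bot> template."""
    # The prompt ends right after "<bot>:" so the model continues with its reply.
    return f"<human>: {instruction.strip()}\n<bot>:"


prompt = build_prompt("Create a list of things to do in San Francisco.")
print(prompt)
```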

### Example Dialogue 1

**Prompter**:
```
<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:
```

**Falcon-40B-Chat-v0.1**:
```
Dear Friends,

I'm excited to invite you to a dinner party at my home on Friday, [date]. I will be making the food, but please bring your own beverages.

I'm looking forward to seeing you and catching up!

Sincerely,
Daniel
```

### Example Dialogue 2

**Prompter**:
```
<human>: Create a list of things to do in San Francisco.
<bot>:
```

**Falcon-40B-Chat-v0.1**:
```
Here is a list of things to do in San Francisco:

1. Visit the Golden Gate Bridge
2. Take a cable car ride
3. Visit Alcatraz Island
4. Explore the Fisherman's Wharf
5. Visit the Exploratorium
6. Go to the San Francisco Zoo
7. Visit the California Academy of Sciences
8. Go to the de Young Museum
9. Visit the San Francisco Museum of Modern Art
10. Go to the Asian Art Museum
11. Visit the Conservatory of Flowers
12. Go to the California Historical Society
13. Visit the Walt Disney Family Museum
14. Go to the San Francisco Botanical Garden
15. Visit the San Francisco Museum of Craft and Design
16. Go to the Cartoon Art Museum
17. Visit the Contemporary Jewish Museum
18. Go to the Museum of the African Diaspora
19. Visit the Museum of the City of San Francisco
```

### Direct Use

This model has been fine-tuned on conversation trees from [OpenAssistant/oasst1](https://huggingface.co/datasets/OpenAssistant/oasst1) and should only be used on data of a similar nature.

### Out-of-Scope Use

Production use without adequate assessment of risks and mitigation; any use cases which may be considered irresponsible or harmful. 

## Bias, Risks, and Limitations

This model was trained mostly on English data and will not generalize appropriately to other languages. Furthermore, because it was trained on large-scale corpora representative of the web, it will carry the stereotypes and biases commonly encountered online.

### Recommendations

We recommend that users of this model develop guardrails and take appropriate precautions for any production use.

## How to Get Started with the Model

### Setup
```python
# Install packages
!pip install -q -U bitsandbytes loralib einops
!pip install -q -U git+https://github.com/huggingface/transformers.git 
!pip install -q -U git+https://github.com/huggingface/peft.git
!pip install -q -U git+https://github.com/huggingface/accelerate.git
```

### GPU Inference in 4-bit

4-bit inference requires a GPU with at least 27 GB of memory.
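As a rough back-of-the-envelope check on the 27 GB figure, the sketch below estimates the memory taken by the quantized weights alone. It assumes a nominal 40 billion parameters at exactly 4 bits each and ignores quantization constants, activations, the KV cache, and the adapters, so the rest of the requirement is not modeled here.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal parameter count implied by the model name (an assumption).
nominal_params = 40e9

print(f"4-bit weights:  {weight_memory_gb(nominal_params, 4):.1f} GB")
print(f"16-bit weights: {weight_memory_gb(nominal_params, 16):.1f} GB")
```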

### First, Load the Model

```python
import torch
from peft import PeftModel, PeftConfig
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# Adapter repo; its config records which base model the adapters belong to
peft_model_id = "dfurman/Falcon-40B-Chat-v0.1"
config = PeftConfig.from_pretrained(peft_model_id)

# 4-bit NF4 quantization with double quantization; compute runs in bfloat16
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# Load the quantized base model onto GPU 0
model = AutoModelForCausalLM.from_pretrained(
    config.base_model_name_or_path,
    return_dict=True,
    quantization_config=bnb_config,
    device_map={"":0},
    trust_remote_code=True,
)

# Reuse the end-of-sequence token for padding
tokenizer = AutoTokenizer.from_pretrained(config.base_model_name_or_path)
tokenizer.pad_token = tokenizer.eos_token

# Attach the LoRA adapters to the base model
model = PeftModel.from_pretrained(model, peft_model_id)
```

### Next, Run the Model

```python
prompt = """<human>: My name is Daniel. Write a short email to my closest friends inviting them to come to my home on Friday for a dinner party, I will make the food but tell them to BYOB.
<bot>:"""

batch = tokenizer(
    prompt,
    padding=True,
    truncation=True,
    return_tensors='pt'
)
batch = batch.to('cuda:0')

with torch.cuda.amp.autocast():
    output_tokens = model.generate(
        inputs=batch.input_ids,
        max_new_tokens=200,
        # Greedy decoding: temperature, top_k, and top_p only apply when do_sample=True
        do_sample=False,
        use_cache=True,
        temperature=1.0,
        top_k=50,
        top_p=1.0,
        num_return_sequences=1,
        pad_token_id=tokenizer.eos_token_id,
        eos_token_id=tokenizer.eos_token_id,
        bos_token_id=tokenizer.eos_token_id,
    )

generated_text = tokenizer.decode(output_tokens[0], skip_special_tokens=True)
# Inspect message response in the outputs
print(generated_text.split("<human>: ")[1].split("<bot>: ")[-1])
```
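The reply-extraction step at the end of that snippet can be wrapped in a reusable function. The sketch below is a slight variation under stated assumptions: it splits on `<bot>:` without the trailing space, strips surrounding whitespace, and returns an empty string instead of raising when no `<human>: ` marker is present. `extract_reply` and the sample string are hypothetical and exist only for illustration.

```python
def extract_reply(generated_text: str) -> str:
    """Return the bot's reply to the first <human> turn in the decoded text."""
    # Keep what follows the first "<human>: " marker (empty if the marker is absent).
    _, _, after_human = generated_text.partition("<human>: ")
    # Drop any later human turns the model may have continued with.
    first_turn = after_human.split("<human>: ")[0]
    # The reply is whatever follows the last "<bot>:" marker in that turn.
    return first_turn.split("<bot>:")[-1].strip()


# Made-up decoded output, not a real generation from the model.
decoded = "<human>: Say hi.\n<bot>: Hello there!\n<human>: Thanks"
print(extract_reply(decoded))
```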

## Reproducibility

See the attached [Colab Notebook](https://huggingface.co/dfurman/Falcon-40B-Chat-v0.1/blob/main/finetune_falcon40b_oasst1_with_bnb_peft.ipynb) for the code and hyperparameters used to train the model.

### CUDA Info

- CUDA Version: 12.0
- Hardware: 1 A100-SXM
- Max Memory: {0: "37GB"}
- Device Map: {"": 0}

### Package Versions Employed

- `torch`: 2.0.1+cu118
- `transformers`: 4.30.0.dev0
- `peft`: 0.4.0.dev0
- `accelerate`: 0.19.0
- `bitsandbytes`: 0.39.0
- `einops`: 0.6.1
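To compare a local environment against the versions listed above, a short standard-library check can help. This is a sketch only: `installed_versions` is a hypothetical helper, and build suffixes such as `+cu118` or `.dev0` may legitimately differ between installs.

```python
from importlib import metadata

# Versions as listed in this card.
EXPECTED = {
    "torch": "2.0.1+cu118",
    "transformers": "4.30.0.dev0",
    "peft": "0.4.0.dev0",
    "accelerate": "0.19.0",
    "bitsandbytes": "0.39.0",
    "einops": "0.6.1",
}


def installed_versions(packages):
    """Map each package name to its installed version, or None if missing."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


for name, version in installed_versions(EXPECTED).items():
    print(f"{name}: installed={version}, card={EXPECTED[name]}")
```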


## [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_dfurman__falcon-40b-openassistant-peft).

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |58.63|
|AI2 Reasoning Challenge (25-Shot)|62.63|
|HellaSwag (10-Shot)              |85.59|
|MMLU (5-Shot)                    |57.77|
|TruthfulQA (0-shot)              |51.02|
|Winogrande (5-shot)              |81.45|
|GSM8k (5-shot)                   |13.34|
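The "Avg." of 58.63 here differs from the 51.17 reported in the earlier results table. A quick check using only the scores listed in this card suggests the difference comes from whether the DROP score is included in the mean:

```python
from statistics import mean

# Scores shared by both results tables in this card.
scores = {
    "ARC (25-shot)": 62.63,
    "HellaSwag (10-shot)": 85.59,
    "MMLU (5-shot)": 57.77,
    "TruthfulQA (0-shot)": 51.02,
    "Winogrande (5-shot)": 81.45,
    "GSM8K (5-shot)": 13.34,
}
drop_score = 6.36  # DROP (3-shot), listed only in the earlier table

avg_six = round(mean(scores.values()), 2)
avg_seven = round(mean([*scores.values(), drop_score]), 2)
print(f"mean of six benchmarks:    {avg_six}")
print(f"mean including DROP score: {avg_seven}")
```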