---
license: apache-2.0
base_model: google/flan-t5-large
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: flan-t5-large-spelling-peft
  results: []
---


# flan-t5-large-spelling-peft

This model is an *experimental* PEFT adapter for [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
trained on the `wiki.en` dataset from [oliverguhr/spelling](https://github.com/oliverguhr/spelling).

It achieves the following results on the evaluation set:
- Loss: 0.2537
- Rouge1: 95.8905
- Rouge2: 91.9178
- Rougel: 95.8459
- Rougelsum: 95.8393
- Gen Len: 33.61
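
For intuition about the ROUGE numbers above, the following is a simplified sketch of a ROUGE-1 F1 score computed from unigram overlap. It is not the implementation used by the Trainer; `rouge1_f1` is a hypothetical helper that tokenizes on whitespace only, and it returns a value in the range 0 to 1, whereas the scores above appear to be reported on a 0 to 100 scale.

```python
from collections import Counter

def rouge1_f1(prediction, reference):
    """Simplified ROUGE-1 F1: clipped unigram overlap, whitespace tokens."""
    pred_counts = Counter(prediction.lower().split())
    ref_counts = Counter(reference.lower().split())
    # The & operator keeps the minimum count of each shared token.
    overlap = sum((pred_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(pred_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)

# An uncorrected typo in a four-word sentence still shares 3 of 4 unigrams.
print(rouge1_f1("This restuarant is awesome", "This restaurant is awesome"))
```

Because most words in a sentence with a single typo already match the reference, unigram-overlap scores stay high even when the typo is left uncorrected, which is worth keeping in mind when reading the table of results.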

## Model description

This is an experimental model that should be capable of fixing typos and punctuation.

```python
# Requires both `transformers` and `peft` to be installed.
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_id = "google/flan-t5-large"
peft_model_id = "jbochi/flan-t5-large-spelling-peft"

# Load the base model, then attach the spelling adapter on top of it.
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inputs must start with the same "Fix spelling: " prefix used in training.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
pipe("Fix spelling: This restuarant is awesome")
# [{'generated_text': 'This restaurant is awesome'}]
```

## Intended uses & limitations

Intended for research purposes.

- It may produce artifacts.
- It doesn't seem capable of fixing multiple errors in a single sentence.
- It doesn't support languages other than English.
- It was fine-tuned with a `max_length` of 100 tokens.
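
One way to work within the limitations above is to split longer text into sentences and check each one against the 100-token budget before sending it to the model. The sketch below is only illustrative: `split_sentences`, `over_budget`, and the whitespace-based default token counter are hypothetical helpers, not part of this adapter or of `transformers`. In practice the model's tokenizer would supply the token count.

```python
import re

MAX_TOKENS = 100  # matches the `max_length` used for fine-tuning

def split_sentences(text):
    """Naively split text after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def over_budget(sentence, count_tokens=lambda s: len(s.split()),
                max_tokens=MAX_TOKENS):
    """Return True if the sentence exceeds the token budget.

    The default counter splits on whitespace, an assumption made to keep
    the sketch self-contained; a real tokenizer usually yields more tokens.
    """
    return count_tokens(sentence) > max_tokens

text = "This restuarant is awesome. The servise was great!"
for sentence in split_sentences(text):
    status = "too long" if over_budget(sentence) else "ok"
    print(f"{status}: Fix spelling: {sentence}")
```

Sending one sentence at a time also keeps each input closer to the single-sentence examples the adapter was trained on.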

## Training and evaluation data

Data from [oliverguhr/spelling](https://github.com/oliverguhr/spelling), with a "Fix spelling: " prefix added to every example.
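
As a minimal sketch of the prefixing step described above, assuming the source sentences are available as a plain list of strings (the actual preprocessing script is not shown in this card, and `add_prefix` is a hypothetical helper):

```python
PREFIX = "Fix spelling: "

def add_prefix(examples):
    """Prepend the task prefix to every source sentence."""
    return [PREFIX + text for text in examples]

inputs = add_prefix(["This restuarant is awesome"])
print(inputs[0])
```

The same prefix has to be present at inference time, since the adapter only saw prefixed inputs during fine-tuning.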

The model was evaluated only on the first 100 test examples during training.

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
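
To illustrate what the `linear` scheduler does with the learning rate listed above, here is a small sketch of a linear decay from the base rate to zero. It assumes no warmup steps, which the card does not state, and the total of 9,375 steps is only a rough estimate inferred from the training results table (step 9000 at epoch 0.96), not a reported value. `linear_lr` is a hypothetical helper written for this example.

```python
def linear_lr(step, total_steps, base_lr=1e-3, warmup_steps=0):
    """Learning rate at `step` for a linear schedule with optional warmup."""
    if step < warmup_steps:
        # Ramp up linearly from 0 to base_lr during warmup.
        return base_lr * step / max(1, warmup_steps)
    # Then decay linearly from base_lr to 0 at total_steps.
    remaining = max(0, total_steps - step)
    return base_lr * remaining / max(1, total_steps - warmup_steps)

TOTAL_STEPS = 9375  # assumption: estimated, not reported in this card

for step in (0, 3000, 6000, 9000):
    print(step, linear_lr(step, TOTAL_STEPS))
```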

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.3359        | 0.05  | 500  | 0.2738          | 95.8385 | 91.6723 | 95.7821 | 95.766    | 33.5    |
| 0.2853        | 0.11  | 1000 | 0.2702          | 95.7124 | 91.5043 | 95.656  | 95.651    | 33.53   |
| 0.2691        | 0.16  | 1500 | 0.2691          | 95.735  | 91.7108 | 95.7039 | 95.7067   | 33.41   |
| 0.2596        | 0.21  | 2000 | 0.2663          | 95.9819 | 92.0897 | 95.9519 | 95.9488   | 33.51   |
| 0.2536        | 0.27  | 2500 | 0.2621          | 95.7519 | 91.5445 | 95.6614 | 95.6622   | 33.49   |
| 0.2472        | 0.32  | 3000 | 0.2626          | 95.7052 | 91.7321 | 95.6476 | 95.6512   | 33.58   |
| 0.2448        | 0.37  | 3500 | 0.2669          | 95.8003 | 91.7949 | 95.7536 | 95.7576   | 33.57   |
| 0.2345        | 0.43  | 4000 | 0.2582          | 95.8784 | 92.008  | 95.8284 | 95.8343   | 33.65   |
| 0.2345        | 0.48  | 4500 | 0.2629          | 95.8131 | 91.9088 | 95.7624 | 95.766    | 33.63   |
| 0.2284        | 0.53  | 5000 | 0.2585          | 95.8552 | 91.9833 | 95.8105 | 95.8135   | 33.62   |
| 0.2266        | 0.59  | 5500 | 0.2591          | 95.9205 | 92.0577 | 95.8689 | 95.8718   | 33.61   |
| 0.2281        | 0.64  | 6000 | 0.2605          | 95.9172 | 91.9782 | 95.874  | 95.8638   | 33.59   |
| 0.2228        | 0.69  | 6500 | 0.2566          | 95.7612 | 91.7858 | 95.7129 | 95.7058   | 33.63   |
| 0.2202        | 0.75  | 7000 | 0.2561          | 95.9468 | 92.0914 | 95.9018 | 95.8941   | 33.64   |
| 0.218         | 0.8   | 7500 | 0.2579          | 95.9468 | 92.0914 | 95.9018 | 95.8941   | 33.64   |
| 0.2162        | 0.85  | 8000 | 0.2523          | 95.8231 | 91.9464 | 95.7727 | 95.7758   | 33.66   |
| 0.2135        | 0.91  | 8500 | 0.2549          | 95.8388 | 91.9804 | 95.7914 | 95.7917   | 33.63   |
| 0.2124        | 0.96  | 9000 | 0.2537          | 95.8905 | 91.9178 | 95.8459 | 95.8393   | 33.61   |


### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0