---
license: apache-2.0
base_model: google/flan-t5-large
tags:
- generated_from_trainer
metrics:
- rouge
model-index:
- name: flan-t5-large-spelling-peft
  results: []
---
|
|
|
|
|
|
# flan-t5-large-spelling-peft

This model is an *experimental* PEFT adapter for [google/flan-t5-large](https://huggingface.co/google/flan-t5-large)
trained on the `wiki.en` dataset from [oliverguhr/spelling](https://github.com/oliverguhr/spelling).
|
|
|
It achieves the following results on the evaluation set:
- Loss: 0.2537
- Rouge1: 95.8905
- Rouge2: 91.9178
- Rougel: 95.8459
- Rougelsum: 95.8393
- Gen Len: 33.61
|
|
|
## Model description

This is an experimental model that should be capable of fixing typos and punctuation.
|
|
|
```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_id = "google/flan-t5-large"
peft_model_id = "jbochi/flan-t5-large-spelling-peft"

# Load the base model, then attach the adapter (requires the `peft` package).
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
model.load_adapter(peft_model_id)

tokenizer = AutoTokenizer.from_pretrained(model_id)

# Inputs need the same "Fix spelling: " prefix that was used during training.
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer)
pipe("Fix spelling: This restuarant is awesome")
# [{'generated_text': 'This restaurant is awesome'}]
```
|
|
|
## Intended uses & limitations

Intended for research purposes.

- It may produce artifacts.
- It doesn't seem capable of fixing multiple errors in a single sentence.
- It doesn't support languages other than English.
- It was fine-tuned with a `max_length` of 100 tokens.
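
Given the 100-token fine-tuning limit noted above, one way to handle longer text is to split it into sentences and send each one as its own prompt. The following is a minimal sketch, not part of the original training or inference code: `build_prompts` and its regex-based splitter are hypothetical helpers, and the word budget is only a rough stand-in for the tokenizer's actual token count.

```python
import re

PREFIX = "Fix spelling: "  # same prefix the training examples used


def build_prompts(text, max_words=60):
    """Split text into sentences and turn each into a prefixed prompt.

    max_words is an assumed, conservative proxy for the 100-token limit;
    check real lengths with the tokenizer before relying on it.
    """
    # Naive split on whitespace that follows ., ! or ? (assumes plain English prose).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    prompts = []
    for sentence in sentences:
        words = sentence.split()
        # Break overly long sentences into word windows so every prompt stays short.
        for start in range(0, len(words), max_words):
            prompts.append(PREFIX + " ".join(words[start:start + max_words]))
    return prompts


prompts = build_prompts("This restuarant is awesome. The fod was great!")
print(prompts)
```

Each prompt can then be passed to `pipe` from the usage example, and the generated sentences joined back together.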
|
|
|
## Training and evaluation data

Data from [oliverguhr/spelling](https://github.com/oliverguhr/spelling), with a "Fix spelling: " prefix added to every example.

During training, the model was evaluated only on the first 100 test examples.
|
|
|
## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 1
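
For intuition about `lr_scheduler_type: linear`, the sketch below computes a linearly decaying learning rate. It assumes no warmup (none is listed above) and a total of roughly 9,375 optimizer steps, a figure estimated from the step and epoch columns of the training results rather than taken from the training logs. `linear_lr` is a hypothetical helper, not the Trainer's API.

```python
BASE_LR = 0.001      # learning_rate from the list above
TOTAL_STEPS = 9375   # assumption: estimated as 9000 steps / 0.96 epochs


def linear_lr(step, base_lr=BASE_LR, total_steps=TOTAL_STEPS):
    """Learning rate after `step` updates under linear decay to zero, with no warmup."""
    remaining = max(0, total_steps - step)
    return base_lr * remaining / total_steps


for step in (0, 4500, 9000):
    print(step, linear_lr(step))
```

Under these assumptions the rate starts at the configured value and shrinks steadily toward zero by the end of the single epoch.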
|
|
|
### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1  | Rouge2  | Rougel  | Rougelsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:-------:|:-------:|:-------:|:---------:|:-------:|
| 0.3359        | 0.05  | 500  | 0.2738          | 95.8385 | 91.6723 | 95.7821 | 95.766    | 33.5    |
| 0.2853        | 0.11  | 1000 | 0.2702          | 95.7124 | 91.5043 | 95.656  | 95.651    | 33.53   |
| 0.2691        | 0.16  | 1500 | 0.2691          | 95.735  | 91.7108 | 95.7039 | 95.7067   | 33.41   |
| 0.2596        | 0.21  | 2000 | 0.2663          | 95.9819 | 92.0897 | 95.9519 | 95.9488   | 33.51   |
| 0.2536        | 0.27  | 2500 | 0.2621          | 95.7519 | 91.5445 | 95.6614 | 95.6622   | 33.49   |
| 0.2472        | 0.32  | 3000 | 0.2626          | 95.7052 | 91.7321 | 95.6476 | 95.6512   | 33.58   |
| 0.2448        | 0.37  | 3500 | 0.2669          | 95.8003 | 91.7949 | 95.7536 | 95.7576   | 33.57   |
| 0.2345        | 0.43  | 4000 | 0.2582          | 95.8784 | 92.008  | 95.8284 | 95.8343   | 33.65   |
| 0.2345        | 0.48  | 4500 | 0.2629          | 95.8131 | 91.9088 | 95.7624 | 95.766    | 33.63   |
| 0.2284        | 0.53  | 5000 | 0.2585          | 95.8552 | 91.9833 | 95.8105 | 95.8135   | 33.62   |
| 0.2266        | 0.59  | 5500 | 0.2591          | 95.9205 | 92.0577 | 95.8689 | 95.8718   | 33.61   |
| 0.2281        | 0.64  | 6000 | 0.2605          | 95.9172 | 91.9782 | 95.874  | 95.8638   | 33.59   |
| 0.2228        | 0.69  | 6500 | 0.2566          | 95.7612 | 91.7858 | 95.7129 | 95.7058   | 33.63   |
| 0.2202        | 0.75  | 7000 | 0.2561          | 95.9468 | 92.0914 | 95.9018 | 95.8941   | 33.64   |
| 0.218         | 0.8   | 7500 | 0.2579          | 95.9468 | 92.0914 | 95.9018 | 95.8941   | 33.64   |
| 0.2162        | 0.85  | 8000 | 0.2523          | 95.8231 | 91.9464 | 95.7727 | 95.7758   | 33.66   |
| 0.2135        | 0.91  | 8500 | 0.2549          | 95.8388 | 91.9804 | 95.7914 | 95.7917   | 33.63   |
| 0.2124        | 0.96  | 9000 | 0.2537          | 95.8905 | 91.9178 | 95.8459 | 95.8393   | 33.61   |
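
To compare checkpoints, the rows above can be scanned for the lowest validation loss and the highest Rouge1. The sketch below is illustrative only: the (step, validation loss, Rouge1) values are copied by hand from the table, and `rows`, `best_loss`, and `best_rouge1` are hypothetical names.

```python
# (step, validation loss, Rouge1) copied by hand from the training results table.
rows = [
    (500, 0.2738, 95.8385),
    (1000, 0.2702, 95.7124),
    (1500, 0.2691, 95.735),
    (2000, 0.2663, 95.9819),
    (2500, 0.2621, 95.7519),
    (3000, 0.2626, 95.7052),
    (3500, 0.2669, 95.8003),
    (4000, 0.2582, 95.8784),
    (4500, 0.2629, 95.8131),
    (5000, 0.2585, 95.8552),
    (5500, 0.2591, 95.9205),
    (6000, 0.2605, 95.9172),
    (6500, 0.2566, 95.7612),
    (7000, 0.2561, 95.9468),
    (7500, 0.2579, 95.9468),
    (8000, 0.2523, 95.8231),
    (8500, 0.2549, 95.8388),
    (9000, 0.2537, 95.8905),
]

# Pick the checkpoint with the smallest loss and the one with the largest Rouge1.
best_loss = min(rows, key=lambda r: r[1])
best_rouge1 = max(rows, key=lambda r: r[2])
print("lowest validation loss:", best_loss)
print("highest Rouge1:", best_rouge1)
```

On these numbers the lowest validation loss (step 8000) and the highest Rouge1 (step 2000) fall on different checkpoints. Since evaluation used only the first 100 test examples, such small gaps may not be meaningful.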
|
|
|
|
|
### Framework versions

- Transformers 4.35.2
- Pytorch 2.1.0+cu121
- Datasets 2.16.0
- Tokenizers 0.15.0
|
|