license: apache-2.0
base_model: google/mt5-small
tags:
- generated_from_trainer
metrics:
- rouge
- bleu
- meteor
datasets:
- natural_questions
model-index:
- name: mt5-small
results:
- task:
type: Question answering from context
name: Question answering
dataset:
type: natural-questions
name: Adapted Natural Questions
metrics:
- type: bleu
value: 34.1596
name: BLEU
verified: true
- type: rouge
value: 44.4366
name: ROUGE1
verified: true
- type: rouge
value: 38.8202
name: ROUGE2
verified: true
- type: rouge
value: 43.113
name: ROUGEl
verified: true
- type: rouge
value: 43.1423
name: ROUGElsum
verified: true
- type: meteor
value: 0.4049
name: METEOR
verified: true
mt5-small_test_45
This model is a fine-tuned version of google/mt5-small on an enhanced version of the Natural Questions dataset. It achieves the following results on the evaluation set:
- Loss: 0.7291
- Rouge1: 44.4366
- Rouge2: 38.8202
- Rougel: 43.113
- Rougelsum: 43.1423
- Bleu: 34.1596
- Gen Len: 12.6724
- Meteor: 0.4049
- True negatives: 69.7281
- False negatives: 10.4037
- Cosine Sim: 0.763
Model description
This model is fine-tuned for long-form, closed-domain question answering - question-answering from context. It uses a heavily refined version of Google's Natural Questions dataset.
Answers to the questions were rewritten using OpenAI's GPT-3.5 Turbo model.
Please see the following repo for all code and adaptations.
Intended uses & limitations
The model requires questions to be submitted using the following format using the input message: [CONTEXT] <\s> [QUESTION]
It is trained to respond appropriately when a question cannot be answered using the provided context.
It can give false negatives and false positives on occasion (see Training Results), and all answers must be checked appropriately.
Training and evaluation data
More information needed
Training procedure
Training hyperparameters
The following hyperparameters were used during training:
- learning_rate: 0.001
- train_batch_size: 16
- eval_batch_size: 16
- seed: 9
- gradient_accumulation_steps: 8
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
- weight_decay = 0.007
Training results
Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | Rougel | Rougelsum | Bleu | Gen Len | Meteor | True negatives | False negatives | Cosine Sim |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2.5724 | 1.0 | 175 | 0.9876 | 18.7781 | 15.6002 | 18.22 | 18.2686 | 7.6676 | 7.7661 | 0.1628 | 72.8701 | 56.677 | 0.4003 |
1.1469 | 1.99 | 350 | 0.8580 | 36.8209 | 31.2514 | 35.5008 | 35.5462 | 25.7137 | 12.0014 | 0.3311 | 62.8399 | 20.3934 | 0.6645 |
0.9468 | 2.99 | 525 | 0.7997 | 40.4128 | 34.716 | 39.0867 | 39.0972 | 29.3028 | 12.4287 | 0.3656 | 63.4441 | 15.295 | 0.7114 |
0.8129 | 3.98 | 700 | 0.7733 | 42.6764 | 36.7266 | 41.2465 | 41.2833 | 32.0644 | 12.9002 | 0.3871 | 62.1752 | 11.413 | 0.7425 |
0.7228 | 4.98 | 875 | 0.7483 | 42.9082 | 36.957 | 41.482 | 41.5233 | 32.4942 | 12.8866 | 0.3906 | 63.3233 | 11.5166 | 0.747 |
0.6493 | 5.97 | 1050 | 0.7293 | 40.3205 | 34.9632 | 39.1111 | 39.1168 | 28.8249 | 11.6867 | 0.3674 | 73.8973 | 17.9865 | 0.7068 |
0.5883 | 6.97 | 1225 | 0.7172 | 42.7342 | 37.0855 | 41.4069 | 41.424 | 32.1296 | 12.48 | 0.3887 | 70.0302 | 12.7847 | 0.7392 |
0.5409 | 7.96 | 1400 | 0.7387 | 44.6657 | 38.8426 | 43.3276 | 43.3496 | 34.4773 | 12.9395 | 0.4084 | 66.3444 | 9.5238 | 0.7658 |
0.5035 | 8.96 | 1575 | 0.7330 | 43.4925 | 38.0013 | 42.2697 | 42.2372 | 32.6131 | 12.2789 | 0.3979 | 72.6284 | 12.8364 | 0.7451 |
0.4652 | 9.95 | 1750 | 0.7291 | 44.4366 | 38.8202 | 43.113 | 43.1423 | 34.1596 | 12.6724 | 0.4049 | 69.7281 | 10.4037 | 0.763 |
Framework versions
- Transformers 4.31.0
- Pytorch 2.0.1+cu118
- Datasets 2.13.1
- Tokenizers 0.13.3