---
license: apache-2.0
tags:
  - generated_from_trainer
datasets:
  - winograd_wsc
metrics:
  - rouge
widget:
  - text: Sam has a Parker pen. He loves writing with it.
    example_title: Example 1
  - text: >-
      Coronavirus quickly spread worldwide in 2020. The virus mostly affects
      elderly people. They can easily catch it.
    example_title: Example 2
  - text: >-
      First, the manager evaluates the candidates. Afterwards, he notifies the
      candidates regarding the evaluation.
    example_title: Example 3
base_model: google/flan-t5-large
model-index:
  - name: flan-t5-large-coref
    results:
      - task:
          type: text2text-generation
          name: Sequence-to-sequence Language Modeling
        dataset:
          name: winograd_wsc
          type: winograd_wsc
          config: wsc285
          split: test
          args: wsc285
        metrics:
          - type: rouge
            value: 0.9495
            name: Rouge1
---

# flan-t5-large-coref

This model is a fine-tuned version of [google/flan-t5-large](https://huggingface.co/google/flan-t5-large) on the [winograd_wsc](https://huggingface.co/datasets/winograd_wsc) dataset.

The model was trained on the task of coreference resolution.

It achieves the following results on the evaluation set:

- Loss: 0.2404
- Rouge1: 0.9495
- Rouge2: 0.9107
- RougeL: 0.9494
- RougeLsum: 0.9494
- Gen Len: 23.4828
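
As a rough illustration of how the checkpoint can be used for coreference resolution, here is a minimal sketch with the `transformers` text2text-generation pipeline. The repo id below is an assumption (the Hub path of this model); adjust it to the actual namespace.

```python
from transformers import pipeline

# Assumed repo id for this checkpoint; replace with the actual Hub path.
coref = pipeline("text2text-generation", model="flan-t5-large-coref")

text = "Sam has a Parker pen. He loves writing with it."
result = coref(text, max_length=64)
print(result[0]["generated_text"])
```

The widget examples above show the kind of input the model expects: short passages containing pronouns to be resolved.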

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed
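
The card metadata points to the `wsc285` configuration of the `winograd_wsc` dataset. As a minimal loading sketch (assuming that configuration), the data can be inspected with the `datasets` library:

```python
from datasets import load_dataset

# wsc285 is the configuration named in the card metadata; the dataset
# ships a single "test" split of 285 Winograd Schema Challenge examples.
wsc = load_dataset("winograd_wsc", "wsc285")
print(wsc["test"][0])
```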

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training (a rough `Seq2SeqTrainingArguments` equivalent is sketched after the list):

- learning_rate: 2e-05
- train_batch_size: 16
- eval_batch_size: 16
- seed: 42
- optimizer: Adam with betas=(0.9, 0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 20
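
As a sketch only (the original training script is not part of this repository), the settings above roughly correspond to the following `Seq2SeqTrainingArguments`; `output_dir`, the evaluation strategy, and `predict_with_generate` are assumptions:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="flan-t5-large-coref",   # assumed
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    seed=42,
    lr_scheduler_type="linear",
    num_train_epochs=20,
    evaluation_strategy="epoch",        # assumed from the per-epoch results below
    predict_with_generate=True,         # assumed, since Gen Len is reported
)
# The reported optimizer (Adam, betas=(0.9, 0.999), epsilon=1e-08) matches the
# Trainer's default AdamW settings, so no explicit optim argument is shown here.
```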

### Training results

| Training Loss | Epoch | Step | Validation Loss | Rouge1 | Rouge2 | RougeL | RougeLsum | Gen Len |
|:-------------:|:-----:|:----:|:---------------:|:------:|:------:|:------:|:---------:|:-------:|
| 1.0169 | 1.0 | 16 | 0.6742 | 0.7918 | 0.6875 | 0.7836 | 0.7847 | 18.2414 |
| 0.6275 | 2.0 | 32 | 0.5093 | 0.8776 | 0.7947 | 0.8734 | 0.8732 | 21.5517 |
| 0.596 | 3.0 | 48 | 0.4246 | 0.9104 | 0.8486 | 0.9085 | 0.9091 | 22.5172 |
| 0.743 | 4.0 | 64 | 0.3632 | 0.9247 | 0.8661 | 0.9235 | 0.9231 | 22.8621 |
| 0.5007 | 5.0 | 80 | 0.3301 | 0.9353 | 0.8845 | 0.9357 | 0.9353 | 22.8621 |
| 0.2567 | 6.0 | 96 | 0.3093 | 0.9388 | 0.8962 | 0.9392 | 0.9388 | 22.9655 |
| 0.4146 | 7.0 | 112 | 0.2978 | 0.9449 | 0.907 | 0.9455 | 0.9458 | 23.1034 |
| 0.1991 | 8.0 | 128 | 0.2853 | 0.9454 | 0.9064 | 0.946 | 0.9462 | 23.069 |
| 0.1786 | 9.0 | 144 | 0.2794 | 0.9475 | 0.9097 | 0.9475 | 0.9477 | 23.069 |
| 0.3559 | 10.0 | 160 | 0.2701 | 0.9424 | 0.9013 | 0.9428 | 0.9426 | 23.0345 |
| 0.2059 | 11.0 | 176 | 0.2636 | 0.9472 | 0.9069 | 0.9472 | 0.9472 | 23.0345 |
| 0.199 | 12.0 | 192 | 0.2592 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1634 | 13.0 | 208 | 0.2553 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.2006 | 14.0 | 224 | 0.2518 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1419 | 15.0 | 240 | 0.2487 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.2089 | 16.0 | 256 | 0.2456 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1007 | 17.0 | 272 | 0.2431 | 0.9523 | 0.9141 | 0.9521 | 0.9524 | 23.4483 |
| 0.1598 | 18.0 | 288 | 0.2415 | 0.9495 | 0.9107 | 0.9494 | 0.9494 | 23.4828 |
| 0.3088 | 19.0 | 304 | 0.2407 | 0.9495 | 0.9107 | 0.9494 | 0.9494 | 23.4828 |
| 0.2003 | 20.0 | 320 | 0.2404 | 0.9495 | 0.9107 | 0.9494 | 0.9494 | 23.4828 |
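
The Rouge columns above are the kind of scores produced by a standard seq2seq `compute_metrics` function. A minimal sketch, assuming the `evaluate` library's `rouge` metric was used (the actual evaluation code is not included in this repo):

```python
import numpy as np
import evaluate

rouge = evaluate.load("rouge")

def compute_metrics(eval_preds, tokenizer):
    """Decode predictions/labels and score them with ROUGE (sketch)."""
    preds, labels = eval_preds
    decoded_preds = tokenizer.batch_decode(preds, skip_special_tokens=True)
    # Replace -100 (ignored label positions) with the pad token id before decoding.
    labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)
    result = rouge.compute(predictions=decoded_preds, references=decoded_labels)
    # Average generated length, counting non-padding tokens per prediction.
    result["gen_len"] = np.mean(
        [np.count_nonzero(np.array(p) != tokenizer.pad_token_id) for p in preds]
    )
    return {k: round(float(v), 4) for k, v in result.items()}
```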

### Framework versions

- Transformers 4.25.1
- Pytorch 1.13.0+cu116
- Datasets 2.7.1
- Tokenizers 0.13.2