lxyuan committed
Commit e2d19aa
1 Parent(s): 5ffffa3

update model card README.md

Files changed (1)
  1. README.md +15 -90
README.md CHANGED
@@ -1,123 +1,48 @@
  ---
  tags:
  - generated_from_trainer
- - distilbart
  model-index:
  - name: distilbart-finetuned-summarization
    results: []
- license: apache-2.0
- datasets:
- - cnn_dailymail
- - xsum
- - samsum
- - ccdv/pubmed-summarization
- language:
- - en
- metrics:
- - rouge
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- # distilbart-finetuned-summarization
 
- This model is a further fine-tuned version of [distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) on a combination of four summarisation datasets:
- - [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail)
- - [samsum](https://huggingface.co/datasets/samsum)
- - [xsum](https://huggingface.co/datasets/xsum)
- - [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization)
 
- Please check out the official model page and paper:
- - [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6)
- - [Pre-trained Summarization Distillation](https://arxiv.org/abs/2010.13002)
 
- ## Training and evaluation data
-
- One can reproduce the dataset using the following code:
-
- ```python
- from datasets import DatasetDict, load_dataset
- from datasets import concatenate_datasets
-
- xsum_dataset = load_dataset("xsum")
- pubmed_dataset = load_dataset("ccdv/pubmed-summarization").rename_column("article", "document").rename_column("abstract", "summary")
- cnn_dataset = load_dataset("cnn_dailymail", '3.0.0').rename_column("article", "document").rename_column("highlights", "summary")
- samsum_dataset = load_dataset("samsum").rename_column("dialogue", "document")
-
- summary_train = concatenate_datasets([xsum_dataset["train"], pubmed_dataset["train"], cnn_dataset["train"], samsum_dataset["train"]])
- summary_validation = concatenate_datasets([xsum_dataset["validation"], pubmed_dataset["validation"], cnn_dataset["validation"], samsum_dataset["validation"]])
- summary_test = concatenate_datasets([xsum_dataset["test"], pubmed_dataset["test"], cnn_dataset["test"], samsum_dataset["test"]])
-
- raw_datasets = DatasetDict()
- raw_datasets["train"] = summary_train
- raw_datasets["validation"] = summary_validation
- raw_datasets["test"] = summary_test
-
- ```
 
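The card does not show how the combined `document`/`summary` pairs were tokenized. Below is a minimal preprocessing sketch that reuses `raw_datasets` from the block above; the tokenizer checkpoint follows the base model, and the truncation lengths (1024/128) are assumptions, not values stated in the card.

```python
from transformers import AutoTokenizer

# Tokenizer of the base checkpoint; the max lengths below are assumed, not taken from the card.
tokenizer = AutoTokenizer.from_pretrained("sshleifer/distilbart-cnn-12-6")

def preprocess(batch):
    # Tokenize source documents and target summaries for seq2seq training.
    model_inputs = tokenizer(batch["document"], max_length=1024, truncation=True)
    labels = tokenizer(text_target=batch["summary"], max_length=128, truncation=True)
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

tokenized_datasets = raw_datasets.map(preprocess, batched=True)
```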
- ## Inference example
 
- ```python
- from transformers import pipeline
-
- pipe = pipeline("text2text-generation", model="lxyuan/distilbart-finetuned-summarization")
-
- text = """The tower is 324 metres (1,063 ft) tall, about the same height as
- an 81-storey building, and the tallest structure in Paris. Its base is square,
- measuring 125 metres (410 ft) on each side. During its construction, the
- Eiffel Tower surpassed the Washington Monument to become the tallest man-made
- structure in the world, a title it held for 41 years until the Chrysler Building
- in New York City was finished in 1930. It was the first structure to reach a
- height of 300 metres. Due to the addition of a broadcasting aerial at the top
- of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres
- (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest
- free-standing structure in France after the Millau Viaduct.
- """
-
- pipe(text)
-
- >>> """The Eiffel Tower is the tallest man-made structure in the world .
- The tower is 324 metres tall, about the same height as an 81-storey building .
- Due to the addition of a broadcasting aerial in 1957, it is now taller than
- the Chrysler Building by 5.2 metres .
- """
- ```
 
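Summary length can be controlled through generation keyword arguments, which the pipeline forwards to `generate`. The values below are illustrative only and are not taken from the card.

```python
# Illustrative generation settings; not values from the model card.
pipe(text, max_length=100, min_length=30, do_sample=False)
```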
  ## Training procedure
 
- Notebook link: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distilbart-finetune-summarisation.ipynb)
-
  ### Training hyperparameters
 
  The following hyperparameters were used during training (see the `Seq2SeqTrainingArguments` sketch after this list):
- - evaluation_strategy="epoch",
- - save_strategy="epoch",
- - logging_strategy="epoch",
- - learning_rate=2e-5,
- - per_device_train_batch_size=2,
- - per_device_eval_batch_size=2,
- - gradient_accumulation_steps=64,
  - total_train_batch_size: 128
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
- - weight_decay=0.01,
- - save_total_limit=2,
- - num_train_epochs=10,
- - predict_with_generate=True,
- - fp16=True,
- - push_to_hub=True
-
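The bullet items above are `Seq2SeqTrainingArguments` keyword arguments. A minimal sketch of how they could be assembled is shown below; `output_dir` is a placeholder, since the card does not specify it.

```python
from transformers import Seq2SeqTrainingArguments

# output_dir is a placeholder; the remaining values are the ones listed above.
training_args = Seq2SeqTrainingArguments(
    output_dir="distilbart-finetuned-summarization",
    evaluation_strategy="epoch",
    save_strategy="epoch",
    logging_strategy="epoch",
    learning_rate=2e-5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    gradient_accumulation_steps=64,
    weight_decay=0.01,
    save_total_limit=2,
    num_train_epochs=10,
    predict_with_generate=True,
    fp16=True,
    push_to_hub=True,
)
```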
- ### Training results
-
- _Training is still in progress._
-
- | Epoch | Training Loss | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
- |-------|---------------|-----------------|---------|---------|---------|-----------|---------|
- | 0     | 1.779700      | 1.719054        | 40.0039 | 17.9071 | 27.8825 | 34.8886   | 88.8936 |
 
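The Rouge1/Rouge2/RougeL/RougeLsum columns can be reproduced with the `evaluate` library. The sketch below shows the metric call on a toy prediction/reference pair; it is not the exact evaluation code behind the table, which was produced by the Trainer's evaluation loop.

```python
import evaluate

# Toy example of the ROUGE computation reported in the table above.
rouge = evaluate.load("rouge")
scores = rouge.compute(
    predictions=["The Eiffel Tower is the tallest structure in Paris."],
    references=["The Eiffel Tower is the second tallest free-standing structure in France."],
)
print(scores)  # {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}
```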
  ### Framework versions
 
  - Transformers 4.30.2
  - Pytorch 2.0.1+cu117
  - Datasets 2.13.1
- - Tokenizers 0.13.3
 
  ---
  tags:
  - generated_from_trainer
  model-index:
  - name: distilbart-finetuned-summarization
    results: []
  ---
 
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
+ # distilbart-finetuned-summarization
 
+ This model is a further fine-tuned version of [sshleifer/distilbart-cnn-12-6](https://huggingface.co/sshleifer/distilbart-cnn-12-6) on a combination of four summarisation datasets: cnn_dailymail, xsum, samsum and ccdv/pubmed-summarization.
 
+ ## Model description
 
+ More information needed
 
+ ## Intended uses & limitations
 
+ More information needed
 
+ ## Training and evaluation data
 
+ More information needed
 
  ## Training procedure
 
  ### Training hyperparameters
 
  The following hyperparameters were used during training:
+ - learning_rate: 2e-05
+ - train_batch_size: 2
+ - eval_batch_size: 2
+ - seed: 42
+ - gradient_accumulation_steps: 64
  - total_train_batch_size: 128
  - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  - lr_scheduler_type: linear
+ - num_epochs: 5
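Note that the reported total_train_batch_size follows from the values above: train_batch_size × gradient_accumulation_steps = 2 × 64 = 128, assuming a single device.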
 
  ### Framework versions
 
  - Transformers 4.30.2
  - Pytorch 2.0.1+cu117
  - Datasets 2.13.1
+ - Tokenizers 0.13.3