---
tags:
- generated_from_trainer
- distilbart
model-index:
- name: distilbart-finetuned-summarization
  results: []
license: apache-2.0
datasets:
- cnn_dailymail
- xsum
- samsum
- ccdv/pubmed-summarization
language:
- en
metrics:
- rouge
---

# distilbart-finetuned-summarization

This model is a fine-tuned version of [distilbart](https://huggingface.co/sshleifer/distilbart-cnn-12-6) on the combination of four summarization datasets:
- [cnn_dailymail](https://huggingface.co/datasets/cnn_dailymail)
- [samsum](https://huggingface.co/datasets/samsum)
- [xsum](https://huggingface.co/datasets/xsum)
- [ccdv/pubmed-summarization](https://huggingface.co/datasets/ccdv/pubmed-summarization)

## Training and evaluation data

The combined dataset can be reproduced with the following code:

```python
from datasets import DatasetDict, concatenate_datasets, load_dataset

# Load each dataset and normalize the column names to "document"/"summary".
xsum_dataset = load_dataset("xsum")
pubmed_dataset = load_dataset("ccdv/pubmed-summarization").rename_column("article", "document").rename_column("abstract", "summary")
cnn_dataset = load_dataset("cnn_dailymail", "3.0.0").rename_column("article", "document").rename_column("highlights", "summary")
samsum_dataset = load_dataset("samsum").rename_column("dialogue", "document")

# Drop any extra columns (e.g. "id") so the features match across datasets,
# which concatenate_datasets requires.
all_datasets = [
    ds.remove_columns([c for c in ds["train"].column_names if c not in ("document", "summary")])
    for ds in (xsum_dataset, pubmed_dataset, cnn_dataset, samsum_dataset)
]

raw_datasets = DatasetDict()
raw_datasets["train"] = concatenate_datasets([ds["train"] for ds in all_datasets])
raw_datasets["validation"] = concatenate_datasets([ds["validation"] for ds in all_datasets])
raw_datasets["test"] = concatenate_datasets([ds["test"] for ds in all_datasets])
```

## Inference example

```python
from transformers import pipeline

pipe = pipeline("text2text-generation", model="sshleifer/distilbart-cnn-12-6")

text = """The tower is 324 metres (1,063 ft) tall, about the same height as
an 81-storey building, and the tallest structure in Paris. Its base is square,
measuring 125 metres (410 ft) on each side. During its construction, the
Eiffel Tower surpassed the Washington Monument to become the tallest man-made
structure in the world, a title it held for 41 years until the Chrysler Building
in New York City was finished in 1930. It was the first structure to reach a
height of 300 metres. Due to the addition of a broadcasting aerial at the top
of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres
(17 ft). Excluding transmitters, the Eiffel Tower is the second tallest
free-standing structure in France after the Millau Viaduct.
"""

print(pipe(text)[0]["generated_text"])
# The Eiffel Tower is the tallest man-made structure in the world .
# The tower is 324 metres tall, about the same height as an 81-storey building .
# Due to the addition of a broadcasting aerial in 1957, it is now taller than
# the Chrysler Building by 5.2 metres .
```

## Training procedure

Notebook link: [here](https://github.com/LxYuan0420/nlp/blob/main/notebooks/distilbart-finetune-summarisation.ipynb)

### Training hyperparameters

The following hyperparameters were used during training:
- evaluation_strategy: epoch
- save_strategy: epoch
- logging_strategy: epoch
- learning_rate: 2e-05
- per_device_train_batch_size: 2
- per_device_eval_batch_size: 2
- gradient_accumulation_steps: 64
- total_train_batch_size: 128
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- weight_decay: 0.01
- save_total_limit: 2
- num_train_epochs: 10
- predict_with_generate: True
- fp16: True
- push_to_hub: True
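The total train batch size above follows from the per-device batch size and gradient accumulation. A quick sanity check (assuming a single training device, which the card does not state explicitly):

```python
# Effective (total) train batch size from the hyperparameters above.
# num_devices = 1 is an assumption, not stated in the card.
per_device_train_batch_size = 2
gradient_accumulation_steps = 64
num_devices = 1

total_train_batch_size = per_device_train_batch_size * gradient_accumulation_steps * num_devices
print(total_train_batch_size)  # 128
```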

### Training results

_Training is still in progress._

| Epoch | Training Loss | Validation Loss | Rouge1  | Rouge2  | RougeL  | RougeLsum | Gen Len |
|-------|---------------|-----------------|---------|---------|---------|-----------|---------|
| 0     | 1.779700      | 1.719054        | 40.0039 | 17.9071 | 27.8825 | 34.8886   | 88.8936 |

### Framework versions

- Transformers 4.30.2
- Pytorch 2.0.1+cu117
- Datasets 2.13.1
- Tokenizers 0.13.3