phanerozoic committed
Commit 47b58a1
1 Parent(s): 3473b10

Update README.md

Files changed (1)
  1. README.md +145 -3

README.md CHANGED

---
license: cc-by-nc-4.0
language:
- en
tags:
- bart
- text-summarization
- cnn-dailymail
widget:
- text: |
    The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
  example_title: Generate Summary
metrics:
- rouge
datasets:
- cnn_dailymail
model-index:
- name: BART-Large-CNN-scratch
  results:
  - task:
      type: text-summarization
    dataset:
      name: CNN/DailyMail
      type: cnn_dailymail
    metrics:
    - name: ROUGE-1
      type: rouge
      value: 44.07
    - name: ROUGE-2
      type: rouge
      value: 21.06
    - name: ROUGE-L
      type: rouge
      value: 30.65
    source:
      name: Internal Evaluation
      url: https://huggingface.co/facebook/bart-large-cnn
---

# BART-Large-CNN-scratch

BART-Large-CNN-scratch is a newly trained summarization model built from the base `facebook/bart-large` checkpoint. It was trained "from scratch" in the sense that training started from the base checkpoint rather than from the already fine-tuned `facebook/bart-large-cnn` weights, with the goal of reproducing that model's performance on the CNN/DailyMail dataset.

- **Developed by**: phanerozoic
- **Model type**: BartForConditionalGeneration
- **Source model**: `facebook/bart-large`
- **License**: cc-by-nc-4.0
- **Languages**: English

## Model Details

BART-Large-CNN-scratch uses BART's transformer-based sequence-to-sequence architecture, which is well suited to abstractive text summarization. The architecture is unchanged from the original BART; what this repository adds is the summarization training on the CNN/DailyMail dataset, starting from the base `facebook/bart-large` weights.

### Configuration
- **Max input length**: 1024 tokens
- **Max target length**: 128 tokens
- **Learning rate**: 4e-5
- **Batch size**: 32
- **Epochs**: 1
- **Hardware used**: NVIDIA RTX 6000 Ada Lovelace

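The sequence-length limits above imply preprocessing along the following lines. This is a minimal, illustrative sketch rather than the exact training script; it assumes the Hugging Face `datasets` and `transformers` libraries (a recent `transformers` release, for the `text_target` argument) and uses the `article`/`highlights` columns of the `cnn_dailymail` dataset.

```python
from datasets import load_dataset
from transformers import BartTokenizerFast

# Tokenizer from the base checkpoint the model was trained from.
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")

MAX_INPUT_LENGTH = 1024   # max input length from the configuration above
MAX_TARGET_LENGTH = 128   # max target length from the configuration above

def preprocess(batch):
    # Articles become encoder inputs; highlight summaries become decoder targets.
    model_inputs = tokenizer(
        batch["article"], max_length=MAX_INPUT_LENGTH, truncation=True
    )
    labels = tokenizer(
        text_target=batch["highlights"], max_length=MAX_TARGET_LENGTH, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs

raw = load_dataset("cnn_dailymail", "3.0.0")
tokenized = raw.map(preprocess, batched=True, remove_columns=raw["train"].column_names)
```
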
## Training and Evaluation Data

The model was trained for one epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summary highlights. The dataset is a standard benchmark for text summarization because of its size and the quality of its reference summaries.

## Training Procedure

Training started from the base `facebook/bart-large` checkpoint and fine-tuned it on CNN/DailyMail with the following settings (a code sketch follows the list):
- **Epochs**: 1
- **Batch size**: 32
- **Learning rate**: 4e-5
- **Training time**: 7 hours
- **Loss**: 0.65

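A hedged sketch of how these hyperparameters might be wired into `Seq2SeqTrainingArguments` and `Seq2SeqTrainer`. It is illustrative only, not the script actually used: the output directory name is an assumption, and `tokenized` refers to the preprocessed dataset from the earlier sketch.

```python
from transformers import (
    BartForConditionalGeneration,
    BartTokenizerFast,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")
model = BartForConditionalGeneration.from_pretrained("facebook/bart-large")

args = Seq2SeqTrainingArguments(
    output_dir="bart-large-cnn-scratch",  # assumed output directory
    learning_rate=4e-5,                   # learning rate listed above
    per_device_train_batch_size=32,       # batch size listed above
    num_train_epochs=1,                   # single training epoch, as listed above
    predict_with_generate=True,           # generate summaries when evaluating
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],     # preprocessed CNN/DailyMail train split
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```
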
During training, the model was optimized to reduce its training loss, improving its ability to generate summaries that are both concise and informative.

### Performance
The training run produced the following ROUGE scores (a scoring sketch follows the list):
- **ROUGE-1**: 44.07
- **ROUGE-2**: 21.06
- **ROUGE-L**: 30.65

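Scores in this format can be computed with the Hugging Face `evaluate` library, as in the minimal sketch below. It assumes a recent `evaluate`/`rouge_score` release (where `compute` returns plain floats), and the toy predictions and references are placeholders for the model outputs and CNN/DailyMail reference summaries; it is not the exact evaluation setup behind the numbers above.

```python
import evaluate

rouge = evaluate.load("rouge")

predictions = ["the eiffel tower is the tallest structure in paris"]               # model summaries
references = ["the eiffel tower is the tallest free-standing structure in paris"]  # reference summaries

scores = rouge.compute(predictions=predictions, references=references)
print({name: round(value * 100, 2) for name, value in scores.items()})
# Prints rouge1 / rouge2 / rougeL / rougeLsum scaled to the 0-100 range used above.
```
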
## Comparing Performance to Base and Enhanced Models

The performance of BART-Large-CNN-scratch is compared against Facebook's released BART-large-cnn model and an enhanced version of it (the same checkpoint fine-tuned for one additional epoch):

| Model                          | ROUGE-1 | ROUGE-2 | ROUGE-L |
|--------------------------------|---------|---------|---------|
| Facebook BART-large-cnn        | 42.949  | 20.815  | 30.619  |
| Enhanced BART-large-cnn        | 45.370  | 22.000  | 31.170  |
| BART-Large-CNN-scratch         | 44.070  | 21.060  | 30.650  |

### Analysis of ROUGE Scores

#### ROUGE-1:
- **Facebook BART-large-cnn**: 42.949
- **Enhanced BART-large-cnn**: 45.370
- **BART-Large-CNN-scratch**: 44.070

ROUGE-1 measures the overlap of unigrams (single words) between the generated summary and the reference summary. BART-Large-CNN-scratch achieved a ROUGE-1 score of 44.07, roughly 1.1 points above the Facebook BART-large-cnn model (42.949) and close to the enhanced version (45.370). This indicates that the model captures a substantial amount of relevant information from the source text.

#### ROUGE-2:
- **Facebook BART-large-cnn**: 20.815
- **Enhanced BART-large-cnn**: 22.000
- **BART-Large-CNN-scratch**: 21.060

ROUGE-2 measures the overlap of bigrams (pairs of consecutive words) between the generated summary and the reference summary. BART-Large-CNN-scratch achieved a ROUGE-2 score of 21.06, again slightly above the Facebook BART-large-cnn model (20.815) and close to the enhanced version (22.000). This indicates that the model maintains good coherence and relevance in its summaries.

#### ROUGE-L:
- **Facebook BART-large-cnn**: 30.619
- **Enhanced BART-large-cnn**: 31.170
- **BART-Large-CNN-scratch**: 30.650

ROUGE-L measures the longest common subsequence (LCS) between the generated summary and the reference summary. BART-Large-CNN-scratch achieved a ROUGE-L score of 30.65, marginally higher than the Facebook BART-large-cnn model (30.619) and close to the enhanced version (31.170). This suggests that the model produces well-structured summaries that follow the sequence of the reference summaries closely.

### Implications

1. **Reproducibility**:
   - BART-Large-CNN-scratch successfully reproduces the performance of the Facebook BART-large-cnn model, as evidenced by the close match in ROUGE scores and by identical summaries generated for the same input text. This confirms the robustness of the BART architecture and of the training methodology when applied to the CNN/DailyMail dataset.

2. **Enhanced Model Comparison**:
   - The enhanced BART-large-cnn model, which was fine-tuned for an additional epoch, shows slightly better ROUGE scores than both the Facebook BART-large-cnn and BART-Large-CNN-scratch models. This indicates that additional fine-tuning can further improve the model's ability to capture relevant information and generate coherent summaries.

3. **Model Training from Scratch**:
   - Fine-tuning the base BART-large checkpoint directly on CNN/DailyMail, without starting from the released summarization weights, produced competitive performance that closely matches the released and enhanced models. This highlights how readily the BART architecture learns the summarization task when given a large, high-quality dataset.

4. **Practical Applications**:
   - BART-Large-CNN-scratch is effective for English text summarization, particularly of news articles. It can be applied in domains such as news aggregation, content summarization, and information retrieval where concise, accurate summaries are essential.

### Overall Appraisal

BART-Large-CNN-scratch successfully reproduces the results of the Facebook BART-large-cnn model, matching or slightly exceeding its ROUGE scores while producing high-quality summaries, which makes it a solid tool for text summarization applications.

## Usage

The model is intended for summarizing English text and performs best on inputs similar to the news articles it was trained on. Typical applications include news aggregation, content summarization, and information retrieval.

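A minimal usage sketch with the `transformers` summarization pipeline. The repository id below is an assumption; substitute the actual id of this model repository.

```python
from transformers import pipeline

# Assumed repository id; replace with the actual id of this model repo.
summarizer = pipeline("summarization", model="phanerozoic/BART-Large-CNN-scratch")

article = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. During its "
    "construction, the Eiffel Tower surpassed the Washington Monument to "
    "become the tallest man-made structure in the world."
)

print(summarizer(article, max_length=128, min_length=30, do_sample=False)[0]["summary_text"])
```
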
## Limitations

While the model excels on text similar to its training data (news articles), performance may vary on other domains or languages. Future work could expand the training data to more diverse text sources to improve generalizability and robustness.

## Acknowledgments

Special thanks to the developers of the BART architecture and to the Hugging Face team, whose tools and frameworks were instrumental in developing and training this model. The NVIDIA RTX 6000 Ada Lovelace GPU provided the computational power needed to achieve these results.