---
license: cc-by-nc-4.0
language:
- en
tags:
- bart
- text-summarization
- cnn-dailymail
widget:
  - text: |
      The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building, and the tallest structure in Paris. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world, a title it held for 41 years until the Chrysler Building in New York City was finished in 1930. It was the first structure to reach a height of 300 metres. Due to the addition of a broadcasting aerial at the top of the tower in 1957, it is now taller than the Chrysler Building by 5.2 metres (17 ft). Excluding transmitters, the Eiffel Tower is the second tallest free-standing structure in France after the Millau Viaduct.
    example_title: Generate Summary
metrics:
- rouge
datasets:
- cnn_dailymail
model-index:
  - name: BART-Large-CNN-scratch
    results:
      - task:
          type: text-summarization
        dataset:
          name: CNN/DailyMail
          type: cnn_dailymail
        metrics:
          - name: ROUGE-1
            type: rouge
            value: 44.07
          - name: ROUGE-2
            type: rouge
            value: 21.06
          - name: ROUGE-L
            type: rouge
            value: 30.65
        source:
          name: Internal Evaluation
          url: https://huggingface.co/facebook/bart-large-cnn

---

# BART-Large-CNN-scratch

BART-Large-CNN-scratch is a from-scratch retraining of the `facebook/bart-large` architecture on the CNN/DailyMail dataset, built to reproduce the performance of the `facebook/bart-large-cnn` model.

- **Developed by**: phanerozoic
- **Model type**: BartForConditionalGeneration
- **Source model**: `facebook/bart-large`
- **License**: cc-by-nc-4.0
- **Languages**: English
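
A minimal usage sketch with the `transformers` summarization pipeline follows. The Hub repo id `phanerozoic/BART-Large-CNN-scratch` is an assumption inferred from the developer and model names above, not something this card confirms.

```python
# Minimal usage sketch. The repo id below is an assumption inferred from the
# developer and model names in this card.
from transformers import pipeline

summarizer = pipeline("summarization", model="phanerozoic/BART-Large-CNN-scratch")

article = (
    "The tower is 324 metres (1,063 ft) tall, about the same height as an "
    "81-storey building, and the tallest structure in Paris. Its base is "
    "square, measuring 125 metres (410 ft) on each side."
)
result = summarizer(article, max_length=128, min_length=30, do_sample=False)
print(result[0]["summary_text"])
```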

## Model Details

BART-Large-CNN-scratch is a transformer-based sequence-to-sequence model tailored to text summarization. It keeps the original BART architecture while its weights were learned anew on the CNN/DailyMail dataset.

### Configuration
- **Max input length**: 1024 tokens
- **Max target length**: 128 tokens
- **Learning rate**: 4e-5
- **Batch size**: 32
- **Epochs**: 1
- **Hardware used**: NVIDIA RTX 6000 Ada Lovelace
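
A hedged sketch of how the configuration above maps onto Hugging Face `Seq2SeqTrainingArguments`; any value not listed above (output directory, generation settings) is an assumption rather than a detail from the original run.

```python
# Sketch only: maps the configuration listed above onto Hugging Face training
# arguments. Values not stated in this card are assumptions.
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="bart-large-cnn-scratch",  # hypothetical output path
    learning_rate=4e-5,
    per_device_train_batch_size=32,
    num_train_epochs=1,
    predict_with_generate=True,           # assumption: generate during eval
    generation_max_length=128,            # max target length from above
)
```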

## Training and Evaluation Data

The model was trained for one epoch on the CNN/DailyMail dataset, a large collection of news articles paired with human-written summaries. The dataset is a standard benchmark for text summarization because of its size and the quality of its annotations.
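
For reference, the article/summary pairs can be loaded with the `datasets` library; the `"3.0.0"` configuration is an assumption, as the card does not state which dataset version was used.

```python
# Loads CNN/DailyMail article/summary pairs. The "3.0.0" configuration is an
# assumption; the card does not specify the dataset version.
from datasets import load_dataset

dataset = load_dataset("cnn_dailymail", "3.0.0")
example = dataset["train"][0]
print(example["article"][:200])   # source news article
print(example["highlights"])      # human-written reference summary
```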

## Training Procedure

Training started from the `facebook/bart-large` architecture and used the following settings:
- **Epochs**: 1
- **Batch size**: 32
- **Learning rate**: 4e-5
- **Training time**: 7 hours
- **Loss**: 0.65

During training, the model minimized the standard sequence-to-sequence loss on the reference summaries, improving its ability to generate summaries that are both concise and informative.
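One reading of "training from scratch" here is that the `facebook/bart-large` configuration supplies the architecture while the weights are randomly initialized. The sketch below shows that interpretation, which is our assumption rather than something the card confirms, along with the usual token-level cross-entropy objective.

```python
# Sketch of a from-scratch initialization: reuse the bart-large architecture
# (config) but start from random weights. Whether the original training did
# exactly this is an assumption based on the card's wording.
from transformers import BartConfig, BartForConditionalGeneration, BartTokenizerFast

config = BartConfig.from_pretrained("facebook/bart-large")  # architecture only
model = BartForConditionalGeneration(config)                # random weights
tokenizer = BartTokenizerFast.from_pretrained("facebook/bart-large")

# The training objective is the token-level cross-entropy on the reference
# summary, returned directly by the model's forward pass:
inputs = tokenizer(["An example news article."], return_tensors="pt")
labels = tokenizer(text_target=["A short summary."], return_tensors="pt")["input_ids"]
loss = model(**inputs, labels=labels).loss
print(float(loss))
```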

### Performance
Evaluation after training produced the following ROUGE scores:
- **ROUGE-1**: 44.07
- **ROUGE-2**: 21.06
- **ROUGE-L**: 30.65
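
A sketch of reproducing this evaluation with the `evaluate` library; whether the reported scores used this exact implementation and stemming setting is an assumption.

```python
# Sketch of a ROUGE evaluation. Whether the reported scores used this exact
# library and stemming setting is an assumption.
import evaluate

rouge = evaluate.load("rouge")
predictions = ["The tower is 324 metres tall."]        # model outputs
references = ["The Eiffel Tower is 324 metres tall."]  # gold summaries
scores = rouge.compute(predictions=predictions, references=references, use_stemmer=True)
print({k: round(v * 100, 2) for k, v in scores.items()})  # rouge1/rouge2/rougeL
```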

## Comparing Performance to Base Model

BART-Large-CNN-scratch is compared against Facebook's original BART-large-cnn model:

| Model                          | ROUGE-1 | ROUGE-2 | ROUGE-L |
|--------------------------------|---------|---------|---------|
| Facebook BART-large-cnn        | 42.949  | 20.815  | 30.619  |
| BART-Large-CNN-scratch         | 44.070  | 21.060  | 30.650  |

### Analysis of Summaries

#### Eiffel Tower Article Summary Comparison

##### Facebook BART-Large-CNN Summary:
"The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. During its construction, the Eiffel Tower surpassed the Washington Monument to become the tallest man-made structure in the world."

##### BART-Large-CNN-scratch Summary:
"The tower is 324 metres (1,063 ft) tall, about the same height as an 81-storey building. Its base is square, measuring 125 metres (410 ft) on each side. It is the second tallest free-standing structure in France after the Millau Viaduct."

- **Comparison**:
  - Both summaries start with identical descriptions of the Eiffel Tower's height and base dimensions.
  - The Facebook summary mentions the historical significance of the Eiffel Tower surpassing the Washington Monument.
  - The scratch summary includes the detail of the tower being the second tallest free-standing structure in France, providing a different historical context.
  - The scratch summary never names the Eiffel Tower, a gap that shows our replication of Facebook's output is not exact.

#### Paper Clip Article Summary Comparison

##### Facebook BART-Large-CNN Summary:
"The earliest form of the paper clip dates back to the 13th century. The most widely recognized design is attributed to the Norwegian inventor Johan Vaaler. The design of paper clips has continued to evolve, with various shapes and sizes available on the market. During World War II, paper clips became a symbol of resistance in Norway."

##### BART-Large-CNN-scratch Summary:
"The paper clip dates back to the 13th century, when a device made of a bent metal wire was used to hold sheets of paper together. The most widely recognized design is attributed to the Norwegian inventor Johan Vaaler, who received a patent for his paper clip design in 1899. During World War II, the paper clip became a symbol of resistance in Norway."

- **Comparison**:
  - Both summaries start with descriptions of the origins of the paper clip and Johan Vaaler's contributions.
  - The Facebook summary briefly mentions the evolution of paper clip designs and their availability in various shapes and sizes.
  - The scratch summary includes additional historical details about the use of bent metal wires in the 13th century and Vaaler's patent, providing a richer historical context.

### Implications

1. **Reproducibility**:
   - The BART-Large-CNN-scratch model closely reproduces the performance of the Facebook BART-large-cnn model, capturing key historical points and providing concise summaries. However, it shows some differences in detail prioritization, indicating that while the reproduction is effective, it is not exact.

2. **Model Training from Scratch**:
   - Training from scratch has proven to be effective, with the BART-Large-CNN-scratch model achieving competitive ROUGE scores. However, the summaries differ in detail compared to the Facebook model, suggesting areas for further fine-tuning.

3. **Practical Applications**:
   - Both models are effective for summarizing historical and technical articles. The BART-Large-CNN-scratch model is excellent for concise overviews, while the Facebook model provides more comprehensive context.

### Conclusion

The BART-Large-CNN-scratch model demonstrates strong performance, capturing essential historical points and producing concise summaries. While it does not exactly reproduce the Facebook model's summaries, it matches their quality and even exceeds the original's ROUGE scores, making it a robust tool for text summarization applications.

## Acknowledgments

Special thanks to the developers of the BART architecture and the Hugging Face team, whose tools and frameworks were instrumental in developing and training this model. The NVIDIA RTX 6000 Ada Lovelace GPU provided the computational power needed to achieve these results.