Quantization made by Richard Erkhov.

[Github](https://github.com/RichardErkhov)

[Discord](https://discord.gg/pvy7H8DZMG)

[Request more models](https://github.com/RichardErkhov/quant_request)


bart_summarizer_model - bnb 8bits
- Model creator: https://huggingface.co/KipperDev/
- Original model: https://huggingface.co/KipperDev/bart_summarizer_model/




Original model description:
---
license: mit
datasets:
- big_patent
language:
- en
metrics:
- rouge
tags:
- summarization
- summarizer
- text summarization
- abstractive summarization
pipeline_tag: summarization
---

[![Generic badge](https://img.shields.io/badge/STATUS-WIP-yellow.svg)](https://shields.io/)

[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1TWasAT17zU90CqgbK98ouDuBXXHtwbVL?usp=sharing)

# Table of Contents

1. [Model Details](#model-details)
2. [Usage](#usage)
3. [Training Details](#training-details)
4. [Training Results](#training-results)
5. [Citation](#citation)
6. [Authors](#authors)

# Model Details

This model is a variant of [facebook/bart-base](https://huggingface.co/facebook/bart-base) fine-tuned for text summarization. It aims to generate concise, coherent, and informative summaries of long documents, using BART's combination of a bidirectional (BERT-like) encoder and an autoregressive (GPT-like) decoder.

# Usage

This model is intended for use in summarizing long-form texts into concise, informative abstracts. It's particularly useful for professionals and researchers who need to quickly grasp the essence of detailed reports, research papers, or articles without reading the entire text. 

## Get Started

Install with `pip`:

```bash
pip install transformers torch
```

Use in Python:

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, pipeline

model_name = "KipperDev/bart_summarizer_model"

# Load the tokenizer and the fine-tuned sequence-to-sequence model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

# Optional high-level interface; the example below calls the model directly
summarizer = pipeline("summarization", model=model, tokenizer=tokenizer)

# Example usage
prefix = "summarize: "  # required task prefix
input_text = "Your input text here."

# Truncate inputs that exceed the model's maximum input length
input_ids = tokenizer.encode(prefix + input_text, return_tensors="pt", truncation=True)
summary_ids = model.generate(input_ids)
summary = tokenizer.decode(summary_ids[0], skip_special_tokens=True)

print(summary)
```

**NOTE THAT FOR THE MODEL TO WORK AS INTENDED, YOU NEED TO PREPEND THE 'summarize: ' PREFIX TO THE INPUT TEXT**
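
The `summarizer` pipeline created in the snippet above can also be called directly. The following is a minimal sketch, not part of the original card: the helper names `add_prefix`, `summarize`, and `load_summarizer` are hypothetical, and it assumes the pipeline returns a list of dicts with a `summary_text` key, which is the usual output of the Transformers summarization pipeline.

```python
from typing import Callable, Dict, List

PREFIX = "summarize: "  # task prefix required by this model card


def add_prefix(text: str) -> str:
    """Prepend the task prefix unless the text already starts with it."""
    text = text.strip()
    return text if text.startswith(PREFIX) else PREFIX + text


def summarize(summarizer: Callable[..., List[Dict[str, str]]], text: str, **kwargs) -> str:
    """Run a summarization pipeline (or any compatible callable) on prefixed text."""
    # truncation=True asks the pipeline to cut inputs longer than the model limit
    result = summarizer(add_prefix(text), truncation=True, **kwargs)
    return result[0]["summary_text"]


def load_summarizer(model_name: str = "KipperDev/bart_summarizer_model"):
    """Build the real pipeline; downloads the model on first use."""
    from transformers import pipeline  # imported lazily so the helpers work without it

    return pipeline("summarization", model=model_name)
```

With the model available, `summarize(load_summarizer(), "Your input text here.")` should return the generated summary as a string. Keeping the prefix logic in one helper makes it harder to forget the required `summarize: ` prefix.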

# Training Details

## Training Data

The model was trained using the [Big Patent Dataset](https://huggingface.co/datasets/big_patent), comprising 1.3 million US patent documents and their corresponding human-written summaries. This dataset was chosen for its rich language and complex structure, representative of the challenging nature of document summarization tasks. 

Training involved multiple subsets of the dataset to ensure broad coverage and robust model performance across varied document types.

## Training Procedure

Training was conducted over three rounds. The initial round used a learning rate of 0.00002, a batch size of 8, and 4 epochs. Subsequent rounds adjusted these settings to refine performance further, using a learning rate of 0.0003, a batch size of 8, and 12 epochs, respectively. A linear decay learning rate schedule was also applied to improve training efficiency over time.
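
To illustrate what a linear decay schedule does, here is a minimal sketch. It assumes no warmup and a decay to zero at the final step; the card does not state the exact schedule settings or the total number of steps, so the function name `linear_decay_lr` and the step counts in the comment below are hypothetical.

```python
def linear_decay_lr(step: int, total_steps: int, initial_lr: float) -> float:
    """Learning rate at `step`, decaying linearly from `initial_lr` to 0.

    Assumption: no warmup phase; the rate reaches 0 at `total_steps`
    and stays there for any later step.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    remaining = max(0.0, 1.0 - step / total_steps)
    return initial_lr * remaining


# Hypothetical run of 1000 steps starting from the first-round rate of 0.00002
schedule = [linear_decay_lr(s, 1000, 2e-5) for s in (0, 250, 500, 750, 1000)]
```

Halfway through such a run the learning rate is half of its initial value, and it reaches zero at the last step.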

# Training Results

Model performance was evaluated using the ROUGE metric, highlighting its capability to generate summaries closely aligned with human-written abstracts.

| **Metric**                              | **Value**  |
|-----------------------------------------|------------|
| Evaluation Loss (Eval Loss)             | 1.9244     |
| Rouge-1                                 | 0.5007     |
| Rouge-2                                 | 0.2704     |
| Rouge-L                                 | 0.3627     |
| Rouge-Lsum                              | 0.3636     |
| Average Generation Length (Gen Len)     | 122.1489   |
| Runtime (seconds)                       | 1459.3826  |
| Samples per Second                      | 1.312      |
| Steps per Second                        | 0.164      |
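
For intuition about the ROUGE scores in the table, the sketch below computes a simplified ROUGE-1 F1 score from unigram overlap. It is an illustration only, not the evaluation code used for this model: it assumes lowercase whitespace tokenization with no stemming, and the function name `rouge1_f1` is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())

    # Clipped overlap: each word counts at most as often as it appears in both texts
    overlap = sum((ref_counts & cand_counts).values())
    if overlap == 0:
        return 0.0

    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


score = rouge1_f1("the cat sat on the mat", "the cat")
```

In this example the candidate's two words both appear in the reference (precision 1.0), but they cover only two of the six reference words (recall 1/3), so the F1 score is 0.5.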


# Citation

**BibTeX:**

```bibtex
@article{kipper_t5_summarizer,
 // SOON
}
```

# Authors

This model card was written by [Fernanda Kipper](https://www.fernandakipper.com/).