---
datasets:
- xsum
- quora
language:
- en
metrics:
- rouge
pipeline_tag: text2text-generation
---

# Model

**NOTE:** **FEEL FREE TO DOWNLOAD THE MODEL FOR INFERENCE. YOU WON'T REGRET IT :)**

<!-- Provide a quick summary of what the model is/does. -->
This model card describes a T5 base model fine-tuned for summarization. The main difference from similar summarization models is the target output length: this model was trained with a target output length of 512 tokens, so it can generate summaries of up to 512 tokens. The goal is to produce summaries that are more comprehensive and informative for large texts while still keeping them reasonably short.

**Dataset and Training:**
The model was fine-tuned on a curated subset of the XSum and ChatGPT datasets, which cover a range of text samples including news articles and conversational data. Training on both kinds of text is intended to help the model produce accurate, coherent summaries across different styles of input.

**Transfer Learning for Summarization:**
The model relies on transfer learning: T5 base, pre-trained on a large corpus of text, is fine-tuned on the curated dataset described above. Fine-tuning lets the model reuse what it learned during pre-training while adapting to the summarization task, with the aim of capturing the important information in a text and stating it concisely.
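The card does not publish its preprocessing code, so the following is only a minimal sketch of how a document and summary pair could be tokenized for this kind of fine-tuning. The `summarize: ` prefix and the 512-token target length come from this card; the 512-token source limit, the `document`/`summary` field names (borrowed from XSum), and the `preprocess` helper itself are assumptions made for illustration.

```python
PREFIX = "summarize: "  # task prefix used by the inference example in this card
MAX_SOURCE_LEN = 512    # assumption: same input limit as at inference time
MAX_TARGET_LEN = 512    # target output length stated in this card


def preprocess(example, tokenizer):
    """Tokenize one {"document", "summary"} pair for seq2seq fine-tuning.

    `tokenizer` is expected to behave like a Hugging Face tokenizer,
    for example AutoTokenizer.from_pretrained("t5-base").
    """
    # Encode the prefixed source text, truncating long documents.
    model_inputs = tokenizer(
        PREFIX + example["document"],
        max_length=MAX_SOURCE_LEN,
        truncation=True,
    )
    # Encode the reference summary as the training target.
    labels = tokenizer(
        text_target=example["summary"],
        max_length=MAX_TARGET_LEN,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```

A function of this shape is typically applied to every row of a dataset before training; the resulting `input_ids` and `labels` are what a seq2seq trainer consumes.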

**Enhanced Support for Greater Length Output:**
Because the model was trained with a specific focus on longer summaries, it is better able to carry detailed information through to its output. We expect it to be most useful where longer summaries are required, such as summarizing lengthy documents or providing in-depth analysis.

**Conclusion:**
This fine-tuned T5 base model offers summarization with a specific emphasis on longer outputs. It was optimized through a curated dataset and transfer learning to generate accurate, informative summaries, and we believe it will be a useful tool for applications that need comprehensive, well-structured summaries.


## How to Get Started with the Model

Use the code below to get started with the model.

```python
from math import ceil

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_name = "talalH/summarizer_on_T5_base"
MAX_LEN = 512  # input truncation length and maximum summary length, in tokens

# device_map applies to model weights only (it requires the accelerate package),
# so it is not passed to the tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

print("-" * 100)
print("\nHi !! ENTER A TEXT TO SUMMARIZE (type 'exit' to end)\n")

while True:
    user_input = input("USER: ")
    if user_input.strip().lower() == "exit":
        break

    # Tokenize the prompt and move it to the device the model was loaded on.
    input_ids = tokenizer(
        f"summarize: {user_input}</s>",
        return_tensors="pt",
        max_length=MAX_LEN,
        truncation=True,
    ).input_ids.to(model.device)

    # Require at least half as many tokens as the input has words,
    # capped so that min_length stays below max_length.
    min_len = min(ceil(len(user_input.split()) / 2), MAX_LEN - 1)

    # Diverse beam search: 10 beams in 5 groups, returning the 3 best sequences.
    # temperature is not set because it only affects sampling (do_sample=True).
    outputs = model.generate(
        input_ids,
        repetition_penalty=10.0,
        num_return_sequences=3,
        no_repeat_ngram_size=2,
        num_beams=10,
        num_beam_groups=5,
        diversity_penalty=2.0,
        min_length=min_len,
        max_length=MAX_LEN,
    )
    sequences = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    print("\nOUTPUT")
    if sequences:
        for seq in sequences:
            print("T5: ", seq, "\n")
    else:
        print("T5: NO RESPONSE RETURNED")

    print("-" * 100)

```
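The example above truncates its input to 512 tokens, so anything past that point in a long document is ignored. One possible workaround, sketched below, is to split the text into word-based chunks, summarize each chunk, and join the partial summaries. The helpers `chunk_words`, `min_summary_length`, and `summarize_long` are hypothetical names, the 300-word chunk size is an assumed value chosen to stay under the token limit for typical English text, and the generation settings are simplified relative to the example above.

```python
from math import ceil

MAX_LEN = 512  # token limit used for both input and output in the example above


def chunk_words(text, max_words=300):
    """Split text into consecutive pieces of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def min_summary_length(text, max_len=MAX_LEN):
    """Half the word count, rounded up, kept strictly below max_len."""
    return min(ceil(len(text.split()) / 2), max_len - 1)


def summarize_long(text, model, tokenizer, max_words=300):
    """Summarize each chunk separately and join the partial summaries.

    `model` and `tokenizer` are the objects loaded in the example above.
    """
    parts = []
    for chunk in chunk_words(text, max_words):
        input_ids = tokenizer(
            f"summarize: {chunk}</s>",
            return_tensors="pt",
            max_length=MAX_LEN,
            truncation=True,
        ).input_ids.to(model.device)
        # Plain beam search with assumed, simplified settings.
        output = model.generate(
            input_ids,
            num_beams=4,
            no_repeat_ngram_size=2,
            min_length=min_summary_length(chunk),
            max_length=MAX_LEN,
        )
        parts.append(tokenizer.decode(output[0], skip_special_tokens=True))
    return " ".join(parts)
```

With the model and tokenizer already loaded, `summarize_long(long_text, model, tokenizer)` returns a single string. Chunking by words is a rough heuristic: sentences can be cut at chunk boundaries, so splitting on paragraphs may give cleaner results for some documents.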
<!-- Provide a longer summary of what this model is. -->

- **Developed by:** Talal Hassan (talalhassan141@gmail.com)
- **Finetuned from model:** T5 BASE


## Uses

<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->

This model is intended for text summarization.

## Training Details

- **Epochs:** 4
- **Warmup steps:** 50
- **Max steps:** -1
- **Learning rate:** 5e-5
- **Batch size:** 4

- **Hardware Type:** Tesla K80 GPUs
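The card does not say which training framework was used. Assuming the Hugging Face `Trainer` API, the hyperparameters above could be expressed roughly as follows. The `HYPERPARAMS` dictionary, the `make_training_args` and `total_steps` helpers, and the output directory name are illustrative, not the author's actual configuration.

```python
from math import ceil

# Values from this card, keyed by the corresponding TrainingArguments names.
HYPERPARAMS = {
    "num_train_epochs": 4,
    "warmup_steps": 50,
    "max_steps": -1,  # -1 lets the epoch count decide the training length
    "learning_rate": 5e-5,
    "per_device_train_batch_size": 4,
}


def make_training_args(output_dir="t5-summarizer"):
    """Build Seq2SeqTrainingArguments from the values above.

    transformers is imported here so the rest of the sketch runs without it.
    """
    from transformers import Seq2SeqTrainingArguments

    return Seq2SeqTrainingArguments(output_dir=output_dir, **HYPERPARAMS)


def total_steps(num_examples, batch_size=4, epochs=4):
    """Optimizer steps on one device with no gradient accumulation."""
    return ceil(num_examples / batch_size) * epochs
```

For example, `total_steps(1000)` gives 1000 optimizer steps, of which the first 50 would be warmup under these settings.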


## Model Card Authors

Talal Hassan (talalhassan141@gmail.com)

## Model Card Contact

Talal Hassan (talalhassan141@gmail.com)