
Model

This model card describes a T5 base model fine-tuned to generate summaries. Using transfer learning, the pre-trained model was fine-tuned on a subset of the XSum and ChatGPT datasets. We made several changes to the training process to improve summary quality, most notably support for longer outputs.

Dataset and Training: The model is fine-tuned on a curated subset of the XSum and ChatGPT datasets. Together these cover news articles and conversational data. This variety is intended to help the model produce accurate, coherent summaries across different kinds of text.

Transfer Learning for Summarization: The T5 base model, pre-trained on a large text corpus, is fine-tuned on the curated dataset described above. Fine-tuning lets the model reuse its general language knowledge while adapting to the summarization task, with the goal of capturing the key information in a text and expressing it concisely.
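The text-to-text framing behind this fine-tuning can be sketched in plain Python. This is a minimal sketch, assuming each training example is a (document, summary) pair and that inputs carry the same `summarize: ` prefix used in the usage code in this card. The helper names `to_t5_example` and `build_training_set`, the per-source subset size, and the seed are hypothetical; the card does not say how the subset was selected.

```python
import random
from typing import Dict, List, Tuple

Pair = Tuple[str, str]  # (document, summary)


def to_t5_example(document: str, summary: str) -> Dict[str, str]:
    """Cast one pair into T5's text-to-text format with a task prefix."""
    return {
        "input_text": f"summarize: {document.strip()}",
        "target_text": summary.strip(),
    }


def build_training_set(
    sources: List[List[Pair]], per_source: int, seed: int = 42
) -> List[Dict[str, str]]:
    """Hypothetical curation step: sample up to `per_source` pairs from each
    source (e.g. XSum and a ChatGPT-generated set), format them, and shuffle."""
    rng = random.Random(seed)
    examples: List[Dict[str, str]] = []
    for pairs in sources:
        subset = rng.sample(pairs, min(per_source, len(pairs)))
        examples.extend(to_t5_example(doc, summ) for doc, summ in subset)
    rng.shuffle(examples)  # mix sources so batches are not grouped by origin
    return examples


if __name__ == "__main__":
    xsum_like = [("Storm hits the coast overnight.", "A storm hit the coast.")]
    chat_like = [("User asks how to reset a router.", "Steps to reset a router.")]
    for ex in build_training_set([xsum_like, chat_like], per_source=1):
        print(ex)
```

Shuffling after formatting is a design choice here so that news and conversational examples are interleaved during training.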

Target Output Length: Unlike many similar models, this model is trained with a maximum target length of 512 tokens, so it learns to produce summaries of up to 512 tokens. The longer target is meant to allow more comprehensive and informative summaries while keeping their length manageable.
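To make the 512-token target concrete: during fine-tuning, reference summaries longer than the limit are clipped, so the model never sees a target beyond 512 tokens. A minimal sketch, assuming targets were truncated the way Hugging Face tokenizers do with `truncation=True` (content clipped, then the end-of-sequence token appended). The card does not describe its preprocessing, `truncate_target` is a hypothetical helper, and the token ids in the demo are made up.

```python
from typing import List

T5_EOS_ID = 1         # end-of-sequence id in the standard T5 vocabulary
MAX_TARGET_LEN = 512  # target length stated in this card


def truncate_target(
    token_ids: List[int], max_len: int = MAX_TARGET_LEN, eos_id: int = T5_EOS_ID
) -> List[int]:
    """Clip a tokenized summary (given without EOS) so that, once EOS is
    appended, the result is at most `max_len` tokens long."""
    return token_ids[: max_len - 1] + [eos_id]


if __name__ == "__main__":
    short_ids = truncate_target([37, 1782, 5])        # hypothetical token ids
    long_ids = truncate_target(list(range(2, 2002)))  # 2,000 hypothetical ids
    print(len(short_ids), len(long_ids), long_ids[-1])  # 4 512 1
```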

Enhanced Support for Longer Outputs: Because training focused on longer summaries, the model is better suited to conveying detailed information. This makes it useful when a longer summary is needed, for example when summarizing lengthy documents or producing in-depth overviews. Note that the usage example below truncates inputs to 512 tokens, so text beyond that point is not seen by the model.

Conclusion: This fine-tuned T5 base model provides summarization with an emphasis on longer outputs. Fine-tuned through transfer learning on a curated dataset, it is intended to produce accurate, informative summaries and should suit applications that need comprehensive, well-structured summaries.

How to Get Started with the Model

Use the code below to get started with the model.

from math import ceil

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "talalH/summarizer_on_T5_base"
tokenizer = AutoTokenizer.from_pretrained(model_name)
# device_map="auto" requires the `accelerate` package and places the model on a GPU if one is available
model = AutoModelForSeq2SeqLM.from_pretrained(model_name, device_map="auto")

MAX_LEN = 512  # input truncation length and maximum summary length, in tokens

print("-" * 100)
print("\nHi !! ENTER A TEXT TO SUMMARIZE (type 'exit' to end)\n")

while True:
    user_input = input("USER: ")
    if user_input.lower() == "exit":
        break

    # T5 expects a task prefix; the tokenizer appends the </s> end-of-sequence token itself
    input_ids = tokenizer(
        f"summarize: {user_input}", return_tensors="pt", max_length=MAX_LEN, truncation=True
    ).input_ids.to(model.device)  # inputs must be on the same device as the model

    # Request a summary at least half as long as the input (word count as a rough
    # token proxy), capped so that min_length never exceeds max_length
    min_len = min(ceil(len(user_input.split()) / 2), MAX_LEN)

    # Diverse (group) beam search: 10 beams in 5 groups, returning 3 candidate summaries
    outputs = model.generate(
        input_ids,
        num_beams=10,
        num_beam_groups=5,
        diversity_penalty=2.0,
        num_return_sequences=3,
        repetition_penalty=10.0,
        no_repeat_ngram_size=2,
        min_length=min_len,
        max_length=MAX_LEN,
    )
    sequences = tokenizer.batch_decode(outputs, skip_special_tokens=True)

    print("\nOUTPUT")
    if sequences:
        for summary in sequences:
            print("T5: ", summary, "\n")
    else:
        print("T5: NO RESPONSE RETURNED")

    print("-" * 100)
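The `generate` call above uses diverse (group) beam search, which only works when its parameters fit together. A small pure-Python sketch of those constraints, assuming the rules enforced by Hugging Face `transformers`: `num_beams` must be divisible by `num_beam_groups`, and `num_return_sequences` cannot exceed `num_beams`. The `min_length <= max_length` check is an added sanity check. `GenerationArgs` and `check_generation_args` are hypothetical helpers, not part of the library.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class GenerationArgs:
    """Subset of the generate() arguments used in the usage example."""
    num_beams: int = 10
    num_beam_groups: int = 5
    num_return_sequences: int = 3
    min_length: int = 0
    max_length: int = 512


def check_generation_args(args: GenerationArgs) -> List[str]:
    """Return a list of problems; an empty list means the combination is consistent."""
    problems: List[str] = []
    if args.num_beams % args.num_beam_groups != 0:
        problems.append("num_beams must be divisible by num_beam_groups")
    if args.num_return_sequences > args.num_beams:
        problems.append("num_return_sequences must not exceed num_beams")
    if args.min_length > args.max_length:
        problems.append("min_length must not exceed max_length")
    return problems


if __name__ == "__main__":
    print(check_generation_args(GenerationArgs(min_length=200)))  # []
    print(check_generation_args(GenerationArgs(num_beam_groups=3, min_length=900)))
```

The second call also shows why the word-count heuristic for `min_len` needs an upper bound: a 2,000-word input would request a minimum of 1,000 tokens, above `max_length=512`.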

Developed by: Talal Hassan (talalhassan141@gmail.com)
Finetuned from model: T5 base

Uses

Text summarization, especially where longer summaries are needed.

Training Details

Epochs: 4
Warmup steps: 50
Max steps: -1 (no step limit; training length is set by the number of epochs)
Learning rate: 5e-5
Batch size: 4

Hardware Type: Tesla K80 GPUs
Hours used: 48
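These hyperparameters translate into a learning-rate schedule. A minimal sketch, assuming the Hugging Face `Trainer` default of linear warmup followed by linear decay to zero (`lr_scheduler_type="linear"`), a single device, and no gradient accumulation. The card does not state the scheduler, and the dataset size in the demo is a made-up placeholder.

```python
import math


def total_steps(num_examples: int, batch_size: int = 4, epochs: int = 4) -> int:
    """Optimizer steps when max steps is -1, so training length is set by epochs."""
    return math.ceil(num_examples / batch_size) * epochs


def linear_warmup_lr(step: int, total: int, peak_lr: float = 5e-5, warmup: int = 50) -> float:
    """Learning rate at `step`: rises linearly from 0 to peak_lr over `warmup`
    steps, then decays linearly to 0 at `total` steps."""
    if step < warmup:
        return peak_lr * step / max(1, warmup)
    return peak_lr * max(0.0, (total - step) / max(1, total - warmup))


if __name__ == "__main__":
    steps = total_steps(num_examples=10_000)  # hypothetical dataset size
    print(steps)  # 10000 (2,500 steps per epoch x 4 epochs)
    for s in (0, 25, 50, steps // 2, steps):
        print(s, linear_warmup_lr(s, steps))
```

With only 50 warmup steps, the rate reaches its 5e-5 peak almost immediately and spends nearly all of training in the decay phase.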

Model Card Authors

Talal Hassan (talalhassan141@gmail.com)

Model Card Contact

Talal Hassan (talalhassan141@gmail.com)