
Model Overview

The News Articles Teacher-Student Abstractive Summarizer is a fine-tuned model based on BART-large, with StableBeluga-7B as the teacher model. It is designed to produce high-quality abstractive summaries of news articles while being faster and lighter on computational resources than its teacher.

Model Details

  • Model Type: Abstractive Summarization
  • Base Model: BART-large
  • Teacher Model: StableBeluga-7B
  • Language: English

Dataset

  • Source: 295,174 news articles scraped from a Mexican newspaper.
  • Translation: The Spanish articles were translated to English using the Helsinki-NLP/opus-mt-es-en machine-translation model.
  • Teacher Summaries: Generated by StableBeluga-7B.

Training

The fine-tuning process used the teacher observations (summaries generated by StableBeluga-7B) as training targets for a lightweight BART model. This approach aims to replicate the summarization quality of the teacher model while achieving faster inference and reduced GPU memory usage.
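The distillation setup described above boils down to pairing each translated article with its teacher-generated summary and using those pairs as the student's supervised training data. A minimal sketch of that pairing step is below; the field names (`input_text`, `target_text`) are illustrative, not the actual schema used during training.

```python
def build_training_pairs(articles, teacher_summaries):
    """Pair each article with its teacher-generated summary.

    The article becomes the student's input; the StableBeluga-7B
    summary becomes the target sequence the student learns to produce.
    """
    return [
        {"input_text": article, "target_text": summary}
        for article, summary in zip(articles, teacher_summaries)
    ]
```

In practice these records would then be tokenized and fed to a standard sequence-to-sequence fine-tuning loop (e.g. with the Hugging Face `Seq2SeqTrainer`).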

Performance

  • Evaluation Metrics:
    • ROUGE1: 0.66
    • Cosine Similarity: 0.90
  • Inference Speed: 3x faster than the teacher model (StableBeluga-7B)
  • Resource Usage: Significantly less GPU memory compared to StableBeluga-7B
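For reference, the two reported metrics can be illustrated with a minimal pure-Python sketch: ROUGE-1 F1 measures unigram overlap between a candidate summary and a reference, and cosine similarity measures vector closeness (in practice the `rouge_score` package and sentence embeddings are typically used; the bag-of-words cosine below is a simplified stand-in):

```python
import math
from collections import Counter

def rouge1_f1(reference: str, candidate: str) -> float:
    """ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped unigram matches
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)

def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity over bag-of-words count vectors."""
    va, vb = Counter(a.lower().split()), Counter(b.lower().split())
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * \
           math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```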

Objective

The primary goal of this model is to provide a lightweight summarization solution that maintains high-quality output similar to the teacher model (StableBeluga-7B) but operates with greater efficiency, making it suitable for deployment in resource-constrained environments.

Use Cases

This model is ideal for applications requiring quick and efficient summarization of large volumes of news articles, particularly in settings where computational resources are limited.

Limitations

  • Language Translation: The initial translation from Spanish to English may introduce minor inaccuracies that could affect the summarization quality.
  • Domain Specificity: Because the model was fine-tuned specifically on news articles, performance may vary on texts from other domains.

Future Work

Future improvements could involve:

  • Fine-tuning the model on bilingual data to eliminate the translation step.
  • Expanding the dataset to include a wider variety of news sources and topics.
  • Exploring further optimizations to reduce inference time and resource usage.

Conclusion

The News Articles Teacher-Student Abstractive Summarizer model demonstrates the potential to deliver high-quality summaries efficiently, making it a valuable tool for news content processing and similar applications.

How to Use

```python
import torch
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Load the fine-tuned student model and its tokenizer
model = AutoModelForSeq2SeqLM.from_pretrained("JordiAb/BART_news_summarizer")
tokenizer = AutoTokenizer.from_pretrained("JordiAb/BART_news_summarizer")

# News article text
article_text = """
Los Angeles Lakers will have more time than anticipated. The four-time NBA Most Valuable Player (MVP) extended his contract for two years and $85 million, keeping him in California until 2023. In 2018, The King had already signed for 153 mdd and, in his second campaign in the quintet, led the championship in the Orlando bubble. With 35 years of life – he turns 36 on December 30 – and 17 campaigns of experience, LeBron is still considered one of the best (or the best) NBA players. You can read: "Mercedes found Lewis Hamilton's substitute" James just took the Lakers to his first NBA title since 2010 and was named MVP of the Finals; he led the League in assists per game (10.2) for the first time in his career, while adding 25.3 points and 7.8 rebounds per performance, during the last campaign. James has adapted to life in Hollywood, as he will be part of the sequel to Space Jam, to be released next year.
"""

# Tokenize the article (truncate to the model's maximum input length)
inputs = tokenizer(article_text, return_tensors='pt', truncation=True)

# Generate the summary with beam search
with torch.no_grad():
    summary_ids = model.generate(
        inputs['input_ids'],
        num_beams=4,
        max_length=250,
        early_stopping=True
    )

# Decode the generated token IDs into text
summary = tokenizer.decode(
    summary_ids[0],
    skip_special_tokens=True
)
print(summary)
```