Hugging Face Model: facebook/bart-large-cnn


This repository contains a version of the 'facebook/bart-large-cnn' model fine-tuned for summarization. It has been trained specifically for dialog and book summarization on English-language datasets.

Model Details

  • Model: facebook/bart-large-cnn
  • Task: Summarization
  • Fine-tuning Datasets: Dialog Summarization, Book Summarization
  • Language: English


  1. Install the required libraries (PyTorch is needed because the examples request 'pt' tensors): ! pip install transformers torch
  2. Load the fine-tuned 'facebook/bart-large-cnn' model for dialog and book summarization tasks.
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("doublecringe123/bardt-large-cnn-dialoguesum-booksum")

model = AutoModelForSeq2SeqLM.from_pretrained("doublecringe123/bardt-large-cnn-dialoguesum-booksum") 
  3. Pass your text to the loaded model to generate a summary.
text = """What is lasagne alla bolognese made of?
Classic Lasagna Bolognese Recipe
It's a fundamentally simple recipe, with only a few key components: the pasta; the meat sauce, known as ragù Bolognese; besciamella (a.k.a. béchamel or white sauce); and grated Parmigiano-Reggiano cheese."""

tokens = tokenizer(text, return_tensors='pt', truncation=True)

generated_encodes = model.generate(**tokens)
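
The `generate` call above returns token ids, not text, so the summary still has to be decoded. A minimal sketch of the full round trip follows. The helper name `summarize` is hypothetical, and the `max_new_tokens` and `num_beams` values are illustrative assumptions, not settings taken from this repository.

```python
def summarize(text, tokenizer, model, max_new_tokens=128, num_beams=4):
    """Tokenize text, generate summary token ids, and decode them to a string.

    `tokenizer` and `model` are expected to be the objects loaded with
    AutoTokenizer / AutoModelForSeq2SeqLM; the default generation settings
    here are illustrative assumptions only.
    """
    # Encode the input; truncation=True cuts inputs longer than the model limit.
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    # Generate a batch of summary token id sequences (one per input).
    output_ids = model.generate(
        **inputs, max_new_tokens=max_new_tokens, num_beams=num_beams
    )
    # Decode the first (only) sequence, dropping special tokens such as <s>.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With the tokenizer and model loaded in step 2, `print(summarize(text, tokenizer, model))` prints the generated summary.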

  4. Optionally, preprocess the text before summarization. For example:
! pip install -q nltk

import nltk
from nltk.corpus import stopwords
from nltk.stem import SnowballStemmer
from nltk.tokenize import word_tokenize

# Fetch the NLTK data used below (only needed once per environment).
nltk.download('stopwords')
nltk.download('punkt')
nltk.download('punkt_tab')  # tokenizer data expected by newer NLTK releases

eng_stopwords = set(stopwords.words('english'))
stemmer = SnowballStemmer(language='english', ignore_stopwords=False)

def text_remove_stopwords(seq) -> list:
    """Drop English stopwords; assumes seq is tokenized text."""
    # Compare in lowercase, since the NLTK stopword list is lowercase.
    return [t for t in seq if t.lower() not in eng_stopwords]

def text_stemming(seq) -> list:
    """Stem each token; assumes seq is tokenized text."""
    return [stemmer.stem(word) for word in seq]

def text_process(text, fn_list=None) -> str:
    """Tokenize text, apply each function in fn_list, and rejoin the tokens."""
    tokens = word_tokenize(" ".join(text.split()))
    if fn_list is None:
        fn_list = [text_remove_stopwords, text_stemming]
    for fn in fn_list:
        tokens = fn(tokens)
    return " ".join(tokens)

text = text_process(text)

tokens = tokenizer(text, return_tensors='pt', truncation=True)

generated_encodes = model.generate(**tokens)
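
Because the tokenizer calls above pass `truncation=True`, any input longer than the model's maximum input length (1024 tokens for BART) is cut off, which matters for book-length text. One common workaround, sketched here as an assumption and not as the method used to fine-tune this model, is to split the text into chunks and summarize each chunk separately. The helper name `chunk_words` and the 400-word default are hypothetical, and a word count only approximates the token count.

```python
def chunk_words(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words.

    Words are whitespace-separated; this is only a rough proxy for the
    model's token limit, so max_words is an illustrative assumption.
    """
    words = text.split()
    # Step through the word list in strides of max_words and rejoin each slice.
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]
```

Each chunk can then be tokenized and passed to `model.generate` in the same way as the single text above.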



The model has been fine-tuned and evaluated on dialog and book summarization datasets, where it produces accurate, high-quality summaries.


If you use this model or code in your work, please cite the original 'facebook/bart-large-cnn' model and the datasets used for fine-tuning.

