Hugging Face Model: facebook/bart-large-cnn
Overview
This repository contains a fine-tuned version of the 'facebook/bart-large-cnn' model for summarization. It was fine-tuned specifically for dialog and book summarization on English-language datasets.
Model Details
- Model: facebook/bart-large-cnn
- Task: Summarization
- Fine-tuning Datasets: Dialog Summarization, Book Summarization
- Language: English
Usage
- Install the required libraries. PyTorch is needed because the examples request 'pt' tensors.
! pip install transformers torch
- Load the fine-tuned 'facebook/bart-large-cnn' model for dialog and book summarization tasks.
# Load model directly
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("doublecringe123/bardt-large-cnn-dialoguesum-booksum")
model = AutoModelForSeq2SeqLM.from_pretrained("doublecringe123/bardt-large-cnn-dialoguesum-booksum")
- Pass your text to the loaded model to generate a summary.
text = """What is lasagne alla bolognese made of?
Classic Lasagna Bolognese Recipe
It's a fundamentally simple recipe, with only a few key components: the pasta; the meat sauce, known as ragù Bolognese; besciamella (a.k.a. béchamel or white sauce); and grated Parmigiano-Reggiano cheese."""
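# Inputs longer than the model's maximum length are truncated
tokens = tokenizer(text, return_tensors='pt', truncation=True)
generated_encodes = model.generate(**tokens)
# Drop special tokens such as <s> and </s> from the decoded summary
tokenizer.batch_decode(generated_encodes, skip_special_tokens=True)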
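Because `truncation=True` drops everything past the tokenizer's maximum input length, long inputs such as book chapters lose their tail. One possible workaround, sketched here as an assumption rather than as part of the original workflow, is to split the text into word-based chunks and summarize each chunk separately. The helper names `split_into_chunks` and `summarize_long_text` and the 400-word budget are hypothetical, and word counts only approximate token counts.

```python
def split_into_chunks(text, max_words=400):
    """Split text into consecutive chunks of at most max_words words.

    Word counts only approximate token counts, so keep max_words well
    below the tokenizer's maximum input length.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long_text(text, tokenizer, model, max_words=400):
    """Summarize each chunk with an already loaded tokenizer and model."""
    summaries = []
    for chunk in split_into_chunks(text, max_words):
        inputs = tokenizer(chunk, return_tensors="pt", truncation=True)
        output_ids = model.generate(**inputs)
        # batch_decode returns one string per sequence; take the only one
        summaries.append(
            tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
        )
    return summaries
```

With the `tokenizer` and `model` loaded above, `summarize_long_text(text, tokenizer, model)` would return one summary string per chunk, which can then be joined or summarized again.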
- You can also preprocess the text before summarization. For example:
! pip install -q nltk
import nltk
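nltk.download('stopwords')
# Tokenizer data used by word_tokenize; newer NLTK releases may ask for 'punkt_tab' instead
nltk.download('punkt')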
from nltk.corpus import stopwords
eng_stopwords = stopwords.words('english')
from nltk.tokenize import word_tokenize
from nltk.stem import SnowballStemmer
stemmer = SnowballStemmer(language='english', ignore_stopwords = False)
def text_remove_stopwords(seq) -> list:
    """Remove English stopwords, assuming that seq is tokenized text."""
    return [t for t in seq if t not in eng_stopwords]

def text_stemming(seq) -> list:
    """Stem every token, assuming that seq is tokenized text."""
    return [stemmer.stem(word) for word in seq]

def text_process(text, fn_list=None) -> str:
    """Tokenize text, apply each function in fn_list in order, and rejoin."""
    # Collapse runs of whitespace before tokenizing
    tokens = word_tokenize(" ".join(text.split()))
    if fn_list is None:
        fn_list = [text_remove_stopwords, text_stemming]
    for fn in fn_list:
        tokens = fn(tokens)
    return " ".join(tokens)
text = """What is lasagne alla bolognese made of?
Classic Lasagna Bolognese Recipe
It's a fundamentally simple recipe, with only a few key components: the pasta; the meat sauce, known as ragù Bolognese; besciamella (a.k.a. béchamel or white sauce); and grated Parmigiano-Reggiano cheese."""
text = text_process(text)
tokens = tokenizer(text, return_tensors='pt', truncation=True)
generated_encodes = model.generate(**tokens)
# Drop special tokens such as <s> and </s> from the decoded summary
tokenizer.batch_decode(generated_encodes, skip_special_tokens=True)
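The `fn_list` argument makes the preprocessing steps pluggable: each step takes a list of tokens and returns a list of tokens. The sketch below illustrates that pattern without NLTK so it runs on its own. The names `simple_process`, `lowercase_tokens`, and `drop_short_tokens` are hypothetical, and whitespace splitting stands in for `word_tokenize`.

```python
def lowercase_tokens(seq):
    """Lowercase every token."""
    return [t.lower() for t in seq]


def drop_short_tokens(seq, min_len=3):
    """Keep only tokens with at least min_len characters."""
    return [t for t in seq if len(t) >= min_len]


def simple_process(text, fn_list=None):
    """Apply each step in fn_list, in order, to whitespace-split tokens."""
    tokens = text.split()
    if fn_list is None:
        fn_list = [lowercase_tokens, drop_short_tokens]
    for fn in fn_list:
        tokens = fn(tokens)
    return " ".join(tokens)


print(simple_process("The Meat Sauce is known as Ragu"))
# the meat sauce known ragu
```

Steps written this way can also be passed to `text_process` through `fn_list`, for example `text_process(text, fn_list=[lowercase_tokens])`.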
Performance
The model has been fine-tuned and evaluated on dialog and book summarization datasets. No quantitative scores (for example ROUGE) are reported in this card.
Citation
If you use this model or code in your work, please cite the original 'facebook/bart-large-cnn' model and the datasets used for fine-tuning.