mbart-large-cc25-cnn-dailymail-nl

Model description

A fine-tuned version of mBART. We also wrote a blog post about this model here.

Intended uses & limitations

It's meant for summarizing Dutch news articles.

How to use

import transformers

# Load the fine-tuned model; the tokenizer is the original pretrained mBART one.
undisputed_best_model = transformers.MBartForConditionalGeneration.from_pretrained(
    "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune"
)
tokenizer = transformers.MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")
summarization_pipeline = transformers.pipeline(
    task="summarization",
    model=undisputed_best_model,
    tokenizer=tokenizer,
)
# Force the decoder to start with the Dutch language token so the summary is generated in Dutch.
summarization_pipeline.model.config.decoder_start_token_id = tokenizer.lang_code_to_id[
    "nl_XX"
]
article = "Kan je dit even samenvatten alsjeblieft?"  # Dutch: "Can you summarize this, please?"
summarization_pipeline(
    article,
    do_sample=True,
    top_p=0.75,
    top_k=50,
    # num_beams=4,  # alternatively, use beam search instead of sampling
    min_length=50,
    early_stopping=True,
    truncation=True,
)[0]["summary_text"]
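Note that mBART accepts a limited number of input tokens (1024 positions for mbart-large-cc25), so with `truncation=True` anything beyond that limit is silently dropped. One workaround for longer articles (a sketch, not part of the original model card; `chunk_text` is a hypothetical helper) is to split the article into chunks, summarize each chunk, and concatenate the partial summaries:

```python
def chunk_text(text, max_words=400):
    """Split text into chunks of at most max_words words.

    A crude word-based approximation of the model's input limit;
    a real implementation would count tokenizer tokens instead.
    """
    words = text.split()
    return [
        " ".join(words[i : i + max_words])
        for i in range(0, len(words), max_words)
    ]

# Each chunk can then be summarized separately, e.g.:
# summaries = [
#     summarization_pipeline(c, truncation=True)[0]["summary_text"]
#     for c in chunk_text(long_article)
# ]
```

Word-based chunking is only an approximation, since the tokenizer may produce several subword tokens per word; a conservative `max_words` leaves headroom for that.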

Training data

We fine-tuned mBART on this dataset, together with a smaller dataset that we can't open-source because it was scraped from the internet. For more information, check out our blog post here.
