jakobcassiman committed on
Commit
acf5361
1 Parent(s): 255bc12

Create README.md

Files changed (1)
  1. README.md +44 -0
README.md ADDED
@@ -0,0 +1,44 @@
+ ---
+ language:
+ - nl
+ tags:
+ - mbart
+ - bart
+ - summarization
+ datasets:
+ - ml6team/cnn_dailymail_nl
+ ---
+ # mbart-large-cc25-cnn-dailymail-nl
+ ## Model description
+ A fine-tuned version of [mBART](https://huggingface.co/facebook/mbart-large-cc25). We also wrote a [blog post](https://blog.ml6.eu/) about this model.
+ ## Intended uses & limitations
+ The model is meant for summarizing Dutch news articles.
+ #### How to use
+ ```python
+ import transformers
+ undisputed_best_model = transformers.MBartForConditionalGeneration.from_pretrained(
+     "ml6team/mbart-large-cc25-cnn-dailymail-nl-finetune"
+ )
+ tokenizer = transformers.MBartTokenizer.from_pretrained("facebook/mbart-large-cc25")  # tokenizer of the base model
+ summarization_pipeline = transformers.pipeline(
+     task="summarization",
+     model=undisputed_best_model,
+     tokenizer=tokenizer,
+ )
+ summarization_pipeline.model.config.decoder_start_token_id = tokenizer.lang_code_to_id[
+     "nl_XX"
+ ]  # start decoding from the Dutch language code
+ article = "Kan je dit even samenvatten alsjeblief."  # Dutch: "Could you summarize this, please."
+ summarization_pipeline(
+     article,
+     do_sample=True,  # sample instead of decoding greedily
+     top_p=0.75,
+     top_k=50,
+     # num_beams=4,
+     min_length=50,
+     early_stopping=True,
+     truncation=True,  # truncate inputs that exceed the model's maximum length
+ )[0]["summary_text"]
+ ```
+ ## Training data
+ The model was fine-tuned from [mBART](https://huggingface.co/facebook/mbart-large-cc25) on [ml6team/cnn_dailymail_nl](https://huggingface.co/datasets/ml6team/cnn_dailymail_nl) and on a second, smaller dataset that we cannot open-source because it was scraped from the internet. For more information, see our [blog post](https://blog.ml6.eu/).
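
The commented-out `num_beams=4` in the usage snippet suggests beam search as an alternative to the sampling settings shown there. A minimal sketch of switching between the two, assuming the `summarization_pipeline` and `article` objects from that snippet; the helper `generation_kwargs` and its `strategy` argument are hypothetical names introduced here for illustration, not part of the model card or of `transformers`:

```python
from typing import Any, Dict


def generation_kwargs(strategy: str = "sampling") -> Dict[str, Any]:
    """Build keyword arguments for the summarization pipeline call.

    `strategy` is a hypothetical switch introduced for this sketch.
    """
    # Settings shared by both strategies, copied from the README snippet.
    shared: Dict[str, Any] = {"min_length": 50, "truncation": True}

    if strategy == "sampling":
        # Top-p / top-k sampling with the values from the README snippet.
        return {**shared, "do_sample": True, "top_p": 0.75, "top_k": 50}

    if strategy == "beam":
        # Beam search with the num_beams value that is commented out in
        # the README; early_stopping applies to beam-based decoding.
        return {**shared, "do_sample": False, "num_beams": 4, "early_stopping": True}

    raise ValueError(f"unknown strategy: {strategy!r}")
```

The returned dictionary can be unpacked into the call, for example `summarization_pipeline(article, **generation_kwargs("beam"))[0]["summary_text"]`. Beam search is deterministic, whereas sampling may return a different summary on each call.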