mrm8488/bert2bert_shared-german-finetuned-summarization

mrm8488 committed on May 27, 2021

Commit c33f24b, 1 Parent(s): af07cf7

Update README.md

Files changed (1)

README.md +35 -0

@@ -10,3 +10,38 @@ widget:
 ---
 # German BERT2BERT fine-tuned on MLSUM DE for summarization

+## Model
+[bert-base-german-cased](https://huggingface.co/bert-base-german-cased) (BERT Checkpoint)
+## Dataset
+**MLSUM** is the first large-scale MultiLingual SUMmarization dataset. Obtained from online newspapers, it contains 1.5M+ article/summary pairs in five different languages: French, **German**, Spanish, Russian, and Turkish. Together with English newspapers from the popular CNN/Daily Mail dataset, the collected data form a large-scale multilingual dataset which can enable new research directions for the text summarization community. We report cross-lingual comparative analyses based on state-of-the-art systems. These highlight existing biases which motivate the use of a multilingual dataset.
+[MLSUM de](https://huggingface.co/datasets/viewer/?dataset=mlsum)
+## Results
+| Set | Metric | Score |
+|-----|--------|-------|
+| Test | Rouge2 - mid - precision | **** |
+| Test | Rouge2 - mid - recall | **** |
+| Test | Rouge2 - mid - fmeasure | **** |
+## Usage
+```python
+import torch
+from transformers import BertTokenizerFast, EncoderDecoderModel
+
+# Run on the GPU when one is available, otherwise fall back to the CPU
+device = 'cuda' if torch.cuda.is_available() else 'cpu'
+ckpt = 'mrm8488/bert2bert_shared-german-finetuned-summarization'
+tokenizer = BertTokenizerFast.from_pretrained(ckpt)
+model = EncoderDecoderModel.from_pretrained(ckpt).to(device)
+
+def generate_summary(text):
+    # Tokenize the input, padding or truncating it to 512 tokens
+    inputs = tokenizer([text], padding="max_length", truncation=True, max_length=512, return_tensors="pt")
+    input_ids = inputs.input_ids.to(device)
+    attention_mask = inputs.attention_mask.to(device)
+    # Generate the summary token ids and decode them back to text
+    output = model.generate(input_ids, attention_mask=attention_mask)
+    return tokenizer.decode(output[0], skip_special_tokens=True)
+
+text = "Your text here..."
+generate_summary(text)
+```
+> Created by [Manuel Romero/@mrm8488](https://twitter.com/mrm8488) with the support of [Narrativa](https://www.narrativa.com/)
+> Made with <span style="color: #e25555;">&hearts;</span> in Spain
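
The Results table in the README reports ROUGE-2 precision, recall, and F-measure on the test set. As a rough illustration of what those three numbers measure, here is a minimal pure-Python sketch of bigram overlap between a reference summary and a generated one. It is an assumption-laden simplification (lowercasing and whitespace tokenization only, no stemming, no bootstrap aggregation), not the scorer used for this model; the function names `bigrams` and `rouge2` are hypothetical and the example sentences are made up.

```python
from collections import Counter


def bigrams(tokens):
    """Return a multiset (Counter) of adjacent token pairs."""
    return Counter(zip(tokens, tokens[1:]))


def rouge2(reference, candidate):
    """Simplified ROUGE-2: lowercase, whitespace tokenization, no stemming."""
    ref = bigrams(reference.lower().split())
    cand = bigrams(candidate.lower().split())
    # Counter intersection keeps the minimum count per bigram (clipped matches)
    overlap = sum((ref & cand).values())
    cand_total = sum(cand.values())
    ref_total = sum(ref.values())
    precision = overlap / cand_total if cand_total else 0.0
    recall = overlap / ref_total if ref_total else 0.0
    denom = precision + recall
    fmeasure = 2 * precision * recall / denom if denom else 0.0
    return {"precision": precision, "recall": recall, "fmeasure": fmeasure}


# Made-up example pair: 3 of the 5 bigrams on each side match
scores = rouge2("die katze sitzt auf der matte", "die katze sitzt auf dem sofa")
print(scores)
```

Precision divides the matched bigrams by the number of bigrams in the generated summary, recall divides them by the number in the reference, and the F-measure is their harmonic mean. For reported results, an established ROUGE implementation should be preferred over this sketch.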