
📝 Description

An MBart model for Russian summarization, fine-tuned to summarize dialogues.

This model was first fine-tuned by Ilya Gusev on the Gazeta dataset. We then fine-tuned it further on the SAMSum dataset, translated into Russian with the Google Translate API.

🤗 Moreover, we have built a Telegram bot, @summarization_bot, that runs this model. Add it to your chat and get summaries instead of dozens of spam messages! 🤗
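SAMSum dialogues are written one utterance per line in the form `Speaker: message`, so chat messages presumably need to be flattened into that shape before they reach the model. The bot's actual preprocessing is not documented here. `format_dialogue` below is a hypothetical helper, sketched under the assumption that each message arrives as a (sender name, text) pair.

```python
from typing import Iterable, Tuple


def format_dialogue(messages: Iterable[Tuple[str, str]]) -> str:
    """Flatten (sender, text) chat messages into SAMSum-style 'Name: text' lines.

    Hypothetical helper: the bot's real preprocessing may differ.
    """
    lines = []
    for sender, text in messages:
        # Collapse internal newlines and repeated spaces so each message stays on one line
        text = " ".join(text.split())
        if text:  # skip empty messages (e.g. media without a caption)
            lines.append(f"{sender}: {text}")
    return "\n".join(lines)


if __name__ == "__main__":
    chat = [
        ("Аня", "Привет! Идём завтра в кино?"),
        ("Олег", "Давай,\nв семь вечера"),
        ("Аня", ""),
    ]
    print(format_dialogue(chat))
```

The resulting string is what you would pass as `article_text` in the usage snippet.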

❓ How to use with code

import torch
from transformers import AutoTokenizer, MBartForConditionalGeneration

# Download the model and tokenizer
model_name = "Kirili4ik/mbart_ruDialogSum"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = MBartForConditionalGeneration.from_pretrained(model_name)
model.eval()

# Dialogue to summarize (replace "..." with your text)
article_text = "..."

# Tokenize, padding/truncating the input to 600 tokens
inputs = tokenizer(
    [article_text],
    max_length=600,
    padding="max_length",
    truncation=True,
    return_tensors="pt",
)

# Beam search with 3 beams; forbid repeating any 3-gram in the output
with torch.no_grad():
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        num_beams=3,
        no_repeat_ngram_size=3,
    )[0]

summary = tokenizer.decode(output_ids, skip_special_tokens=True)
print(summary)
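The snippet truncates input at 600 tokens, so anything past that point in a long chat is silently dropped. One possible workaround, which this model card does not itself prescribe, is to split the dialogue on utterance boundaries into chunks that fit the window and summarize each chunk separately. The sketch below assumes a caller-supplied `count_tokens` function; with the tokenizer loaded above you could pass `lambda s: len(tokenizer(s)["input_ids"])`, while any simpler counter works for experimentation.

```python
from typing import Callable, List


def chunk_dialogue(
    dialogue: str,
    count_tokens: Callable[[str], int],
    max_tokens: int = 600,
) -> List[str]:
    """Greedily pack whole utterances (lines) into chunks of at most max_tokens.

    A single utterance longer than max_tokens becomes its own chunk;
    the tokenizer's truncation will then cut it.
    """
    chunks: List[str] = []
    current: List[str] = []
    for line in dialogue.splitlines():
        if not line.strip():
            continue  # ignore blank lines
        candidate = "\n".join(current + [line])
        if current and count_tokens(candidate) > max_tokens:
            # Adding this line would overflow the window: close the current chunk
            chunks.append("\n".join(current))
            current = [line]
        else:
            current.append(line)
    if current:
        chunks.append("\n".join(current))
    return chunks


if __name__ == "__main__":
    text = "A: one two\nB: three four five\nC: six"
    # Whitespace word count as a stand-in for real tokenization
    print(chunk_dialogue(text, lambda s: len(s.split()), max_tokens=5))
```

Re-counting the whole candidate chunk on every line is quadratic but keeps the count faithful to how the tokenizer sees the joined text. Each chunk can then be run through the generation code above; how well concatenated partial summaries read with this model has not been evaluated here.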

Evaluation results

All scores are self-reported on the SAMSum Corpus (translated to Russian).

  • Validation ROUGE-1: 34.5
  • Validation ROUGE-L: 33.0
  • Test ROUGE-1: 31.0
  • Test ROUGE-L: 28.0
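ROUGE-1 measures unigram overlap between a generated summary and a reference, and ROUGE-L is based on their longest common subsequence (LCS) of tokens. The card does not state which ROUGE implementation, tokenization, or variant (precision, recall, or F1) produced the numbers above, so the following is only an illustrative F1 computation on lowercased whitespace tokens, not a way to reproduce them.

```python
from collections import Counter
from typing import List


def _f1(overlap: int, pred_len: int, ref_len: int) -> float:
    """Harmonic mean of precision (overlap/pred_len) and recall (overlap/ref_len)."""
    if overlap == 0:
        return 0.0
    precision = overlap / pred_len
    recall = overlap / ref_len
    return 2 * precision * recall / (precision + recall)


def rouge_1(prediction: str, reference: str) -> float:
    """ROUGE-1 F1: clipped unigram overlap on lowercased whitespace tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    overlap = sum((Counter(pred) & Counter(ref)).values())
    return _f1(overlap, len(pred), len(ref))


def _lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(prediction: str, reference: str) -> float:
    """ROUGE-L F1 based on the LCS of lowercased whitespace tokens."""
    pred, ref = prediction.lower().split(), reference.lower().split()
    return _f1(_lcs_length(pred, ref), len(pred), len(ref))


if __name__ == "__main__":
    print(rouge_1("the cat sat", "sat the cat"))
    print(rouge_l("the cat sat", "sat the cat"))
```

The example shows why the two metrics differ: both sentences contain the same three words, so ROUGE-1 is perfect, but the longest in-order match is only "the cat", which lowers ROUGE-L. Punctuation stays attached to tokens here, which is one of several reasons scores from different implementations disagree.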