datasets:
  - d0p3/ukr-pravda-news-summary
  - d0p3/ukr-pravda-news-summary-v1.1
  - shamotskyi/ukr_pravda_2y
language:
  - uk
  - en
pipeline_tag: summarization
license: cc-by-nc-4.0

O3ap-sm: Ukrainian News Summarizer

This repository contains the O3ap-sm model, a Ukrainian news summarization model based on the T5-small architecture. The model was trained on the Ukrainian Corpus CCMatrix for text summarization tasks.

Model Overview

Usage

Installation

pip install transformers torch sentencepiece

Loading the Model

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("d0p3/O3ap-sm")
model = AutoModelForSeq2SeqLM.from_pretrained("d0p3/O3ap-sm")

Generating Summaries

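news_article = "**YOUR NEWS ARTICLE TEXT IN UKRAINIAN**"

# Tokenize the article; the 512-token limit is an assumed, illustrative value
input_ids = tokenizer(news_article, return_tensors="pt", truncation=True, max_length=512).input_ids
# Generate summary token IDs; max_new_tokens=128 is illustrative, tune as needed
output_ids = model.generate(input_ids, max_new_tokens=128)

# Decode the generated token IDs back into text
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(summary)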
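
T5-style models accept a limited input length, so a very long article may not fit in a single call. The following is a minimal sketch of one possible workaround: split the article into word-based chunks, summarize each chunk, and join the results. The helper names `chunk_text` and `summarize_long`, the `summarize_fn` callable, and the default of 300 words per chunk are all hypothetical choices made for illustration; none of them come from the model's documentation.

```python
from typing import Callable, List


def chunk_text(text: str, max_words: int = 300) -> List[str]:
    """Split text into consecutive chunks of at most max_words words.

    max_words=300 is an illustrative default, not a documented model limit.
    """
    if max_words <= 0:
        raise ValueError("max_words must be positive")
    words = text.split()
    return [
        " ".join(words[i:i + max_words])
        for i in range(0, len(words), max_words)
    ]


def summarize_long(
    text: str,
    summarize_fn: Callable[[str], str],
    max_words: int = 300,
) -> str:
    """Summarize each chunk with summarize_fn and join the partial summaries.

    summarize_fn is a hypothetical callable that maps one chunk of text
    to its summary, for example a wrapper around the model calls.
    """
    partial = [summarize_fn(chunk) for chunk in chunk_text(text, max_words)]
    return " ".join(s.strip() for s in partial if s.strip())
```

In practice, `summarize_fn` would wrap the tokenizer, `model.generate`, and `tokenizer.decode` steps shown above. Chunking on word boundaries is crude; splitting on sentence or paragraph boundaries would likely give more coherent partial summaries.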

Limitations

  • The model may not perform optimally on informal or highly colloquial Ukrainian text.
  • As with any language model, there's a possibility of generating factually incorrect summaries or summaries that reflect biases present in the training data.
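
One lightweight way to catch some factual errors is to check that every number in a generated summary also appears in the source article. The sketch below is a heuristic under stated assumptions, not a validation method described by the model authors: the function names `extract_numbers` and `unsupported_numbers` are hypothetical, and the check only covers digits, so it cannot detect wrong names, dates written in words, or other kinds of errors.

```python
import re
from typing import Set

# Matches integers and numbers with "." or "," separators, e.g. "2024", "3,5"
_NUMBER_RE = re.compile(r"\d+(?:[.,]\d+)*")


def extract_numbers(text: str) -> Set[str]:
    """Return the set of numeric tokens found in text."""
    return set(_NUMBER_RE.findall(text))


def unsupported_numbers(article: str, summary: str) -> Set[str]:
    """Return numbers that appear in the summary but not in the article."""
    return extract_numbers(summary) - extract_numbers(article)
```

A non-empty result from `unsupported_numbers(article, summary)` suggests the summary should be reviewed manually before use.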

Ethical Considerations

  • Transparency: The model is intended for summarizing Ukrainian news articles; its known limitations are listed in the Limitations section.
  • Bias: Training data selection and the fine-tuning process may have introduced biases. Apply mitigation strategies where possible.
  • Misuse: The model could be misused, for example to generate misleading summaries. Treat its outputs with caution and evaluate them critically.

Contributing

We welcome contributions and feedback!

License

This model is released under the CC-BY-NC-4.0 license.