O3ap-sm: Ukrainian News Summarizer
This repository contains O3ap-sm, a Ukrainian news summarization model fine-tuned from T5-small. The model was trained on the Ukrainian Corpus CCMatrix for text summarization tasks.
Model Overview
- Base Model: T5-small
- Training Dataset: Ukrainian Corpus CCMatrix
- Fine-tuning Task: News article summarization
- Fine-tuning Dataset:
- Languages: Ukrainian, English
Usage
Installation
pip install transformers torch sentencepiece
Loading the Model
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
tokenizer = AutoTokenizer.from_pretrained("d0p3/O3ap-sm")
model = AutoModelForSeq2SeqLM.from_pretrained("d0p3/O3ap-sm")
Generating Summaries
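news_article = "**YOUR NEWS ARTICLE TEXT IN UKRAINIAN**"
# Tokenize the article, truncating input that exceeds the tokenizer's maximum length
input_ids = tokenizer(news_article, return_tensors="pt", truncation=True).input_ids
# Generate summary token IDs with the model's default generation settings
output_ids = model.generate(input_ids)
# Decode the generated IDs back to text, dropping special tokens
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)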
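Summary length and input truncation can also be set explicitly. The following is a minimal sketch, assuming the `tokenizer` and `model` objects loaded earlier. The helper name `summarize` and its default values (512 input tokens, 64 new tokens, 4 beams) are illustrative assumptions, not settings documented for this model.

```python
def summarize(text, tokenizer, model,
              max_input_tokens=512, max_summary_tokens=64, num_beams=4):
    """Summarize one article with explicit truncation and generation settings.

    `summarize` and its defaults are hypothetical; tune them for your data.
    """
    # Truncate long articles so the input stays within the chosen token budget.
    inputs = tokenizer(
        text,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )
    # Beam search with an explicit cap on the number of generated tokens.
    output_ids = model.generate(
        inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_summary_tokens,
        num_beams=num_beams,
        early_stopping=True,
    )
    # Decode the first (best) sequence back to text.
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

Call it as `summarize(news_article, tokenizer, model)`. Raising `num_beams` may improve fluency but makes generation slower.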
Limitations
- The model may not perform optimally on informal or highly colloquial Ukrainian text.
- As with any language model, it may generate factually incorrect summaries or summaries that reflect biases present in the training data.
Ethical Considerations
- Transparency: When using this model, clearly state that it is intended for summarizing news articles, and disclose its limitations.
- Bias: Be aware of biases that may have been introduced during training data selection or fine-tuning. Employ mitigation strategies where possible.
- Misuse: The model can be misused, for example to generate misleading summaries. Treat its outputs with caution and evaluate them critically.
Contributing
We welcome contributions and feedback!
License
This model is released under the CC-BY-NC-4.0 license.