---
datasets:
- d0p3/ukr-pravda-news-summary
- d0p3/ukr-pravda-news-summary-v1.1
- shamotskyi/ukr_pravda_2y
language:
- uk
- en
pipeline_tag: summarization
license: cc-by-nc-4.0
---

# O3ap-sm: Ukrainian News Summarizer

This repository contains the O3ap-sm model, a Ukrainian news summarization model fine-tuned from T5-small. It was trained on the Ukrainian Corpus CCMatrix and then fine-tuned for news summarization on the datasets listed below.

## Model Overview

* **Base Model:** T5-small
* **Training Dataset:** Ukrainian Corpus CCMatrix
* **Fine-tuning Task:** News article summarization
* **Fine-tuning Dataset:**
    * [shamotskyi/ukr_pravda_2y](https://huggingface.co/datasets/shamotskyi/ukr_pravda_2y)
    * [d0p3/ukr-pravda-news-summary](https://huggingface.co/datasets/d0p3/ukr-pravda-news-summary)
    * [d0p3/ukr-pravda-news-summary-v1.1](https://huggingface.co/datasets/d0p3/ukr-pravda-news-summary-v1.1)
* **Language:** Ukrainian, English

## Usage

**Installation**

```bash
# torch backs return_tensors="pt"; sentencepiece is used by T5 tokenizers
pip install transformers torch sentencepiece
```

**Loading the Model**

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

# Download the tokenizer and model weights from the Hugging Face Hub
tokenizer = AutoTokenizer.from_pretrained("d0p3/O3ap-sm")
model = AutoModelForSeq2SeqLM.from_pretrained("d0p3/O3ap-sm")
```

**Generating Summaries**

```python
news_article = "**YOUR NEWS ARTICLE TEXT IN UKRAINIAN**"

# Tokenize the article into a batch of one PyTorch tensor
input_ids = tokenizer(news_article, return_tensors="pt").input_ids

# Generate summary token IDs with the default generation settings
output_ids = model.generate(input_ids)

# Decode the first (and only) sequence back to text
summary = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(summary)
```

## Limitations

* The model may not perform optimally on informal or highly colloquial Ukrainian text.
* As with any language model, it may generate factually incorrect summaries or summaries that reflect biases present in the training data.

## Ethical Considerations

* **Transparency:** Clearly state that the model is intended for summarizing news articles, and disclose its limitations.
* **Bias:** Be aware of biases that may have been introduced during training data selection or fine-tuning. Employ mitigation strategies where possible.
* **Misuse:** The model can be misused, for example to generate misleading summaries. Treat its outputs with caution and evaluate them critically.

## Contributing

We welcome contributions and feedback!

## License

This model is released under the CC-BY-NC-4.0 license.
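
## Appendix: Generation Settings

The basic example in the Usage section calls `model.generate` with default settings, which can cut summaries short and does not guard against very long articles. The sketch below wraps the same tokenize, generate, and decode steps in a helper with explicit limits. The function name `summarize` and every default value (512 input tokens, 64 new tokens, 4 beams) are illustrative assumptions, not settings documented for this model, so tune them on your own data.

```python
def summarize(article, tokenizer, model,
              max_input_tokens=512, max_new_tokens=64, num_beams=4):
    """Summarize one article with explicit length limits.

    `tokenizer` and `model` are the objects loaded in the Usage section.
    All default values are assumptions chosen for illustration.
    """
    # Truncate long articles so the input stays within the chosen limit
    inputs = tokenizer(
        article,
        return_tensors="pt",
        truncation=True,
        max_length=max_input_tokens,
    )

    # Beam search with a cap on the number of generated tokens
    output_ids = model.generate(
        input_ids=inputs["input_ids"],
        attention_mask=inputs["attention_mask"],
        max_new_tokens=max_new_tokens,
        num_beams=num_beams,
    )

    # Decode the single generated sequence back to text
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)
```

With `tokenizer` and `model` loaded as shown under Usage, calling `summarize(news_article, tokenizer, model)` returns the summary as a string. Passing the tokenizer and model as arguments keeps the helper free of global state, so the same function can be reused with a different checkpoint.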