---
widget:
- text: አዲስ አበባ
  example_title: Example 1
- text: በ ኢንግሊዝ ፕሪምየር ሊግ
  example_title: Example 2
- text: ፕሬዚዳንት ዶናልድ ትራምፕ
  example_title: Example 3
language:
- am
metrics:
- perplexity
library_name: transformers
pipeline_tag: text-generation
---

# gpt2-small-amharic

This is a smaller version of the [gpt2](https://huggingface.co/openai-community/gpt2) decoder transformer model, pretrained from scratch for **2 days** on **290 million tokens** of **Amharic** text.

- It has **33.7 million parameters**.
- The **context size** of this model is **128** tokens.
- It uses the same **tokenizer** type as gpt2, trained from scratch on the same Amharic dataset as the model, with a vocabulary size of **16384**.
- This is a base model and hasn't undergone any supervised fine-tuning yet.
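
With a context size of 128 tokens, the prompt and the generated continuation have to share one small budget. The sketch below assumes, as is usual for GPT-2-style models, that the 128-token limit covers the prompt plus the newly generated tokens. `fit_prompt_to_context` is a hypothetical helper written for illustration, not part of the model or of transformers. It works on plain lists of token ids, so the ids can come from any tokenizer.

```python
CONTEXT_SIZE = 128  # context size stated in this model card


def fit_prompt_to_context(token_ids, max_new_tokens, context_size=CONTEXT_SIZE):
    """Keep only the most recent prompt tokens so generation fits the context.

    Hypothetical helper for illustration; not part of transformers.
    """
    if not 0 < max_new_tokens < context_size:
        raise ValueError("max_new_tokens must be between 1 and context_size - 1")
    budget = context_size - max_new_tokens  # tokens left for the prompt
    return list(token_ids[-budget:])


# A 100-token prompt with 64 new tokens requested keeps its last 64 tokens.
prompt_ids = list(range(100))
trimmed = fit_prompt_to_context(prompt_ids, max_new_tokens=64)
print(len(trimmed))
```

Trimming from the left keeps the text closest to the point where generation starts, which is usually the most relevant context for a continuation.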

It achieves the following results on the evaluation set:

- `Loss: 3.96`
- `Perplexity: 52.55`
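
The two metrics are related: perplexity is the exponential of the loss. The sketch below assumes the loss is the mean per-token cross-entropy in nats, which is the usual convention but is not stated in this card. Exponentiating the rounded loss gives a value slightly below the reported perplexity, presumably because the reported figure was computed from the unrounded loss.

```python
import math

eval_loss = 3.96  # rounded evaluation loss reported above

# Perplexity is exp(mean cross-entropy loss), assuming the loss is in nats.
perplexity = math.exp(eval_loss)
print(f"perplexity from rounded loss: {perplexity:.2f}")

# Going the other way: the loss implied by the reported perplexity.
implied_loss = math.log(52.55)
print(f"loss implied by reported perplexity: {implied_loss:.4f}")
```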

### How to use

You can use this model directly with a pipeline for text generation:

```python
from transformers import pipeline

# Load the model and its tokenizer from the Hugging Face Hub
gpt2_am = pipeline(
    "text-generation",
    model="rasyosef/gpt2-small-amharic"
)

prompt = "በ ኢንግሊዝ ፕሪምየር ሊግ"

# Sample up to 64 new tokens that continue the prompt
gpt2_am(
    prompt,
    max_new_tokens=64,
    temperature=0.8,
    do_sample=True,
    top_k=8,
    top_p=0.8,
    repetition_penalty=1.25
)
```

Output:

```python
[{'generated_text': 'በ ኢንግሊዝ ፕሪምየር ሊግ የዋንጫ ባለቤት የሆነው ማንቸስተር ሲቲ በ9 ነጥብ ተበልጦ አራተኛ ደረጃ ላይ ይገኛል ።\nከትናንት በስቲያ ምሽት በእንግሊዝ ፕሬሚየር ሊግ አርሰናልን 3 ለ1 በማሸነፍ ነጥቡን ወደ 7 ከፍ በማድረግ በደረጃ ሠንጠረዡ ግርጌ ላይ የሚገኘው ሊቨርፑል ትናንት ማታ ከበርንሌይ ጋር አንድ እኩል ተለያይቷል'}]
```
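
The output above is a list of dictionaries, and each `generated_text` begins with the prompt itself. The sketch below shows one way to keep only the newly generated part. `extract_continuation` is a hypothetical helper written for illustration, and the sample is a shortened copy of the output above, so the snippet runs without downloading the model.

```python
def extract_continuation(outputs, prompt):
    """Return each generated text with the leading prompt removed.

    Hypothetical helper; `outputs` is a list of dicts with a
    'generated_text' key, in the shape shown in the output above.
    """
    continuations = []
    for item in outputs:
        text = item["generated_text"]
        if text.startswith(prompt):
            text = text[len(prompt):]
        continuations.append(text.strip())
    return continuations


# Shortened sample in the same shape as the pipeline output.
prompt = "በ ኢንግሊዝ ፕሪምየር ሊግ"
sample = [{"generated_text": "በ ኢንግሊዝ ፕሪምየር ሊግ የዋንጫ ባለቤት የሆነው ማንቸስተር ሲቲ"}]
print(extract_continuation(sample, prompt))
```

The text-generation pipeline also accepts `return_full_text=False`, which should return only the continuation without any post-processing.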

#### Hallucination

Due to the model's small size, hallucinations occur often in the generated text. Here's an example, in which a prompt about the English Premier League is continued with a 2-0 win for the Walias (Ethiopia's national team) and statements attributed to Ethiopian Football Federation officials:

```python
[{'generated_text': 'በ ኢንግሊዝ ፕሪምየር ሊግ የ5ኛ ሳምንት መርሃግብር ዛሬ ምሽት 4 :00 ሰአት ላይ በዋልያዎቹ 2-0 አሸናፊነት ተጠናቋል፡፡\nከጨዋታው መጠናቀቅ በኋላ የኢትዮጵያ እግር ኳስ ፌደሬሽን ስራ አስፈፃሚ ኮሚቴ ሰብሳቢ አቶ ኢሳያስ ጂራ እና ምክትል ፕሬዝዳንቱ አቶ ሰለሞን ገ/እግዚያብሔር ለሶከር ኢትዮጵያ እንደገለፁት የሁለቱ ቡድኖች ጨዋታ ነገ ጠዋት 3:30'}]
```

### Demo

You can use the following demo to generate text using gpt2-small-amharic. Please **enter a prompt** and click the **Generate** button to generate completions for the prompt.

https://huggingface.co/spaces/rasyosef/GPT2-Amharic