---
widget:
- text: አዲስ አበባ
  example_title: Example 1
- text: ኢንግሊዝ ፕሪምየር ሊግ
  example_title: Example 2
- text: ፕሬዚዳንት ዶናልድ ትራምፕ
  example_title: Example 3
language:
- am
metrics:
- perplexity
library_name: transformers
pipeline_tag: text-generation
---
# gpt2-small-amharic
This is a smaller version of the [gpt2](https://huggingface.co/openai-community/gpt2) decoder transformer model, pretrained from scratch for **2 days** on **290 million tokens** of **Amharic** text.
- It has **33.7 million parameters**.
- The **context size** of this model is **128** tokens.
- It uses the same **tokenizer** type as gpt2, trained from scratch on the same Amharic dataset as the model, with a vocabulary size of **16384**.
- This is a base model and hasn't undergone any supervised finetuning yet.
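
Because the context size is 128 tokens, the prompt and the generated tokens together have to fit within that window. The following is a minimal sketch of that budget arithmetic, not part of the model's own code: the helper names `max_prompt_tokens` and `truncate_prompt_ids` are hypothetical, and the sketch assumes the 128-token limit applies to prompt plus generated tokens combined.

```python
CONTEXT_SIZE = 128  # context size stated in this model card


def max_prompt_tokens(max_new_tokens: int, context_size: int = CONTEXT_SIZE) -> int:
    """Return how many prompt tokens fit alongside `max_new_tokens` generated tokens."""
    if max_new_tokens < 0 or max_new_tokens >= context_size:
        raise ValueError("max_new_tokens must be in the range [0, context_size)")
    return context_size - max_new_tokens


def truncate_prompt_ids(prompt_ids, max_new_tokens: int, context_size: int = CONTEXT_SIZE):
    """Keep only the most recent prompt token ids that still leave room for generation."""
    budget = max_prompt_tokens(max_new_tokens, context_size)
    return list(prompt_ids)[-budget:]


# With max_new_tokens=64, at most 64 prompt tokens remain in the window.
print(max_prompt_tokens(64))
```

Truncating from the left keeps the text closest to the point where generation continues, which is usually the most relevant context for a causal language model.
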
It achieves the following results on the evaluation set:
- `Loss: 3.96`
- `Perplexity: 52.55`
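
The two numbers above are related: perplexity is the exponential of the loss. The sketch below checks that relationship, assuming the loss is the mean per-token cross-entropy measured in nats (an assumption, since the card does not state the unit). The function name `perplexity_from_loss` is illustrative only.

```python
import math


def perplexity_from_loss(loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(loss)


reported_loss = 3.96        # rounded value from this model card
reported_perplexity = 52.55  # value from this model card

# exp(3.96) lands close to the reported perplexity; the small gap is
# consistent with the loss having been rounded to two decimals.
print(perplexity_from_loss(reported_loss))
print(math.log(reported_perplexity))
```
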
### How to use
You can use this model directly with a pipeline for text generation:
```python
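from transformers import pipeline

# Load the text-generation pipeline with the model from the Hugging Face Hub
gpt2_am = pipeline(
    "text-generation",
    model="rasyosef/gpt2-small-amharic"
)

prompt = "በ ኢንግሊዝ ፕሪምየር ሊግ"

# Sample up to 64 new tokens that continue the prompt
gpt2_am(
    prompt,
    max_new_tokens=64,
    temperature=0.8,
    do_sample=True,
    top_k=8,
    top_p=0.8,
    repetition_penalty=1.25
)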
```
Output:
```python
[{'generated_text': 'በ ኢንግሊዝ ፕሪምየር ሊግ የዋንጫ ባለቤት የሆነው ማንቸስተር ሲቲ በ9 ነጥብ ተበልጦ አራተኛ ደረጃ ላይ ይገኛል ።\nከትናንት በስቲያ ምሽት በእንግሊዝ ፕሬሚየር ሊግ አርሰናልን 3 ለ1 በማሸነፍ ነጥቡን ወደ 7 ከፍ በማድረግ በደረጃ ሠንጠረዡ ግርጌ ላይ የሚገኘው ሊቨርፑል ትናንት ማታ ከበርንሌይ ጋር አንድ እኩል ተለያይቷል'}]
```
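
As the output above shows, the pipeline returns a list of dictionaries whose `generated_text` value starts with the prompt itself. If only the newly generated part is needed, it can be sliced off afterwards. This is a small sketch under that assumption; the helper name `continuation` is hypothetical and the sample output is shortened for illustration.

```python
def continuation(outputs, prompt: str) -> str:
    """Return the generated text with the leading prompt removed, if present."""
    text = outputs[0]["generated_text"]
    return text[len(prompt):] if text.startswith(prompt) else text


# Shortened stand-in for a real pipeline result
sample = [{"generated_text": "በ ኢንግሊዝ ፕሪምየር ሊግ የዋንጫ ባለቤት"}]
print(continuation(sample, "በ ኢንግሊዝ ፕሪምየር ሊግ"))
```

The text-generation pipeline also accepts `return_full_text=False`, which should give a similar result without post-processing.
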
#### Hallucination
Because the model is small, the generated text often contains hallucinations. Here's an example:
```python
[{'generated_text': 'በ ኢንግሊዝ ፕሪምየር ሊግ የ5ኛ ሳምንት መርሃግብር ዛሬ ምሽት 4 :00 ሰአት ላይ በዋልያዎቹ 2-0 አሸናፊነት ተጠናቋል፡፡\nከጨዋታው መጠናቀቅ በኋላ የኢትዮጵያ እግር ኳስ ፌደሬሽን ስራ አስፈፃሚ ኮሚቴ ሰብሳቢ አቶ ኢሳያስ ጂራ እና ምክትል ፕሬዝዳንቱ አቶ ሰለሞን ገ/እግዚያብሔር ለሶከር ኢትዮጵያ እንደገለፁት የሁለቱ ቡድኖች ጨዋታ ነገ ጠዋት 3:30'}]
```
### Demo
You can use the following demo to generate text with gpt2-small-amharic. **Enter a prompt** and click the **Generate** button to get completions for it.
https://huggingface.co/spaces/rasyosef/GPT2-Amharic