GPT2-Medium-Arabic-Poetry

Fine-tuned aubmindlab/aragpt2-medium on the Arabic Poetry Dataset (6th–21st century), using 41,922 lines of poetry as the training split and 9,007 lines (by poets held out from the training split) for validation.
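The validation split is poet-disjoint: no poet in validation appears in training, so the model is evaluated on unseen styles rather than memorized authors. The exact split procedure isn't published; a minimal sketch of one way to build such a split (column names and the fraction are assumptions) might look like:

```python
import random

def split_by_poet(rows, val_fraction=0.18, seed=42):
    """Split (poet, line) rows so validation poets never appear in training.

    `rows`, `val_fraction`, and the tuple layout are illustrative assumptions,
    not the dataset's actual schema.
    """
    poets = sorted({poet for poet, _ in rows})
    rng = random.Random(seed)
    rng.shuffle(poets)
    # Hold out a fraction of poets (at least one) for validation.
    n_val = max(1, int(len(poets) * val_fraction))
    val_poets = set(poets[:n_val])
    train = [line for poet, line in rows if poet not in val_poets]
    val = [line for poet, line in rows if poet in val_poets]
    return train, val

rows = [("A", "line1"), ("A", "line2"), ("B", "line3"), ("C", "line4")]
train, val = split_by_poet(rows, val_fraction=0.25)
```

Splitting on poets rather than individual lines avoids leaking a poet's distinctive vocabulary and meter from train into validation.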

Usage

from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

set_seed(42)
model_name = "elgeish/gpt2-medium-arabic-poetry"
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")
tokenizer = AutoTokenizer.from_pretrained(model_name)
prompt = "ู„ู„ูˆู‡ู„ุฉ ุงู„ุฃูˆู„ู‰ ู‚ุฑุฃุช ููŠ ุนูŠู†ูŠู‡"  # "At first glance, I read in his eyes"
input_ids = tokenizer.encode(prompt, return_tensors="pt")
samples = model.generate(
    input_ids.to("cuda"),
    do_sample=True,
    early_stopping=True,
    max_length=32,
    min_length=16,
    num_return_sequences=3,  # generate three independent samples
    pad_token_id=50256,
    repetition_penalty=1.5,  # penalize tokens that already appeared
    top_k=32,  # sample only from the 32 most likely tokens
    top_p=0.95,  # nucleus sampling threshold
)

for sample in samples:
    print(tokenizer.decode(sample.tolist()))
    print("--")

Here's the output:

ู„ู„ูˆู‡ู„ุฉ ุงู„ุฃูˆู„ู‰ ู‚ุฑุฃุช ููŠ ุนูŠู†ูŠู‡ ุนู† ุชู„ูƒ ุงู„ู†ุณู… ู„ู… ุชุฐูƒุฑ ุดูŠุกุง ูู„ุฑุจู…ุง ู†ุงู…ุช ุนู„ูŠ ูƒุชููŠู‡ุง ุงู„ุนุตุงููŠุฑ ูˆุชู†ุงุซุฑุช ุงูˆุฑุงู‚ ุงู„ุชูˆุช ุนู„ูŠู‡ุง ูˆุบุงุจุช ุงู„ูˆุฑุฏุฉ ู…ู†
--
ู„ู„ูˆู‡ู„ุฉ ุงู„ุฃูˆู„ู‰ ู‚ุฑุฃุช ููŠ ุนูŠู†ูŠู‡ ุงูŠุฉ ู†ุดูˆุฉ ู…ู† ู†ุงุฑู‡ ูˆู‡ูŠ ุชู†ุธุฑ ุงู„ูŠ ุงู„ู…ุณุชู‚ุจู„ ุจุนูŠูˆู† ุฎู„ุงู‚ุฉ ูˆุฑุณู…ุช ุฎุทูˆุทู‡ ุงู„ุนุฑูŠุถุฉ ุนู„ูŠ ุฌุจูŠู†ูƒ ุงู„ุนุงุฑูŠ ุฑุณู…ุช ุงู„ุฎุทูˆุท ุงู„ุญู…ุฑ ููˆู‚ ุดุนุฑูƒ
--
ู„ู„ูˆู‡ู„ุฉ ุงู„ุฃูˆู„ู‰ ู‚ุฑุฃุช ููŠ ุนูŠู†ูŠู‡ ูƒู„ ู…ุง ูƒุงู† ูˆู…ุง ุณูŠูƒูˆู† ุบุฏุง ุงุฐุง ู„ู… ุชูƒู† ุงู…ุฑุงุฉ ุณุชูƒุจุฑ ูƒุซูŠุฑุง ุนู„ูŠ ุงู„ูˆุฑู‚ ุงู„ุงุจูŠุถ ุงูˆ ู„ุง ุชุฑูŠ ู…ุซู„ุง ุฎุทูˆุทุง ุฑููŠุนุฉ ููˆู‚ ุตูุญุฉ ุงู„ู…ุงุก
--
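The `top_k` and `top_p` arguments above control how the next token is sampled: `top_k` truncates the distribution to the k most likely tokens, and `top_p` (nucleus sampling) then keeps the smallest prefix of those whose cumulative probability reaches p. A simplified plain-Python sketch of that filtering step (the `transformers` implementation operates on tensors and renormalizes before sampling):

```python
import math

def top_k_top_p_filter(logits, top_k=32, top_p=0.95):
    """Return indices of tokens kept after top-k then top-p filtering.

    Simplified illustration of the sampling knobs; not the actual
    transformers implementation.
    """
    # Keep the top_k highest-scoring token indices, sorted by logit.
    order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:top_k]
    # Softmax over the surviving logits.
    exps = [math.exp(logits[i]) for i in order]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Keep the smallest prefix whose cumulative probability reaches top_p.
    kept, cum = [], 0.0
    for idx, p in zip(order, probs):
        kept.append(idx)
        cum += p
        if cum >= top_p:
            break
    return kept

# Toy vocabulary of 4 tokens: only the two most probable survive at top_p=0.8.
kept = top_k_top_p_filter([2.0, 1.0, 0.5, -1.0], top_k=3, top_p=0.8)
```

The next token is then sampled from the renormalized probabilities of the surviving indices, which is why lower `top_p` or `top_k` values yield more conservative, less varied text.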