# GPT-2 DV Base
This is a GPT-2 model fine-tuned for the Dhivehi language on a curated dataset of Dhivehi Wikipedia articles. It can be used for text generation in Dhivehi.
## Model Description
- Model Type: GPT-2 (base model: openai-community/gpt2)
- Language: Dhivehi (ދިވެހި)
- Training Data: Dhivehi Wikipedia articles
- Last Updated: 2024-11-25
## Performance Metrics
Evaluation metrics on the test set:
- Average perplexity: 3.80
- Perplexity standard deviation: 2.23
- Best (lowest) perplexity: 2.72
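The card does not include the evaluation script. As a minimal sketch, per-text perplexity for a causal language model is commonly computed as the exponential of the mean cross-entropy loss, assuming each test document fits in the model's context window:
```python
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")
model.eval()

text = "ދިވެހިރާއްޖެއަކީ"  # stand-in; use a held-out Dhivehi passage
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # With labels set, the model returns the mean cross-entropy over tokens
    loss = model(**inputs, labels=inputs["input_ids"]).loss

perplexity = torch.exp(loss).item()
print(f"Perplexity: {perplexity:.2f}")
```
Averaging this value over all test documents would yield the average perplexity reported above; the exact aggregation used for the card's numbers is not specified.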
## Usage Example
```python
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

# Load model and tokenizer
model = GPT2LMHeadModel.from_pretrained("alakxender/dhivehi-gpt2-base")
tokenizer = GPT2TokenizerFast.from_pretrained("alakxender/dhivehi-gpt2-base")

# Prepare your prompt
prompt = "ދިވެހިރާއްޖެއަކީ"
inputs = tokenizer(prompt, return_tensors="pt")

# Generate text
outputs = model.generate(
    **inputs,
    max_length=200,
    temperature=0.7,
    top_p=0.9,
    do_sample=True,
    num_return_sequences=1,
)

# Decode the generated text
generated_text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(generated_text)
```
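Equivalently, the high-level `pipeline` API wraps the same load-and-generate steps; the generation parameters below mirror the example above:
```python
from transformers import pipeline

generator = pipeline("text-generation", model="alakxender/dhivehi-gpt2-base")
result = generator(
    "ދިވެހިރާއްޖެއަކީ",
    max_length=200,
    do_sample=True,
    temperature=0.7,
    top_p=0.9,
)
print(result[0]["generated_text"])
```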
## Training Details
The model was trained using the following configuration (a sketch of the equivalent Trainer setup follows the hyperparameter list):
- Base model: GPT-2
- Training type: Full fine-tuning
- Mixed precision: FP16
- Gradient checkpointing: Enabled
Hyperparameters:
- Learning rate: 5e-5
- Batch size: 32
- Gradient accumulation steps: 2
- Epochs: 3
- Weight decay: 0.01
- Warmup steps: 1000
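The card does not publish the training script. The following is a minimal sketch of a Hugging Face `Trainer` setup matching the listed configuration; the toy dataset, the `max_length` of 512, and the per-device interpretation of the batch size are assumptions:
```python
from datasets import Dataset
from transformers import (
    DataCollatorForLanguageModeling,
    GPT2LMHeadModel,
    GPT2TokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = GPT2TokenizerFast.from_pretrained("openai-community/gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = GPT2LMHeadModel.from_pretrained("openai-community/gpt2")

# Stand-in corpus; the actual run used curated Dhivehi Wikipedia articles
texts = ["ދިވެހިރާއްޖެއަކީ"]
def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)
train_dataset = Dataset.from_dict({"text": texts}).map(tokenize, remove_columns=["text"])

args = TrainingArguments(
    output_dir="dhivehi-gpt2-base",
    learning_rate=5e-5,
    per_device_train_batch_size=32,  # card says "batch size: 32"; per-device is assumed
    gradient_accumulation_steps=2,
    num_train_epochs=3,
    weight_decay=0.01,
    warmup_steps=1000,
    fp16=True,                       # mixed precision; requires a GPU
    gradient_checkpointing=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```
Note that with gradient accumulation of 2, the effective batch size under this interpretation would be 64 per device.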
## Limitations
- The training data comes primarily from Wikipedia, which may not cover all Dhivehi language registers and domains
- May not perform well on specialized or technical content
- Could reflect biases present in the training data
- Not recommended for production use without thorough evaluation
## Intended Uses
This model is suitable for:
- Dhivehi text generation
- Research on Dhivehi NLP
- Educational purposes
- Experimental applications
Not intended for:
- Critical or production systems
- Decision-making applications
- Tasks requiring factual accuracy