OpenAI GPT-2 355M

Model description

This custom GPT-2 model is derived from gpt2-medium (355M parameters) and fine-tuned by the Anezatra team on the Alpaca instruction-following dataset. It is intended for text generation and instruction following, which makes it well suited to chat-style applications.
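
A minimal generation sketch with the transformers library is shown below. The repository id comes from this page; the Alpaca-style prompt template is an assumption and should be checked against the dataset card.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("anezatra/gpt2-alpaca-355M")
model = AutoModelForCausalLM.from_pretrained("anezatra/gpt2-alpaca-355M")

# Alpaca-style prompt template (an assumption; verify against the dataset card)
prompt = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\nExplain what a language model is.\n\n"
    "### Response:\n"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=128,
    do_sample=True,
    top_p=0.9,
    temperature=0.7,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```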

Training procedure

This model was trained on 4 x NVIDIA A100 GPUs.

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch reproducing them follows the list):

  • learning_rate: 2e-05
  • train_batch_size: 1
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 128
  • total_train_batch_size: 128
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_ratio: 0.15
  • num_epochs: 1
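
The sketch below maps the listed values onto a transformers TrainingArguments object. The output directory is hypothetical, and the exact training script is not published, so treat this as an illustration of the configuration rather than the team's actual code.

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="gpt2-alpaca-355M",    # hypothetical output path
    learning_rate=2e-5,
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    seed=42,
    gradient_accumulation_steps=128,  # 1 x 128 = effective batch of 128
    lr_scheduler_type="linear",
    warmup_ratio=0.15,
    num_train_epochs=1,
    adam_beta1=0.9,                   # Adam betas=(0.9, 0.999)
    adam_beta2=0.999,
    adam_epsilon=1e-8,
)
```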