GPT-2 Instruction Tuning with a Single GPU 😋

The model and all artifacts in this repository were obtained by fully fine-tuning the base GPT-2 model on the Stanford Alpaca dataset, using a single GPU with less than 8 GB of memory. The dataset was filtered with the max_length hyperparameter (the maximum number of tokens per example), and gradient accumulation was used to significantly reduce the memory footprint.
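
To make the filtering step concrete, here is a minimal sketch that keeps only the Alpaca examples whose rendered text fits within max_length tokens. The dataset ID, the column names, and the way instruction, input, and output are concatenated are assumptions for illustration, not taken from this repository's training code.

```python
from datasets import load_dataset
from transformers import AutoTokenizer

MAX_LENGTH = 140  # same value as the max_length hyperparameter listed below

# Assumed dataset ID and column layout of the public Alpaca release.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
dataset = load_dataset("tatsu-lab/alpaca", split="train")

def fits_in_context(example):
    # Render one training string from instruction, optional input, and output.
    text = "\n".join([example["instruction"], example["input"], example["output"]])
    return len(tokenizer(text).input_ids) <= MAX_LENGTH

filtered = dataset.filter(fits_in_context)
print(f"Kept {len(filtered)} of {len(dataset)} examples")
```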

Results with the base model (124M parameters) are acceptable 😊, but further hyperparameter search and training tricks would most likely improve them. In any case, this is an "affordable" playground for experimenting with instruction tuning, the step of the pipeline that turns a raw LLM into a usable, queryable model.

1.0 Fine-Tuning Hyperparameters

| Parameter | Value |
|---|---|
| num_epochs | 35 |
| lr | 1e-5 |
| num_warmup_steps | 300 |
| weight_decay | 0.0 |
| batch_size | 16 |
| gradient_accumulation_steps | 8 |
| max_grad_norm | 2.0 |
| checkpoint_path | 'modelstore' |
| logs_path | 'logs' |
| max_length | 140 |
| eval_split | 0.1 |
| seed | 9 |
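
To show how these values fit together, the sketch below wires batch_size, lr, weight_decay, num_warmup_steps, gradient_accumulation_steps, and max_grad_norm into a plain PyTorch training loop. It is only an illustration of the technique, not this repository's training script: the toy `texts` list, the collate function, and the padding/label masking are assumptions.

```python
import torch
from torch.optim import AdamW
from torch.utils.data import DataLoader
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          get_linear_schedule_with_warmup)

device = "cuda" if torch.cuda.is_available() else "cpu"
tokenizer = AutoTokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained("gpt2").to(device)

# Toy stand-in for the filtered Alpaca examples rendered as prompt+response strings.
texts = ["Instruction:\nWhat is the capital of France?\n\nResponse:\nParis."] * 256

def collate(batch):
    return tokenizer(batch, padding=True, truncation=True, max_length=140,
                     return_tensors="pt")

loader = DataLoader(texts, batch_size=16, shuffle=True, collate_fn=collate)
optimizer = AdamW(model.parameters(), lr=1e-5, weight_decay=0.0)

num_epochs, accum_steps, max_grad_norm = 35, 8, 2.0
total_updates = max(1, num_epochs * len(loader) // accum_steps)
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=300, num_training_steps=total_updates)

for epoch in range(num_epochs):
    for step, batch in enumerate(loader):
        batch = {k: v.to(device) for k, v in batch.items()}
        labels = batch["input_ids"].clone()
        labels[batch["attention_mask"] == 0] = -100  # ignore padding in the loss
        loss = model(**batch, labels=labels).loss
        (loss / accum_steps).backward()  # accumulate scaled gradients
        if (step + 1) % accum_steps == 0:
            torch.nn.utils.clip_grad_norm_(model.parameters(), max_grad_norm)
            optimizer.step()
            scheduler.step()
            optimizer.zero_grad()
```

The optimizer only steps once every gradient_accumulation_steps mini-batches, so the effective batch size is batch_size Γ— gradient_accumulation_steps (16 Γ— 8 = 128) while only one mini-batch at a time has to fit in memory.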

2.0 Examples

All examples below were generated with the following sampling settings:

| Generation parameter | Value |
|---|---|
| temperature | 0.7 |
| do_sample | True |
| max_new_tokens | 150 |
| top_p | 0.92 |
| top_k | 0 |
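
A generation call along these lines reproduces the examples; the Instruction/Response prompt template is an assumption inferred from the examples and may not match the exact formatting used during fine-tuning, and "gpt2" should be replaced with this repository's model ID to load the fine-tuned weights.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Replace "gpt2" with this repository's model ID to load the fine-tuned weights.
tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

# Prompt template inferred from the examples below; the training format may differ.
prompt = "Instruction:\nWhat is the capital of France?\n\nResponse:\n"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    output_ids = model.generate(
        **inputs,
        do_sample=True,
        temperature=0.7,
        top_p=0.92,
        top_k=0,  # 0 disables top-k filtering, leaving pure nucleus sampling
        max_new_tokens=150,
        pad_token_id=tokenizer.eos_token_id,
    )

print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```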

2.1 First example

Instruction:

What is the capital of France?

Response:

The capital of France is Paris.

2.2 Second example

Instruction:

Give three tips to improve my writing.

Response:

1. Break up the sentences into smaller, more manageable chunks.
2. Use regular and non-consecutive paragraphs.
3. Include a clear goal in your writing.

2.3 Third example

Instruction:

What is life?

Response:

Life is like a rollercoaster ride, filled with ups and downs and twists and turns. It is a journey full of highs and lows, but there are also rewards and punishments to be found along the way. Life can be unpredictable and unpredictable, so it is important to have a plan in place to stay on track and make the most of your opportunities.
