GPT-2 Instruction Tuning with a Single GPU
The model and all artifacts in this repository were obtained by fully fine-tuning the base GPT-2 model on the Stanford Alpaca dataset, using a single GPU with less than 8 GB of memory. The dataset was filtered with the max_length
hyperparameter (maximum number of tokens per example), and gradient accumulation was applied to significantly reduce the memory footprint.
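The length filter can be sketched as follows. This is an illustration, not the repository's actual preprocessing code: `count_tokens` is a whitespace stand-in for the GPT-2 tokenizer, and the example records are made up.

```python
# Drop Alpaca-style examples whose full text exceeds max_length tokens.
# count_tokens is a whitespace-split stand-in for the real GPT-2 tokenizer.

MAX_LENGTH = 140  # same value as the max_length hyperparameter below

def count_tokens(text: str) -> int:
    return len(text.split())

def filter_by_length(examples, max_length=MAX_LENGTH):
    kept = []
    for ex in examples:
        full_text = " ".join([ex["instruction"], ex.get("input", ""), ex["output"]])
        if count_tokens(full_text) <= max_length:
            kept.append(ex)
    return kept

# Hypothetical records in the Alpaca instruction/input/output format.
examples = [
    {"instruction": "What is the capital of France?", "input": "", "output": "Paris."},
    {"instruction": "Explain everything.", "input": "", "output": "word " * 200},
]
short = filter_by_length(examples)
print(len(short))  # the 200-word response exceeds the limit and is dropped
```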
Results with the base model (124M parameters) are acceptable, but further hyperparameter search and training tricks would most likely improve them. Still, it is an "affordable" playground for experimenting with this important step of the pipeline, the one that turns a base LLM into a usable, queryable model.
1.0 Fine-Tuning Hyperparameters
| Parameter | Value |
|---|---|
| num_epochs | 35 |
| lr | 1e-5 |
| num_warmup_steps | 300 |
| weight_decay | 0.0 |
| batch_size | 16 |
| gradient_accumulation_steps | 8 |
| max_grad_norm | 2.0 |
| checkpoint_path | 'modelstore' |
| logs_path | 'logs' |
| max_length | 140 |
| eval_split | 0.1 |
| seed | 9 |
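With batch_size 16 and gradient_accumulation_steps 8, the effective batch size is 16 × 8 = 128, while only 16 examples ever sit in GPU memory at once. A minimal numeric sketch of why this works, using a hand-derived gradient for a one-parameter least-squares model rather than the actual GPT-2 training loop:

```python
# Gradient accumulation: average gradients over several micro-batches,
# then take one optimizer step, matching a single large-batch gradient.
# Toy model: y = w * x with squared-error loss, so dL/dw = 2*x*(w*x - y).

def grad(w, batch):
    """Mean gradient of the squared error over one batch."""
    return sum(2 * x * (w * x - y) for x, y in batch) / len(batch)

data = [(float(i), 3.0 * i) for i in range(1, 129)]          # 128 examples, true w = 3
micro_batches = [data[i:i + 16] for i in range(0, 128, 16)]  # 8 micro-batches of 16

w = 0.0
accum = 0.0
for mb in micro_batches:
    # Scale each micro-batch gradient by 1/accumulation_steps,
    # as one would scale the loss before backward().
    accum += grad(w, mb) / len(micro_batches)

big_batch_grad = grad(w, data)  # what one 128-example batch would give
print(abs(accum - big_batch_grad) < 1e-9)  # accumulated gradient matches
```

In the real run, the same idea is applied by scaling the loss by 1/gradient_accumulation_steps, calling backward() on each micro-batch, and stepping the optimizer every 8 micro-batches.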
2.0 Examples
| Key | Value |
|---|---|
| temperature | 0.7 |
| do_sample | True |
| max_new_tokens | 150 |
| top_p | 0.92 |
| top_k | 0 |
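With top_k set to 0, filtering is nucleus (top-p) only. A rough sketch of what temperature scaling plus top-p filtering does to a single next-token distribution; the vocabulary size and logits below are made up for illustration:

```python
import math
import random

def top_p_sample(logits, temperature=0.7, top_p=0.92, rng=None):
    """Sample one token id after temperature scaling and nucleus filtering."""
    # Temperature-scaled softmax (shifted by the max for numerical stability).
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Keep the smallest set of tokens whose cumulative probability >= top_p.
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalise over the kept tokens and sample from them.
    mass = sum(probs[i] for i in kept)
    rng = rng or random.Random(0)
    r = rng.random() * mass
    for i in kept:
        r -= probs[i]
        if r <= 0:
            return i
    return kept[-1]

logits = [5.0, 4.0, 1.0, 0.5, -2.0]  # made-up next-token logits
token = top_p_sample(logits)
print(token)  # one of the two high-probability tokens survives the filter
```

With these logits, the two most likely tokens already cover more than 92% of the probability mass, so the low-probability tail is never sampled.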
2.1 First example
Instruction:
What is the capital of France?
Response:
The capital of France is Paris.
2.2 Second example
Instruction:
Give three tips to improve my writing.
Response:
1. Break up the sentences into smaller, more manageable chunks.
2. Use regular and non-consecutive paragraphs.
3. Include a clear goal in your writing.
2.3 Third example
Instruction:
What is life?
Response:
Life is like a rollercoaster ride, filled with ups and downs and twists and turns. It is a journey full of highs and lows, but there are also rewards and punishments to be found along the way. Life can be unpredictable and unpredictable, so it is important to have a plan in place to stay on track and make the most of your opportunities.
3.0 References
- Dive into the Code: GitHub Repository
- Explore the Dataset: Stanford Alpaca Dataset