|
# Full Finetuning |
|
|
|
Full finetuning updates all layers in the pretrained LLaMA model. This *regular* finetuning procedure is typically considered the baseline for parameter-efficient alternatives such as Low-Rank Adaptation (LoRA) or LLaMA-Adapter.
|
|
|
The current [finetune_full.py](../scripts/finetune_full.py) we provide uses 4 A100 GPUs with a fully-sharded data parallel strategy to finetune Lit-LLaMA 7B on the [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset. The A100 GPUs have 40 GB of memory each, although finetuning this model may require less.
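
Conceptually, a fully-sharded data parallel run shards model parameters, gradients, and optimizer state across the GPUs. The snippet below is a minimal sketch of such a setup, assuming Lightning Fabric's `FSDPStrategy`; the actual strategy arguments and wrapping policy used in `finetune_full.py` may differ.

```python
import lightning as L
from lightning.fabric.strategies import FSDPStrategy

# Illustrative only: shard model parameters, gradients, and optimizer state
# across 4 GPUs with bf16 mixed precision.
fabric = L.Fabric(
    accelerator="cuda",
    devices=4,
    precision="bf16-mixed",
    strategy=FSDPStrategy(),
)
fabric.launch()

# model, optimizer = fabric.setup(model, optimizer)
# ...then train as usual, calling fabric.backward(loss) instead of loss.backward()
```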
|
|
|
|
|
|
|
## Preparation |
|
|
|
The steps here only need to be done once: |
|
|
|
1. Follow the instructions in the [README](README.md) to install the dependencies. |
|
|
|
2. Download and convert the weights and save them in the `./checkpoints` folder as described [here](download_weights.md). |
|
|
|
3. Download the data and generate the Alpaca instruction tuning dataset:
|
|
|
```bash |
|
python scripts/prepare_alpaca.py |
|
``` |
|
|
|
   or [prepare your own dataset](#tune-on-your-dataset).
|
|
|
## Running the finetuning |
|
|
|
```bash |
|
python finetune_full.py |
|
``` |
|
|
|
|
|
You can speed up training by setting the `devices` variable in the script to use more GPUs if available, or by increasing the `batch_size`.

Depending on the available GPU memory, you can also tune the `micro_batch_size` parameter to utilize the GPU efficiently.
|
|
|
For example, the following settings will let you finetune the model in 32 hours using a fully-sharded data parallel strategy: |
|
```python |
|
devices = 4 |
|
batch_size = 128 // devices |
|
micro_batch_size = 4 |
|
``` |
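
Together, these values determine the effective batch size via gradient accumulation: each GPU runs several forward/backward passes of `micro_batch_size` samples and only then takes an optimizer step. The arithmetic looks roughly like this (the exact variable names inside `finetune_full.py` may differ):

```python
# Illustrative arithmetic for the settings above.
devices = 4
batch_size = 128 // devices        # 32 samples per device per optimizer step
micro_batch_size = 4               # samples per forward/backward pass

gradient_accumulation_steps = batch_size // micro_batch_size                      # 8
effective_batch_size = micro_batch_size * gradient_accumulation_steps * devices   # 128
```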
|
|
|
This script will save checkpoints periodically to the folder `out/`. |
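
If you want to inspect a saved checkpoint outside of the provided generation script, it can be loaded like any other PyTorch checkpoint. The sketch below assumes the file holds a plain `state_dict` for the 7B configuration and uses a placeholder file name; the actual checkpoint format and name come from the script.

```python
import torch
from lit_llama import LLaMA

# "out/checkpoint.pth" is a placeholder: substitute the file actually written to out/.
checkpoint = torch.load("out/checkpoint.pth", map_location="cpu")

model = LLaMA.from_name("7B")
model.load_state_dict(checkpoint)
model.eval()
```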
|
|
|
> **Note**
> All scripts support argument [customization](customize_paths.md).
|
|
|
## Test the model |
|
|
|
You can test the finetuned model with your own instructions by running: |
|
|
|
```bash |
|
python generate_full.py \ |
|
--prompt "Recommend a movie to watch on the weekend." \ |
|
--quantize llm.int8 |
|
``` |
|
Output: |
|
``` |
|
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy... |
|
``` |
|
If your GPU supports `bfloat16`, the script will automatically use it. Together with `--quantize llm.int8`, this brings the memory consumption down to ~8 GB. |
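
If you are unsure whether your GPU supports `bfloat16`, you can check from Python:

```python
import torch

# Typically True on Ampere (e.g. A100) and newer GPUs.
print(torch.cuda.is_bf16_supported())
```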
|
|
|
## Tune on your dataset |
|
|
|
With only a few modifications, you can prepare and train on your own instruction dataset. |
|
|
|
1. Create a JSON file in which each row holds one instruction-response pair.
   A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
   the empty string if the instruction doesn't require context. Below is an example JSON file:
|
|
|
``` |
|
[ |
|
{ |
|
"instruction": "Arrange the given numbers in ascending order.", |
|
"input": "2, 4, 0, 8, 3", |
|
"output": "0, 2, 3, 4, 8" |
|
}, |
|
... |
|
] |
|
``` |
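
If you assemble this file programmatically, a snippet along these lines produces the expected layout (the `mydata.json` file name is just a placeholder):

```python
import json

samples = [
    {
        "instruction": "Arrange the given numbers in ascending order.",
        "input": "2, 4, 0, 8, 3",
        "output": "0, 2, 3, 4, 8",
    },
    # ... more instruction-response pairs
]

with open("mydata.json", "w", encoding="utf-8") as f:
    json.dump(samples, f, indent=4, ensure_ascii=False)
```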
|
|
|
2. Make a copy of `scripts/prepare_alpaca.py` and name it what you want: |
|
|
|
```bash |
|
cp scripts/prepare_alpaca.py scripts/prepare_mydata.py |
|
``` |
|
|
|
3. Modify `scripts/prepare_mydata.py` to read the JSON data file, for example as sketched below.
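
   The key change is usually to replace the dataset download with loading your local file. Here is a minimal sketch; `load_instruction_data` and its `file_path` argument are illustrative names rather than part of the original script, and the tokenization and train/val split logic stays as in `prepare_alpaca.py`:

```python
import json
from pathlib import Path

def load_instruction_data(file_path: Path) -> list:
    """Read instruction-response pairs from a local JSON file."""
    with open(file_path, "r", encoding="utf-8") as f:
        data = json.load(f)
    # Each entry should provide "instruction", "input", and "output" keys.
    return data
```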
|
4. Run the script to generate the preprocessed, tokenized train-val split: |
|
|
|
```bash |
|
python scripts/prepare_mydata.py --destination_path data/mydata/ |
|
``` |
|
|
|
5. Run `finetune_full.py` by passing in the location of your data (and optionally other parameters): |
|
|
|
```bash |
|
python finetune_full.py --data_dir data/mydata/ --out_dir out/myexperiment |
|
``` |
|
|
|
|
|
## Troubleshooting |
|
|
|
If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the finetune script (see https://github.com/Lightning-AI/lit-llama/issues/101).
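
For reference, the relevant call just needs to run before training starts, along these lines:

```python
import torch

# Work around "Expected is_sm80 to be true, but got false" on GPUs without
# flash-attention support by disabling the flash SDP kernel.
torch.backends.cuda.enable_flash_sdp(False)
```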
|
|