# Full Finetuning
Full finetuning updates all layers in the pretrained LLaMA model. This *regular* finetuning procedure is typically considered the baseline for parameter-efficient alternatives such as Low-Rank Adaptation (LoRA) or LLaMA-Adapter.
The provided [finetune_full.py](../scripts/finetune_full.py) script uses 4 A100 GPUs with a fully-sharded data parallel strategy to finetune Lit-LLaMA 7B on the [Alpaca](https://github.com/tatsu-lab/stanford_alpaca) dataset. The A100 GPUs used have 40 GB of memory each, but finetuning may fit in less.
## Preparation
The steps here only need to be done once:
1. Follow the instructions in the [README](README.md) to install the dependencies.
2. Download and convert the weights and save them in the `./checkpoints` folder as described [here](download_weights.md).
3. Download the data and generate the Alpaca instruction tuning dataset:
```bash
python scripts/prepare_alpaca.py
```
or [prepare your own dataset](#tune-on-your-own-dataset).
## Running the finetuning
```bash
python finetune_full.py
```
You can speed up training by setting the `devices` variable in the script to use more GPUs if available, or by increasing the `batch_size`.
Depending on the available GPU memory, you can also tune the `micro_batch_size` parameter to utilize the GPU efficiently.
For example, the following settings will let you finetune the model in 32 hours using a fully-sharded data parallel strategy:
```python
devices = 4
batch_size = 128 // devices
micro_batch_size = 4
```
This script will save checkpoints periodically to the folder `out/`.
> **Note**
> All scripts support argument [customization](customize_paths.md).
## Test the model
You can test the finetuned model with your own instructions by running:
```bash
python generate_full.py \
--prompt "Recommend a movie to watch on the weekend." \
--quantize llm.int8
```
Output:
```
A good movie to watch on the weekend would be The Lion King, since it's a classic family film that everyone can enjoy...
```
If your GPU supports `bfloat16`, the script will automatically use it. Together with `--quantize llm.int8`, this brings the memory consumption down to ~8 GB.
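If you want to check whether your GPU supports `bfloat16` before launching, you can query PyTorch directly. This is a standalone sketch; the scripts perform their own dtype selection:

```python
import torch

# bfloat16 is supported on Ampere-class (e.g. A100) and newer GPUs;
# short-circuit so the capability check only runs when CUDA is present
if torch.cuda.is_available() and torch.cuda.is_bf16_supported():
    dtype = torch.bfloat16
else:
    dtype = torch.float32
print(dtype)
```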
## Tune on your own dataset
With only a few modifications, you can prepare and train on your own instruction dataset.
1. Create a JSON file in which each row holds one instruction-response pair.
A row has an entry for 'instruction', 'input', and 'output', where 'input' is optional and can be
the empty string if the instruction doesn't require a context. Below is an example JSON file:
```json
[
{
"instruction": "Arrange the given numbers in ascending order.",
"input": "2, 4, 0, 8, 3",
"output": "0, 2, 3, 4, 8"
},
...
]
```
2. Make a copy of `scripts/prepare_alpaca.py` and name it what you want:
```bash
cp scripts/prepare_alpaca.py scripts/prepare_mydata.py
```
3. Modify `scripts/prepare_mydata.py` to read the json data file.
4. Run the script to generate the preprocessed, tokenized train-val split:
```bash
python scripts/prepare_mydata.py --destination_path data/mydata/
```
5. Run `finetune_full.py` by passing in the location of your data (and optionally other parameters):
```bash
python finetune_full.py --data_dir data/mydata/ --out_dir out/myexperiment
```
## Troubleshooting
If you run into a CUDA error "Expected is_sm80 to be true, but got false", uncomment the line
`torch.backends.cuda.enable_flash_sdp(False)` in the finetuning script (see https://github.com/Lightning-AI/lit-llama/issues/101).
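For reference, this toggle disables PyTorch's FlashAttention backend for scaled dot-product attention, which requires sm80-class (A100) hardware; PyTorch then falls back to its other attention implementations. A sketch of the workaround in isolation:

```python
import torch

# Disable the FlashAttention kernel for scaled dot-product attention;
# PyTorch falls back to the math / memory-efficient implementations,
# which also run on pre-A100 (sm < 80) GPUs.
torch.backends.cuda.enable_flash_sdp(False)
```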