# Instruction tuning and weight averaging

Note that some of these steps may be out of date, but the general flow should remain the same.

We downloaded the data from https://huggingface.co/datasets/timdettmers/openassistant-guanaco, then ran `python datapreprocess/make_assistant_data.py --input-files /fsx/home-mitchellw/openassistant_best_replies_train.jsonl --output-dir /fsx/home-mitchellw/tmp --num-workers 1 --num-consumers 1`. Note that we changed the shard size so that there would be at least 8 shards.
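
If you need to fetch the raw `.jsonl` file first, one option is the Hugging Face Hub client. This is a minimal sketch, assuming the `huggingface_hub` package is installed and that the filename matches the one passed to `--input-files` above:

```python
# Download the raw OpenAssistant-Guanaco training split from the Hugging Face Hub.
# Sketch only: the filename is taken from the preprocessing command above.
from huggingface_hub import hf_hub_download

local_path = hf_hub_download(
    repo_id="timdettmers/openassistant-guanaco",
    filename="openassistant_best_replies_train.jsonl",
    repo_type="dataset",
)
print(local_path)  # pass this path to --input-files in make_assistant_data.py
```

The fine-tuning command we used is shown below.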

```
torchrun --nproc-per-node 8 -m open_lm.main \
    --train-data "pipe:aws s3 cp s3://<bucket>/lmdata/assistant_data/train/shard-{0000000..0000008}.tar -" \
    --train-num-samples 4382720 \
    --workers 1 \
    --precision amp_bfloat16 \
    --batch-size 8 \
    --grad-checkpointing \
    --log-every-n-steps 1 \
    --grad-clip-norm 1 \
    --lr 2e-5 \
    --model g3b_neox \
    --fsdp --fsdp-amp \
    --warmup 100 \
    --wd 0.1 \
    --beta2 0.95 \
    --epochs 6 \
    --disable-buffer \
    --lr-cooldown-end 5e-6 \
    --report-to wandb \
    --wandb-project-name lmtune \
    --pretrained /fsx/home-mitchellw/experimetns/lm/1p5T-bigdata-neox-g3b_neox-10-1e-3-0.1-nodes48-bs10-v0/checkpoints/epoch_24.pt \
    --name instruction-tune-3b-2e-5-6 \
    --logs /fsx/home-mitchellw/experimetns/lmtune
```
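
As a rough sanity check on the configuration above (assuming `--batch-size` is the per-GPU batch size in sequences and `--train-num-samples` counts sequences per epoch):

```python
# Back-of-the-envelope numbers for the fine-tuning run above.
# Assumptions (not stated in the source): --batch-size is per GPU and counts
# sequences, and --train-num-samples is the number of sequences seen per epoch.
gpus = 8
per_gpu_batch_size = 8
train_num_samples = 4_382_720

global_batch_size = gpus * per_gpu_batch_size             # 64 sequences per optimizer step
steps_per_epoch = train_num_samples // global_batch_size  # 68,480 steps per epoch
print(global_batch_size, steps_per_epoch)
```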

Now we want to interpolate between the base and fine-tuned models with different coefficients alpha, where alpha weights the base model and 1 - alpha weights the fine-tuned model. We can do so with the bash script below, which evaluates each interpolated model on both the assistant (chat) validation set and the base-LM validation set.
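
Conceptually, `--average` with `--average-coefficients` evaluates a single model whose parameters are a convex combination of the two checkpoints. Here is a minimal sketch of the idea on plain `state_dict`s (a hypothetical helper, not the open_lm implementation; it assumes both checkpoints have matching parameter keys):

```python
# Sketch of linear weight interpolation between two checkpoints:
#   theta = alpha * theta_base + (1 - alpha) * theta_finetuned
# Hypothetical helper for illustration; the --average / --average-coefficients
# flags perform the equivalent combination inside open_lm.
import torch


def interpolate_state_dicts(base_sd, finetuned_sd, alpha):
    merged = {}
    for key, base_param in base_sd.items():
        finetuned_param = finetuned_sd[key]
        if torch.is_floating_point(base_param):
            merged[key] = alpha * base_param + (1.0 - alpha) * finetuned_param
        else:
            # Non-float entries (e.g. integer buffers) cannot be averaged; keep one copy.
            merged[key] = base_param
    return merged
```

With alpha = 1 this recovers the base model and with alpha = 0 the fine-tuned model; the sweep below covers the range in steps of 0.05.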

```
BASEMODEL=/fsx/home-mitchellw/experimetns/lm/1p5T-bigdata-neox-g3b_neox-10-1e-3-0.1-nodes48-bs10-v0/checkpoints/epoch_24.pt
FINALMODEL=/fsx/home-mitchellw/experimetns/lmtune/instruction-tune-3b-2e-5-6/checkpoints/epoch_6.pt
MODEL=g3b_neox

for alpha in $(seq 0 0.05 1)
do
    save_path_1="$(dirname $FINALMODEL)/chat-eval-interpolate-$alpha-$(basename $FINALMODEL)"
    save_path_2="$(dirname $FINALMODEL)/base-eval-interpolate-$alpha-$(basename $FINALMODEL)"

    echo $save_path_1
    echo $save_path_2

    if [ -f "$save_path_1" ]; then
        echo "$save_path_1 exists."
    else
        # first do the chat eval.
        torchrun --nproc-per-node 4 -m open_lm.main \
            --val-data "pipe:aws s3 cp s3://<bucket>/lmdata/assistant_data/val.tar -" \
            --workers 6 \
            --precision amp_bfloat16 \
            --batch-size 8 \
            --grad-checkpointing \
            --log-every-n-steps 1 \
            --model $MODEL \
            --fsdp --fsdp-amp \
            --train-num-samples 1000000000 \
            --name $RANDOM \
            --average $BASEMODEL $FINALMODEL \
            --average-coefficients $alpha $(echo "1-$alpha" | bc -l) \
            --logs /fsx/home-mitchellw/experimetns/lmdebug > $save_path_1

        # now do the base eval.
        torchrun --nproc-per-node 4 -m open_lm.main \
            --val-data "pipe:aws s3 cp s3://<bucket>/lmdata/validation_data_tokenized/open_lm//shard_00000000.tar -" \
            --workers 6 \
            --precision amp_bfloat16 \
            --batch-size 8 \
            --grad-checkpointing \
            --log-every-n-steps 1 \
            --model $MODEL \
            --data-key json \
            --fsdp --fsdp-amp \
            --train-num-samples 1000000000 \
            --name $RANDOM \
            --average $BASEMODEL $FINALMODEL \
            --average-coefficients $alpha $(echo "1-$alpha" | bc -l) \
            --logs /fsx/home-mitchellw/experimetns/lmdebug > $save_path_2
    fi
done
```
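
Each pass of the loop writes one chat-eval log and one base-eval log next to the fine-tuned checkpoint. A quick (hypothetical) way to check that the full sweep completed before plotting:

```python
# Collect the per-alpha eval logs written by the bash loop above.
# Paths mirror save_path_1 / save_path_2 in the script; adjust if your paths differ.
from pathlib import Path

ckpt_dir = Path("/fsx/home-mitchellw/experimetns/lmtune/instruction-tune-3b-2e-5-6/checkpoints")
chat_logs = sorted(ckpt_dir.glob("chat-eval-interpolate-*-epoch_6.pt"))
base_logs = sorted(ckpt_dir.glob("base-eval-interpolate-*-epoch_6.pt"))
print(len(chat_logs), len(base_logs))  # expect 21 each for the 0..1 sweep in steps of 0.05
```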

Then you can make a plot with `python plots/interpolation.py`, which produces the following plot.

![](plots/interpolation.png)