Tim Dolan


Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

UCLA-AGI has shown that large language models, even weaker ones, can improve themselves using only data produced by the model itself. The question they answer in their paper is:

"Can we empower a weak LLM to improve itself without acquiring additional human annotated data?"

They answer this question by proposing and testing a novel fine-tuning method they call Self-Play fIne-tuNing (SPIN). The process starts with a supervised fine-tune (SFT) of zephyr-7b on all 200k samples of HuggingFaceH4/ultrachat_200k, so that no further human annotation is needed afterwards.

Once SFT is complete, SPIN generates 50k synthetic data pairs, each made up of a 'chosen' and a 'rejected' sample. The model is fine-tuned on those pairs, and the process repeats for another 3 iterations, for a total of 200k samples.
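The loop above can be sketched in a few lines of Python. This is a toy illustration, not the UCLA-AGI implementation: `build_spin_pairs`, `spin_schedule`, and the `generate` callback are hypothetical names, and I am assuming (from my reading of the paper) that the 'chosen' sample is the reference answer from the SFT dataset while the 'rejected' sample is the current model's own generation for the same prompt.

```python
import random


def build_spin_pairs(sft_rows, generate, n_samples, seed=0):
    """Build 'chosen'/'rejected' pairs for one SPIN-style iteration.

    Assumption: 'chosen' is the reference answer from the SFT data and
    'rejected' is whatever the current model generates for that prompt.
    """
    rng = random.Random(seed)
    sampled = rng.sample(sft_rows, n_samples)
    return [
        {
            "prompt": row["prompt"],
            "chosen": row["response"],
            "rejected": generate(row["prompt"]),
        }
        for row in sampled
    ]


def spin_schedule(samples_per_iter=50_000, extra_iters=3):
    """Return (number of iterations, total samples) for the schedule described above."""
    iterations = 1 + extra_iters
    return iterations, iterations * samples_per_iter


# Toy stand-ins for the SFT dataset and the current model (hypothetical).
toy_rows = [
    {"prompt": f"question {i}", "response": f"reference answer {i}"}
    for i in range(10)
]


def toy_generate(prompt):
    return f"model answer to: {prompt}"


pairs = build_spin_pairs(toy_rows, toy_generate, n_samples=4)
iterations, total_samples = spin_schedule()
print(len(pairs), iterations, total_samples)
```

In a real run, `generate` would call the model produced by the previous iteration, so the 'rejected' side gets harder to tell apart from the reference data as training progresses.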

This experiment stands out because the authors claim their method can yield performance gains of upwards of 10% without using any additional human-annotated data. The strategy was designed to improve weaker language models, but with further experimentation it could become a formidable strategy for improving language models in general.

If you would like to explore this strategy for yourself, here are some resources:
Colab: https://colab.research.google.com/drive/1IjDeNVBsRru2-hM_9aauD6gVI-VvJnpk?usp=sharing
GitHub: https://github.com/uclaml/SPIN
The product of the experiment: UCLA-AGI/zephyr-7b-sft-full-SPIN-iter3
Paper: arXiv:2401.01335
view post
Fine-tune 7B models on free-tier Colab hardware using Unsloth 🦥

Unsloth is a framework for fine-tuning language models that boasts a 0% loss in accuracy because it uses no approximation methods. It offers a trainer for both supervised fine-tuning (SFT) and direct preference optimization (DPO) that can speed up fine-tuning by up to 5x.
This is achieved by adding LoRA adapters, so only 1 to 10% of the total parameters need to be trained. You can export the LoRA adapter on its own or merge it into the base model at 16-bit to get a full fine-tuned model. The resulting model is ready for use in vLLM for faster inference.
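To get a feel for why LoRA adapters train so few parameters, here is a rough back-of-the-envelope sketch. It is not Unsloth code: `lora_trainable_fraction` is a hypothetical helper, the 4096 x 4096 layer size is only an illustrative choice, and the exact share for a whole model depends on the rank and on which layers receive adapters.

```python
def lora_trainable_fraction(d_in, d_out, rank):
    """Adapter parameters relative to the frozen weight matrix they sit beside.

    A LoRA adapter replaces the update to a (d_out x d_in) weight with two
    low-rank factors of shape (d_out x rank) and (rank x d_in).
    """
    frozen_params = d_in * d_out
    adapter_params = rank * (d_in + d_out)
    return adapter_params / frozen_params


# Illustrative layer size and ranks (assumptions, not taken from the notebook).
for rank in (8, 16, 64):
    fraction = lora_trainable_fraction(4096, 4096, rank)
    print(f"rank={rank}: {fraction:.2%} of the frozen layer's parameter count")
```

Higher ranks and adapters on more layers push the trainable share up, which is roughly how you end up in the 1 to 10% range quoted above.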

Additionally, Hugging Face has integrated Unsloth into its documentation for DPO training and reported 18.6% performance gains on a T4.

This sets a new standard for fine-tuning large language models. If you would like to explore this methodology for yourself, I have provided a notebook, "AutoSloth", where you can fine-tune using either SFT or DPO. It uploads the result to HF with a prefilled Unsloth README 🦥 and a Q8_0 quantization.

The SFT example is set up for free-tier usage, while the DPO example is set up for an A100. The DPO example can be altered to work on a T4, but I wanted to include more than one example.

Colab Stats during training:
+ Model: unsloth/mistral-7b-bnb-4bit
+ Dataset: yahma/alpaca-cleaned
+ Batch size: 2
+ Gradient accumulation steps: 4
+ System RAM: 8.5 / 51.0 GB
+ VRAM (T4): 13.6 / 15.0 GB
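
As a quick sanity check on these numbers, the small sketch below works out the effective batch size and the memory headroom. I am assuming the "gradient steps" figure refers to gradient accumulation steps; the variable and function names are just for illustration.

```python
# Values copied from the stats above.
per_device_batch_size = 2
gradient_accumulation_steps = 4  # assumption: "gradient steps" = accumulation steps

# Gradients are accumulated over several small batches before each optimizer step.
effective_batch_size = per_device_batch_size * gradient_accumulation_steps


def utilization(used_gb, total_gb):
    """Fraction of the available memory that is in use."""
    return used_gb / total_gb


print(f"Effective batch size: {effective_batch_size}")
print(f"System RAM in use: {utilization(8.5, 51.0):.1%}")
print(f"T4 VRAM in use: {utilization(13.6, 15.0):.1%}")
```

With VRAM this close to the limit, raising the per-device batch size is likely to run out of memory, so increasing the accumulation steps is the safer way to grow the effective batch.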

🦥Unsloth: https://github.com/unslothai/unsloth
🦥AutoSloth: https://colab.research.google.com/drive/1Zo0sVEb2lqdsUm9dy2PTzGySxdF9CNkc?usp=sharing
🤗HF-Unsloth-docs: https://huggingface.co/docs/trl/main/en/dpo_trainer#accelerate-dpo-fine-tuning-using-unsloth
🤗HF-Unsloth Blog Post: https://huggingface.co/blog/unsloth-trl