"A Closer Look at the Limitations of Instruction Tuning" is a new paper that explores the efficacy and limitations of Instruction Tuning (IT) in Large Language Models (LLMs) for conversational agents. The authors conduct a series of experiments using both LoRA fine-tuning (LFT) and standard full-parameter fine-tuning (SFT) across various LLMs and IT datasets.
The key findings are:

* LoRA fine-tuning (LFT) preserves the pre-training token distribution, while SFT does not. This indicates that after LFT the model still relies heavily on its pre-training and does not acquire new knowledge.
* Dataset scaling is ineffective for LFT: experiments show that scaling the dataset size 52x or even 326x does not improve performance.
* LoRA fine-tuning mainly enhances response initiation and style without substantial knowledge gains.
* Full-parameter fine-tuning tends to degrade the LLM's knowledge base and increase hallucinations.
* Other popular methods and adjustments fail to significantly outperform simple LoRA fine-tuned models in conversational quality and accuracy.
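To make the LFT/SFT distinction concrete, here is a minimal sketch of how the two setups typically differ in code, using Hugging Face transformers and peft. The base model name, LoRA hyperparameters, and target modules are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch contrasting LoRA fine-tuning (LFT) with full-parameter
# fine-tuning (SFT) using Hugging Face transformers + peft. The model name,
# LoRA hyperparameters, and target modules are illustrative assumptions,
# not the paper's exact setup.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")

# --- LFT: freeze the base model and train only low-rank adapter matrices ---
lora_cfg = LoraConfig(
    r=16,                                  # adapter rank
    lora_alpha=32,                         # scaling factor
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
lft_model = get_peft_model(base, lora_cfg)
lft_model.print_trainable_parameters()     # typically well under 1% of all weights

# --- SFT: every weight is trainable, so the pre-trained token distribution
#     can shift (and, per the paper, knowledge can degrade) ---
sft_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
num_trainable = sum(p.numel() for p in sft_model.parameters() if p.requires_grad)
```

Because LFT touches so few parameters, the paper's observation that it mostly shifts response style rather than injecting new knowledge is intuitive from this setup alone.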
Congrats to the authors @Sreyan88 and others for their work!
Why? The best closed chat models are built on top of multi-turn dialogue preference data. The OSS community lacks such datasets. This dataset is the first in a series aimed at closing that gap.
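For readers unfamiliar with the format, a multi-turn dialogue preference record typically pairs a conversation prefix with a preferred and a rejected continuation. The field names and contents below are assumptions for illustration only, not the dataset's actual schema; see the dataset card for the real format.

```python
# Illustrative shape of a multi-turn dialogue preference record.
# Field names ("conversation", "chosen", "rejected") are assumptions
# for this sketch -- check the dataset card for the actual schema.
example_record = {
    "conversation": [
        {"role": "user", "content": "How do I read a CSV file in Python?"},
        {"role": "assistant", "content": "You can use the csv module or pandas.read_csv..."},
        {"role": "user", "content": "And how do I filter rows by a column value?"},
    ],
    # Two candidate second-turn answers, ranked by preference:
    "chosen": {"role": "assistant", "content": "With pandas: df[df['score'] > 0] keeps only..."},
    "rejected": {"role": "assistant", "content": "Just loop over every line manually."},
}
```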
Is this dataset useful? To test it, we've built our virtual launching partner:
Welcome CapybaraHermes, a preference-tuned OpenHermes with improved second-turn capabilities on MTBench
As usual, models are the least important to us. We like to focus on the data. Our mission is to build and share high-quality datasets, sharing our methods in the open so the community can improve upon them.
That's why we took the time to describe the full methodology on the dataset card. Check it out and give us feedback! Data and methods are never perfect!
Finally, this is just a preview version, and we'd love to collaborate with you on adding more benchmarking results, figuring out which hyperparams work for DPO'ing models, which dataset mixes help, and so on.
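As a starting point for that collaboration, here is a minimal sketch of DPO training with TRL's DPOTrainer. The base model, dataset placeholder, and hyperparameters (beta, learning rate, batch size) are assumptions rather than recommended values, and some argument names vary across trl versions.

```python
# Minimal DPO fine-tuning sketch with TRL. Dataset name is a placeholder and
# all hyperparameters are assumptions; adapt them to your own setup.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "teknium/OpenHermes-2.5-Mistral-7B"   # illustrative base model
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# A preference dataset with "prompt", "chosen", and "rejected" columns
# (column names assumed; adapt to the actual dataset schema).
dataset = load_dataset("your-org/your-preference-dataset", split="train")

config = DPOConfig(
    output_dir="openhermes-dpo",       # hypothetical output path
    beta=0.1,                          # KL penalty strength, a common default
    learning_rate=5e-7,
    per_device_train_batch_size=2,
    num_train_epochs=1,
)

trainer = DPOTrainer(
    model=model,
    args=config,
    train_dataset=dataset,
    processing_class=tokenizer,        # `tokenizer=` in older trl versions
)
trainer.train()
```

If you try variations of beta, learning rate, or dataset mixes, sharing the resulting MTBench scores is exactly the kind of feedback we're hoping for.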
Expect some more datasets in the coming weeks. Let's build the best data for AI, together.