DavidGF committed on
Commit
acacd61
1 Parent(s): c9a5e49

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -58,7 +58,7 @@ Without their independent research collaboration this model release would not ha
58
  Anyone who has attempted or succeeded in fine-tuning a model is aware of the difficulty in nudging it towards a specific skill, such as mastering new languages, as well as the challenges associated with achieving significant improvements in performance. Experimenting with a novel training strategy and Spherical Linear Interpolation alongside a lasered version of the model itself has proven to be both fascinating and revealing.
59
  Furthermore, we developed one iteration of the model using our entire SFT-Sauerkraut dataset and two additional iterations using subsets of the full dataset—one focused on enhancing MMLU and TQA capabilities, and the other on boosting GSM8K and Winogrande skills.
60
  After optimizing our primary SFT model, we applied a similar strategy to our new DPO Dataset, dividing it into further subsets. We trained one model on the entire dataset again and two more on these specialized subsets. Actively monitoring and intervening based on a decrease in perplexity on the GSM8K benchmark led to an overall improvement in performance, especially in math abilities, without detracting from performance on other benchmarks—a task that is typically quite difficult. This process not only helps in understanding the effectiveness of Spherical Linear Interpolation but also introduces a new method for refining models with enhanced skills through a cycle of targeted data selection (Laser data(x)) + SLERP, followed by a subsequent focus on different data (Laser again on data(y)).
61
- Additionally, we integrated a novel training strategy on the SFT and DPO training process inspired by the LaserRMT approach, aiming to navigate and optimize the trade-offs highlighted by the no free lunch theorem and to mitigate the severe issue of forgetting previously learned information. This innovative training method is not only more efficient, requiring less VRAM, but it also effectively prevents the significant problem of forgetting previously acquired knowledge. This aspect is particularly crucial when attempting to teach the model specific skills, such as a new language, where traditionally, the model might lose a considerable amount of its prior knowledge and exhibit a decline in overall intelligence. Concrete information on how the new training strategy works and the advantages it offers over conventional training methods will soon be published in a detailed paper by the LaserRMT research group.
61
+ Additionally, we integrated a novel training strategy into the SFT and DPO training process, inspired by the LaserRMT approach, where we partially freeze the model according to a laser-like analysis, aiming to navigate and optimize the trade-offs highlighted by the no free lunch theorem. This innovative training method effectively prevents the significant problem of forgetting previously acquired knowledge. This aspect is particularly crucial when attempting to teach the model specific skills, such as a new language, where traditionally the model might lose a considerable amount of its prior knowledge and exhibit a decline in overall intelligence. Concrete information on how the new training strategy works and the advantages it offers over conventional training methods will soon be published in a detailed paper by the LaserRMT research group.
62
 
63
 
64
  We improved the German language skills of this model. Nevertheless, certain formulations may occur that are not entirely correct.
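
The Spherical Linear Interpolation (SLERP) merging described above can be sketched as follows. This is a minimal pure-Python illustration of the math only: real merges operate tensor-by-tensor over full model state dicts (e.g. via tooling such as mergekit), and this toy version treats a weight tensor as a flat list of floats.

```python
import math

def slerp(w_a, w_b, t, eps=1e-8):
    """Spherical linear interpolation between two flat weight vectors.

    Interpolates along the great circle spanned by the two vectors;
    falls back to plain linear interpolation when they are (near-)parallel.
    `t` in [0, 1] blends from w_a (t=0) to w_b (t=1).
    """
    dot = sum(x * y for x, y in zip(w_a, w_b))
    norm_a = math.sqrt(sum(x * x for x in w_a))
    norm_b = math.sqrt(sum(x * x for x in w_b))
    cos_theta = max(-1.0, min(1.0, dot / (norm_a * norm_b + eps)))
    theta = math.acos(cos_theta)
    if theta < eps:  # nearly parallel: linear interpolation is stable here
        return [(1 - t) * x + t * y for x, y in zip(w_a, w_b)]
    s = math.sin(theta)
    coeff_a = math.sin((1 - t) * theta) / s
    coeff_b = math.sin(t * theta) / s
    return [coeff_a * x + coeff_b * y for x, y in zip(w_a, w_b)]

def merge_layers(sd_a, sd_b, t=0.5):
    """SLERP-merge two (toy) state dicts layer by layer."""
    return {name: slerp(sd_a[name], sd_b[name], t) for name in sd_a}
```

At `t=0.5` this blends two specialized checkpoints symmetrically, which is the setting typically used when combining models fine-tuned on different data subsets.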
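
The "monitor perplexity on GSM8K and intervene" step can be illustrated with a hypothetical gating rule. The function names, the tolerance, and the keep/discard criterion below are assumptions for illustration, not the authors' published procedure; perplexity itself is the standard exp of the mean per-token negative log-likelihood.

```python
import math

def perplexity(token_nlls):
    """Perplexity from per-token negative log-likelihoods (in nats)."""
    return math.exp(sum(token_nlls) / len(token_nlls))

def should_keep_checkpoint(prev_ppl, new_ppl,
                           other_ppls_before, other_ppls_after,
                           tol=0.01):
    """Hypothetical gate: keep a checkpoint only if perplexity on the
    target benchmark dropped AND no other benchmark regressed by more
    than `tol` (relative), mirroring the goal of improving math skills
    without detracting from other benchmarks."""
    improved = new_ppl < prev_ppl
    no_regression = all(
        after <= before * (1 + tol)
        for before, after in zip(other_ppls_before, other_ppls_after)
    )
    return improved and no_regression
```

In a training loop this check would run after each merge/laser cycle, deciding whether the latest model replaces the current best.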
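
The LaserRMT-inspired partial freezing is not yet published in detail, so the following is only a plausible sketch. It assumes some per-layer relevance score is already available (for example from an SVD-based signal-to-noise analysis, in the spirit of laser-style methods); the score itself and the `train_fraction` heuristic are assumptions, not the actual LaserRMT criterion.

```python
def select_trainable_layers(layer_scores, train_fraction=0.25):
    """Keep the highest-scoring fraction of layers trainable.

    `layer_scores` maps layer name -> hypothetical relevance score;
    everything outside the top fraction would be frozen during training,
    which is one way to limit catastrophic forgetting (and VRAM use).
    """
    n_train = max(1, round(len(layer_scores) * train_fraction))
    ranked = sorted(layer_scores, key=layer_scores.get, reverse=True)
    return set(ranked[:n_train])

def freeze_plan(layer_scores, train_fraction=0.25):
    """Map each layer name to True (trainable) or False (frozen)."""
    trainable = select_trainable_layers(layer_scores, train_fraction)
    return {name: name in trainable for name in layer_scores}
```

In a PyTorch-style setup, the resulting plan would be applied by setting `requires_grad = False` on the parameters of every frozen layer before training starts.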