DavidGF committed on
Commit ad44a2b
1 Parent(s): b25f714

Update README.md

Files changed (1)
  1. README.md +4 -4
README.md CHANGED
@@ -61,14 +61,14 @@ Furthermore, we developed one iteration of the model using our entire SFT -Sauer

  After optimizing our primary SFT model, we applied a similar strategy to our new DPO Dataset, dividing it into further subsets. We trained one model on the entire dataset again and two more on these specialized subsets.

- Actively monitoring and intervening based on a decrease in perplexity on the gsm8k benchmark, led to an overall improvement in performance, especially in math abilities, without detracting from performance on other benchmarks—a task that is typically quite difficult.
+ We actively monitored and assessed the results of each training run. Whenever we found a decrease in perplexity on the gsm8k benchmark, we intervened. By following this procedure, we were able to improve the overall performance, especially in math abilities, without detracting from performance on other benchmarks, a task that is, in general, quite difficult.

  This process not only helps in understanding the effectiveness of Spherical Linear Interpolation but also introduces a new method for refining models with enhanced skills through a cycle of targeted data selection (Laser data(x)) + SLERP, followed by a subsequent focus on different data (Laser again on data(y)).

- Additionally, we integrated a novel training strategy on the SFT and DPO training process, where we partially freeze the model according to a laser-like analysis aiming to navigate and optimize the trade-offs highlighted by the no free lunch theorem. This innovative training method effectively prevents the significant problem of forgetting previously acquired knowledge.
- This aspect is particularly crucial when attempting to teach the model specific skills, such as a new language, where traditionally, the model might lose a considerable amount of its prior knowledge and exhibit a decline in overall intelligence.
+ Additionally, we integrated a novel training strategy into the SFT and DPO training process, where we partially freeze the model according to a laser-like analysis aiming to navigate and optimize the trade-offs highlighted by the no free lunch theorem. This innovative training method effectively prevents the significant problem of language models forgetting previously acquired knowledge.
+ This aspect is particularly crucial when attempting to teach the model specific skills, such as a new language, where, in general, the model might lose a considerable amount of its prior knowledge and exhibit a decline in overall intelligence.

- Concrete information on how the new training strategy works and the advantages it offers over conventional training methods will soon be published in a detailed paper by the LaserRMT research group.
+ Detailed information on how the new training strategy works and the advantages it offers over conventional training methods will soon be published in a paper by the LaserRMT research group.


  We improved the German language skills on this model. Nevertheless, certain formulations may occur that are not entirely correct.
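The README text in this diff refers to Spherical Linear Interpolation (SLERP) as the merging step between Laser-trained models. The sketch below is a minimal illustration of generic SLERP between two flattened weight tensors. It assumes that plain Python lists stand in for parameter tensors, and the names `slerp` and `slerp_merge` are hypothetical. They are not taken from the authors' pipeline or from any merging library. The interpolation angle Ω is computed from the normalized vectors, with cos Ω = v0·v1 / (|v0| |v1|). The weights sin((1 - t)Ω) / sin Ω and sin(tΩ) / sin Ω are then applied to the original vectors.

```python
import math
from typing import Dict, List, Sequence


def _lerp(t: float, v0: Sequence[float], v1: Sequence[float]) -> List[float]:
    # Plain linear interpolation, used when SLERP is ill-defined.
    return [(1.0 - t) * a + t * b for a, b in zip(v0, v1)]


def slerp(t: float, v0: Sequence[float], v1: Sequence[float],
          eps: float = 1e-6) -> List[float]:
    """Hypothetical helper: spherical interpolation of two flattened tensors.

    t = 0 returns v0 and t = 1 returns v1. The angle comes from the
    normalized vectors, while the weights are applied to the originals.
    """
    if len(v0) != len(v1):
        raise ValueError("weight vectors must have the same length")
    norm0 = math.sqrt(sum(x * x for x in v0))
    norm1 = math.sqrt(sum(x * x for x in v1))
    if norm0 < eps or norm1 < eps:
        # The angle is undefined for (near-)zero vectors.
        return _lerp(t, v0, v1)
    # Cosine of the angle between the two directions, clamped for acos.
    cos_omega = sum(a * b for a, b in zip(v0, v1)) / (norm0 * norm1)
    cos_omega = max(-1.0, min(1.0, cos_omega))
    omega = math.acos(cos_omega)
    sin_omega = math.sin(omega)
    if sin_omega < eps:
        # Nearly colinear or antipodal vectors: fall back to linear interpolation.
        return _lerp(t, v0, v1)
    w0 = math.sin((1.0 - t) * omega) / sin_omega
    w1 = math.sin(t * omega) / sin_omega
    return [w0 * a + w1 * b for a, b in zip(v0, v1)]


def slerp_merge(state_a: Dict[str, List[float]],
                state_b: Dict[str, List[float]],
                t: float = 0.5) -> Dict[str, List[float]]:
    """Hypothetical helper: merge two checkpoints parameter by parameter."""
    return {name: slerp(t, state_a[name], state_b[name]) for name in state_a}
```

For example, `slerp(0.5, [1.0, 0.0], [0.0, 1.0])` produces a vector halfway along the arc, with both components approximately 0.7071. Linear interpolation would instead give `[0.5, 0.5]`, which is shorter. Real merges would operate on framework tensors and often use a different `t` per layer. Those details are outside this sketch.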