MM4-3b
A Llama-based model I made through extensive training and merging; I'll explain the process later. I literally made so many models today.
Title: Divergent Knowledge Enhancement through Retrograde Merging Strategies: Redefining Accuracy Perspectives in Language Model Evolution
Abstract: Have you ever picked up a bad habit, or learned to do something incorrectly, only to realize you must completely relearn it? In this proposal, we present an innovative and unconventional approach to enhancing the performance and knowledge base of natural language models. Our proposed method, titled 'Divergent Knowledge Enhancement through Retrograde Merging Strategies' (DKE-RS), challenges traditional practices in model development by incorporating a deliberate back-and-forth merger between high- and low-accuracy language models.
The initial conceptualization of DKE-RS stemmed from the realization that learning encompasses both acquisition and unlearning, as encapsulated by the quote, "learning is just as sacred as unlearning." The proposed technique starts from a baseline model, 'blur-7b,' with an accuracy rate of 72.1%, which is then merged with a Mistral model fine-tuned on the Dolphin dataset that achieves only 46% accuracy.
By deliberately merging with less accurate models and retracing the evolutionary process, DKE-RS aims to broaden the knowledge base of the resulting model. This strategy, dubbed 'making the bad good,' intentionally degrades the initial accuracy in an effort to later refine it, breaking with conventional iterative improvement in favor of a more exploratory progression.
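At its core, merging two checkpoints like this amounts to interpolating their weights parameter by parameter. Below is a minimal, illustrative sketch in plain Python; the function name `linear_merge` and the toy parameter names are my own, and a real merge would operate on full tensor state dicts with a tool such as mergekit rather than on small lists of floats:

```python
def linear_merge(state_a, state_b, alpha=0.5):
    """Linearly interpolate two model state dicts, parameter by parameter.

    alpha weights state_a; (1 - alpha) weights state_b. Both dicts must
    share the same parameter names and shapes for the merge to make sense.
    """
    if state_a.keys() != state_b.keys():
        raise ValueError("models must share identical parameter names")
    return {
        name: [alpha * a + (1.0 - alpha) * b
               for a, b in zip(state_a[name], state_b[name])]
        for name in state_a
    }

# Toy example: blend a "high-accuracy" and a "low-accuracy" checkpoint
# equally, as in the back-and-forth merges described above.
high = {"layer.weight": [1.0, 0.5]}
low = {"layer.weight": [0.0, 0.25]}
merged = linear_merge(high, low, alpha=0.5)
print(merged)  # {'layer.weight': [0.5, 0.375]}
```

Setting `alpha` closer to 1.0 preserves more of the stronger model; the 'retrograde' step in DKE-RS corresponds to deliberately giving the weaker model meaningful weight.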
The DKE-RS method challenges the status quo by not solely relying on a linear enhancement trajectory, instead adopting a more holistic and diverse approach. We anticipate that this non-linear merger process will further diversify the model's knowledge base, thereby creating a more resilient and well-rounded language generation tool, capable of handling complex contexts with a broader understanding.
Through thorough experimentation and analysis, we plan to assess the effectiveness and potential drawbacks of DKE-RS, comparing it to traditional merging techniques. The results from such evaluations will provide valuable insights into the efficacy of this divergent strategy in the landscape of natural language model development.
We posit that the Divergent Knowledge Enhancement through Retrograde Merging Strategies approach contributes a significant and compelling step forward in the field, prompting discussion about the nature of accuracy refinement and model progression.
Open LLM Leaderboard Evaluation Results
Detailed results can be found here
| Metric | Value |
|---|---|
| Avg. | 53.22 |
| AI2 Reasoning Challenge (25-Shot) | 44.80 |
| HellaSwag (10-Shot) | 70.41 |
| MMLU (5-Shot) | 50.90 |
| TruthfulQA (0-shot) | 43.20 |
| Winogrande (5-shot) | 66.22 |
| GSM8k (5-shot) | 43.82 |
Evaluation results (Open LLM Leaderboard)
- AI2 Reasoning Challenge (25-Shot), test set, normalized accuracy: 44.80
- HellaSwag (10-Shot), validation set, normalized accuracy: 70.41
- MMLU (5-Shot), test set, accuracy: 50.90
- TruthfulQA (0-shot), validation set, mc2: 43.20
- Winogrande (5-shot), validation set, accuracy: 66.22
- GSM8k (5-shot), test set, accuracy: 43.82