This model continues the training of Barely Regal 10.7B with DPO, using Axolotl and the argilla/ultrafeedback-binarized-preferences-cleaned dataset. Axolotl seemed to work a lot better for DPO training than Unsloth, as I kept running into template errors with Unsloth and this dataset.
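For readers new to DPO, here is a minimal sketch of the standard per-pair DPO loss that preference data with chosen and rejected responses feeds into. It is an illustration only, not the Axolotl implementation or the configuration used for this model; the function names and the `beta=0.1` default are assumptions made for the example.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # assumed value for illustration, not this model's setting
) -> float:
    """DPO loss for one preference pair, given summed sequence log-probs."""
    # Implicit rewards: how far the policy has moved from the frozen reference.
    chosen_reward = policy_chosen_logp - ref_chosen_logp
    rejected_reward = policy_rejected_logp - ref_rejected_logp
    # The loss shrinks as the chosen response gains on the rejected one.
    margin = beta * (chosen_reward - rejected_reward)
    return -log_sigmoid(margin)


# Policy identical to the reference: no preference learned yet.
print(dpo_loss(-10.0, -10.0, -10.0, -10.0))
# Policy now favors the chosen response: the loss is lower.
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

When the policy and the reference agree, the loss equals log(2); it falls as the policy assigns relatively more probability to the chosen response than the reference does.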
The base of this model is Mistral Instruct 0.3, supersized by folding the model in on itself and using task arithmetic to combine the layers. I call this new model Artificial Innovation - Mistral 3, which shows up as AI-M3-10.7B as the base model on the Hub. In my basic testing, this seems to have worked better than simple passthrough merging of layers, as the LLM has had fewer issues.
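As a rough illustration of what task arithmetic means, the sketch below merges toy weight vectors: each task vector is the difference between a fine-tuned weight and the base weight, and the merged weight is the base plus a weighted sum of those differences. The exact merge recipe for AI-M3-10.7B is not given here, so the function names, values, and weights are all hypothetical; real merges operate on full model tensors rather than short lists.

```python
def task_vector(finetuned: list[float], base: list[float]) -> list[float]:
    """Difference between fine-tuned and base weights (the 'task vector')."""
    return [f - b for f, b in zip(finetuned, base)]


def task_arithmetic_merge(
    base: list[float],
    finetuned_models: list[list[float]],
    weights: list[float],
) -> list[float]:
    """Add each weighted task vector on top of the base weights."""
    merged = list(base)
    for model, weight in zip(finetuned_models, weights):
        for i, delta in enumerate(task_vector(model, base)):
            merged[i] += weight * delta
    return merged


# Toy values only: two "models" that each changed a different weight.
base = [1.0, 2.0]
model_a = [2.0, 2.0]
model_b = [1.0, 4.0]
print(task_arithmetic_merge(base, [model_a, model_b], [1.0, 0.5]))
```

By contrast, a passthrough merge copies layers unchanged into the new stack, with no arithmetic applied to the weights.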
In addition to the layer merging, the model was further fine-tuned with SFT using Unsloth, to act as a base for further training and experimentation with DPO or ORPO (the current DPO project is being trained using Axolotl).
If the LLM acts as if it has had a stroke, check whether flash attention is turned off and enable it if so. This seemed to correct the issues I had when running the model in LM Studio.
GGUFs are available here:
Q4_K_M: https://huggingface.co/sydonayrex/Barely-Regal-10.7B-DPO-Q4_K_M-GGUF
Q6_K: https://huggingface.co/sydonayrex/Barely-Regal-10.7B-DPO-Q6_K-GGUF
Q8_0: https://huggingface.co/sydonayrex/Barely-Regal-10.7B-DPO-Q8_0-GGUF
Uploaded model
- Developed by: sydonayrex
- License: apache-2.0
- Finetuned from model: sydonayrex/AI-M3-10.7Bv2
Model tree for sydonayrex/Barely-Regal-10.7B-DPO
Base model
mistralai/Mistral-7B-v0.3