Fine-tuned the Hermes 2 7B model on the Synthia dataset.

In my opinion, it's probably the best model I've fine-tuned in terms of role-playing (tested on LavernAI).

Future plans:

  • I'll probably run more tests in other areas

  • Will add other languages (potentially Japanese and Chinese)

  • Fine-tune Mistral models on the Synthia dataset as well?

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric                               Value
Avg.                                 52.21
AI2 Reasoning Challenge (25-shot)    51.02
HellaSwag (10-shot)                  79.12
MMLU (5-shot)                        47.88
TruthfulQA (0-shot)                  46.77
Winogrande (5-shot)                  74.51
GSM8k (5-shot)                       13.95
Model size: 6.74B params
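For reference, the Avg. row is consistent with a plain unweighted mean of the six benchmark scores. The short Python sketch below reproduces it, assuming that is how the leaderboard computes the average (the dictionary keys are just labels copied from the table).

```python
# Open LLM Leaderboard scores from the table above.
scores = {
    "AI2 Reasoning Challenge (25-shot)": 51.02,
    "HellaSwag (10-shot)": 79.12,
    "MMLU (5-shot)": 47.88,
    "TruthfulQA (0-shot)": 46.77,
    "Winogrande (5-shot)": 74.51,
    "GSM8k (5-shot)": 13.95,
}

# Assumption: "Avg." is the unweighted mean over all six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints "Avg. 52.21"
```

Since every benchmark counts equally, the low GSM8k score pulls the average down noticeably.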
