Edit model card

Hercules-1.0-Mistral-7B

Hercules

Model description

Hercules-1.0-Mistral-7B is a fine-tune of the Mistral 7B model.

Designed to be a turbo-charged version of teknium's OpenHermes through augmented data sources. This model outperforms OpenHermes-7B by 8 points on average, and OpenHermes-13B by 4 points on average. I will continue working on this model to hopefully outperform OpenHermes 2.5.

This model is on par with OpenHermes 2.

Apart from higher performance over OpenHermes, this model has data and training transparency for reproducibility.

You can learn more about the Hercules dataset here: Locutusque/hercules-v1.0

During training, the dataset is split into a test set of 100 examples. At the end of training (120,000 examples), this model achieved a test loss of 0.57.

Training details

  • This model was trained on 8 kaggle TPUs, using torch xla SPMD for high MXU efficiency. There was no expense on my end (meaning you can reproduce this too!)
  • A learning rate of 2e-06 with the Adam optimizer. No LR scheduler was used. A low learning rate was used to prevent exploding gradients.
  • No mixed precision was used, with the default dtype being bfloat16.
  • Trained on both full subsets of OpenOrca, and 120,000 examples of Hercules. (If you wish to reproduce this model, you can limit redundancy by fine-tuning Open-Orca/Mistral-7B-OpenOrca on Hercules-v1.0)
  • No model parameters were frozen.
  • This model was trained on OpenAI's ChatML prompt format.

Inference examples

image/png image/png

image/png

image/png

image/png

Downloads last month
1,999
Safetensors
Model size
7.24B params
Tensor type
BF16
·

Finetuned from

Datasets used to train Locutusque/Hercules-1.0-Mistral-7B