---
base_model: mistralai/Mixtral-8x7B-v0.1
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- maxidl/instruct-en-de
model-index:
- name: Mixtral-8x7B-v0.1-Instruct-sft-en-de
  results: []
---

# Mixtral-8x7B-v0.1-Instruct-sft-en-de

A full fine-tune of [mistralai/Mixtral-8x7B-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-v0.1) on a mix of English and German instruction data.

## Dataset

| source | #examples |
|---|---|
| teknium/OpenHermes-2.5 | 1001551 |
| maxidl/OpenOrca-gpt4-de | 119559 |
| maxidl/MathInstruct-de | 56793 |
| maxidl/Capybara-de | 15991 |
| maxidl/math-prm-800k-de | 12298 |
| maxidl/wikihow-de | 10103 |
| maxidl/no_robots-de | 9500 |
| maxidl/lima-de | 1030 |

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 1e-05
- train_batch_size: 1
- eval_batch_size: 8
- seed: 42
- distributed_type: multi-GPU
- num_devices: 32
- total_train_batch_size: 32
- total_eval_batch_size: 256
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_steps: 50
- num_epochs: 3

### Training results

### Framework versions

- Transformers 4.39.3
- Pytorch 2.1.2
- Datasets 2.18.0
- Tokenizers 0.15.2
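
## Usage

A minimal inference sketch with 🤗 Transformers. It assumes the repository id `maxidl/Mixtral-8x7B-v0.1-Instruct-sft-en-de` and that the tokenizer ships a chat template for the instruction format; adjust the repo id and prompt formatting if they differ.

```python
# Minimal usage sketch (assumed repo id and chat template; adjust as needed).
# The model is a full 8x7B MoE fine-tune, so loading requires substantial GPU memory.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "maxidl/Mixtral-8x7B-v0.1-Instruct-sft-en-de"  # assumed repository id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

# The model was trained on English and German instructions, so either language works.
messages = [{"role": "user", "content": "Erkläre kurz, was ein Mixture-of-Experts-Modell ist."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```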