---
license: apache-2.0
base_model: mistralai/Mistral-7B-v0.1
tags:
- alignment-handbook
- trl
- orpo
- generated_from_trainer
datasets:
- argilla/dpo-mix-7k
model-index:
- name: mistral-orpo-mix-7k
  results: []
language:
- en
---

# mistral-orpo-mix-7k

This model is an ORPO full fine-tuned version of [mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) on the [argilla/dpo-mix-7k](https://huggingface.co/datasets/argilla/dpo-mix-7k) dataset, trained with the [huggingface/alignment-handbook](https://github.com/huggingface/alignment-handbook).

## Training procedure

Trained for 4.5 hours on a single A100 GPU.

### Alignment Handbook recipe

```yaml
# Model arguments
model_name_or_path: mistralai/Mistral-7B-v0.1
model_revision: main
torch_dtype: bfloat16
use_flash_attention_2: true
trust_remote_code: true

# Data training arguments
chat_template: "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
dataset_mixer:
  argilla/dpo-mix-7k: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 8

# ORPOTrainer arguments
bf16: true
beta: 0.05
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
hub_model_id: mistral-orpo-mix-7k
hub_private_repo: true
learning_rate: 5.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: inverse_sqrt
max_length: 2048
max_prompt_length: 1792
num_train_epochs: 3
optim: adamw_bnb_8bit
output_dir: data/mistral-orpo-mix-7k
per_device_train_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_steps: 100
```

### Framework versions

- Transformers 4.41.0.dev0
- Pytorch 2.1.2
- Datasets 2.19.0
- Tokenizers 0.19.1
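
## Usage

A minimal inference sketch with 🤗 Transformers, assuming the chat template from the recipe above is saved in the tokenizer config. `REPO_ID` is a placeholder for wherever this checkpoint is hosted (the recipe's `hub_model_id` is `mistral-orpo-mix-7k`), and the generation settings are illustrative rather than recommended values.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder: replace with the actual Hub repo id of this checkpoint.
REPO_ID = "mistral-orpo-mix-7k"

tokenizer = AutoTokenizer.from_pretrained(REPO_ID)
model = AutoModelForCausalLM.from_pretrained(
    REPO_ID,
    torch_dtype=torch.bfloat16,  # matches the bf16 dtype used for training
    device_map="auto",
)

# The recipe's chat template wraps turns in <|user|> / <|assistant|> markers and
# appends <|assistant|> when add_generation_prompt=True.
messages = [{"role": "user", "content": "Explain ORPO in one paragraph."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256, do_sample=True, temperature=0.7)
# Strip the prompt tokens and print only the newly generated completion.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```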