mistral-orpo-capybara-3k
This model is a full-parameter fine-tune of mistralai/Mistral-7B-v0.1, trained with ORPO (Odds Ratio Preference Optimization) on the eduagarcia/capybara-dpo-3k dataset using the huggingface/alignment-handbook.
Model description
Trained for 4.5 hours on a single A100 GPU.
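On one GPU, the recipe's per_device_train_batch_size: 4 and gradient_accumulation_steps: 8 give 4 × 8 × 1 = 32 preference pairs per optimizer step. The sketch below does this arithmetic. The train split size is not stated here, so num_train_examples is a parameter, and the step count is only approximate because trainers can handle the last partial batch differently.

```python
import math

def effective_batch_size(per_device_batch: int = 4, grad_accum_steps: int = 8, num_devices: int = 1) -> int:
    # Examples contributing to each optimizer update
    return per_device_batch * grad_accum_steps * num_devices

def approx_optimizer_steps(num_train_examples: int, epochs: int = 1, **kwargs) -> int:
    # Approximate: rounds the final partial effective batch up to a full step
    return epochs * math.ceil(num_train_examples / effective_batch_size(**kwargs))

print(effective_batch_size())
```

If the estimated step count for the actual train split is below warmup_steps: 100, the learning rate would still be in linear warmup when the single epoch ends.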
Alignment Handbook recipe
# Model arguments
model_name_or_path: mistralai/Mistral-7B-v0.1
model_revision: main
torch_dtype: bfloat16
use_flash_attention_2: true
trust_remote_code: true
# Data training arguments
chat_template: "{% for message in messages %}\n{% if message['role'] == 'user' %}\n{{ '<|user|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'system' %}\n{{ '<|system|>\n' + message['content'] + eos_token }}\n{% elif message['role'] == 'assistant' %}\n{{ '<|assistant|>\n' + message['content'] + eos_token }}\n{% endif %}\n{% if loop.last and add_generation_prompt %}\n{{ '<|assistant|>' }}\n{% endif %}\n{% endfor %}"
dataset_mixer:
  eduagarcia/capybara-dpo-3k: 1.0
dataset_splits:
- train
- test
preprocessing_num_workers: 8
# ORPOTrainer arguments
bf16: true
beta: 0.05  # weight of the odds-ratio term (lambda in the ORPO paper)
gradient_accumulation_steps: 8
gradient_checkpointing: true
gradient_checkpointing_kwargs:
  use_reentrant: true
hub_model_id: mistral-orpo-capybara-3k
learning_rate: 5.0e-6
log_level: info
logging_steps: 10
lr_scheduler_type: inverse_sqrt
max_length: 2048
max_prompt_length: 1792
num_train_epochs: 1
optim: adamw_bnb_8bit
output_dir: data/mistral-orpo-capybara-3k
per_device_train_batch_size: 4
push_to_hub: true
report_to:
- tensorboard
- wandb
save_strategy: "no"
seed: 42
warmup_steps: 100
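The chat_template in the recipe is the Zephyr-style format. The dependency-free sketch below shows what it renders to, assuming the Jinja settings transformers uses for chat templates (trim_blocks and lstrip_blocks enabled) and Mistral's eos token "</s>". render_chat is a hypothetical helper written for illustration, not part of any library.

```python
def render_chat(messages, eos_token="</s>", add_generation_prompt=False):
    """Hypothetical re-implementation of the recipe's chat_template."""
    out = []
    for i, message in enumerate(messages):
        role = message["role"]
        # Only user/system/assistant turns are emitted; other roles render nothing
        if role in ("user", "system", "assistant"):
            out.append(f"<|{role}|>\n{message['content']}{eos_token}\n")
        # The generation prompt is appended only after the last message
        if i == len(messages) - 1 and add_generation_prompt:
            out.append("<|assistant|>\n")
    return "".join(out)

print(repr(render_chat([{"role": "user", "content": "Hi"}], add_generation_prompt=True)))
```

With a tokenizer that carries this template, tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True) should produce the same string.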
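In the ORPO objective, beta: 0.05 appears to play the role of λ: loss = L_SFT + λ · L_OR, where L_OR = -log σ(log odds(y_chosen) - log odds(y_rejected)) and odds(y) = p / (1 - p), with p taken from the length-averaged log-probability of each response (our reading of TRL's ORPOTrainer). The per-example sketch below works on scalars. Approximating the SFT term as the negative mean token log-probability of the chosen response is a simplifying assumption.

```python
import math

def log_odds(avg_logp: float) -> float:
    """log(p / (1 - p)) computed from log p; avg_logp must be strictly negative."""
    return avg_logp - math.log1p(-math.exp(avg_logp))

def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))

def orpo_loss(chosen_avg_logp: float, rejected_avg_logp: float, beta: float = 0.05) -> float:
    # SFT term, simplified to the negative mean token log-prob of the chosen response
    sft = -chosen_avg_logp
    # Log odds ratio between the chosen and rejected responses
    log_or = log_odds(chosen_avg_logp) - log_odds(rejected_avg_logp)
    # beta scales the odds-ratio penalty -log(sigmoid(log_or))
    return sft - beta * log_sigmoid(log_or)

print(orpo_loss(-0.5, -2.0), orpo_loss(-2.0, -0.5))
```

A small beta keeps the objective dominated by supervised fine-tuning on the chosen responses, with the odds-ratio term acting as a light push away from the rejected ones.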
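The recipe pairs lr_scheduler_type: inverse_sqrt with warmup_steps: 100 and learning_rate: 5.0e-6. The sketch below assumes the behavior of transformers' get_inverse_sqrt_schedule with its default timescale (equal to the number of warmup steps): linear warmup, then decay proportional to 1/sqrt(step). inverse_sqrt_lr is a hypothetical helper for illustration.

```python
import math

def inverse_sqrt_lr(step: int, peak_lr: float = 5.0e-6, warmup_steps: int = 100) -> float:
    # Linear warmup from 0 up to peak_lr over warmup_steps
    if step < warmup_steps:
        return peak_lr * step / max(1, warmup_steps)
    # After warmup: peak_lr * sqrt(warmup_steps / step), assuming timescale == warmup_steps
    return peak_lr / math.sqrt(step / warmup_steps)

for s in (0, 50, 100, 400, 1600):
    print(s, inverse_sqrt_lr(s))
```

Under this schedule the learning rate peaks at step 100 and halves each time the step count quadruples.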
Framework versions
- Transformers 4.41.0.dev0
- Pytorch 2.1.2
- Datasets 2.19.0
- Tokenizers 0.19.1