Model Card for Model ID

This is phi-3-mini-4k-instruct ORPO finetuning for the italian language over the Alpaca vs. Alpaca italian dataset: efederici/alpaca-vs-alpaca-orpo-dpo

Model Details

Model Description

  • Developed by: Diego Giorgini
  • Funded by: AI Technologies SRL - www.aitechnologies.it
  • Language(s) (NLP): Italian
  • License: llama3
  • Finetuned from model: unsloth/Phi-3-mini-4k-instruct

Training Details

Environment

unsloth: 2024.5
torch: 2.2

Training Data

efederici/alpaca-vs-alpaca-orpo-dpo: The Alpaca vs. Alpaca dataset is a curated blend of the Alpaca dataset and the Alpaca GPT-4 dataset, both available on HuggingFace Datasets. It uses the standard GPT dataset as the 'rejected' answer, steering the model towards the GPT-4 answer, which is considered as the 'chosen' one.

Training Procedure

Preprocessing [optional]

  • No preprocessing has been performed, except for formatting with the phi-3 chat_template from unsloth:

    tokenizer = get_chat_template(tokenizer, chat_template = "phi-3")

Training Hyperparameters

  • Training regime: bf16

  • Model loading parameters:

max_seq_length = 8192
dtype = None
load_in_4bit = False
  • PEFT parameters:
r = 64  
lora_alpha = 64  
lora_dropout = 0  
bias = "none"  
random_state = 3407  
use_rslora = False  
loftq_config = None
  • ORPOConfig parameters:
max_length = 8192  
max_prompt_length = max_seq_length//2  
max_completion_length = max_seq_length//2  
warmup_ratio = 0.1  
weight_decay = 0.01  
per_device_train_batch_size = 1  
gradient_accumulation_steps = 16  
learning_rate=8e-6  
beta = 0.1  
optim = "paged_adamw_8bit"  
lr_scheduler_type = "linear"  
num_train_epochs = 1

Speeds, Sizes, Times

7h on an A100-40GB

Model Card Contact

diego.giorgini@icloud.com

Downloads last month
12
Safetensors
Model size
2.07B params
Tensor type
F32
·
BF16
·
U8
·
Inference Examples
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Dataset used to train diegobit/Phi-3-mini-4k-instruct-ita-orpo-v2