
tinyllama-moe-base-mix-orpo

This model is a fine-tuned version of four-two-labs/tinyllama-moe-base, trained with ORPO on the dataset mix assembled in the code under Training and evaluation data below. It achieves the following results on the evaluation set (a note after the list sketches how these metrics fit together):

  • Loss: 1.1757
  • Rewards/chosen: -0.1708
  • Rewards/rejected: -0.1682
  • Rewards/accuracies: 0.5003
  • Rewards/margins: -0.0027
  • Logps/rejected: -1.6818
  • Logps/chosen: -1.7085
  • Logits/rejected: -2.6114
  • Logits/chosen: -2.6139
  • Nll Loss: 1.0859
  • Log Odds Ratio: -0.8989
  • Log Odds Chosen: -0.1073
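For context, these metrics are consistent with the ORPO objective. This is a hedged reading: the card does not state the loss weighting, and the λ = 0.1 below is assumed from TRL's default.

\mathcal{L}_{\mathrm{ORPO}}
  = \mathcal{L}_{\mathrm{NLL}} + \lambda \cdot \mathcal{L}_{\mathrm{OR}},
\qquad
\mathcal{L}_{\mathrm{OR}}
  = -\log \sigma\!\left( \log \frac{\mathrm{odds}_\theta(y_{\mathrm{chosen}} \mid x)}
                                   {\mathrm{odds}_\theta(y_{\mathrm{rejected}} \mid x)} \right),
\qquad
\mathrm{odds}_\theta(y \mid x) = \frac{P_\theta(y \mid x)}{1 - P_\theta(y \mid x)}

The reported Log Odds Ratio (−0.8989) is the log-sigmoid term log σ(·), so L_OR = 0.8989; with the assumed λ = 0.1 the reported values are self-consistent: 1.0859 + 0.1 × 0.8989 ≈ 1.1757, the evaluation Loss.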

Model description

More information needed

Training and evaluation data

import os

from datasets import interleave_datasets, load_dataset
from transformers import AutoTokenizer

# Assumed setup: the card does not show these, but the snippet requires them.
# The tokenizer checkpoint is assumed to be the base model.
hf_token = os.environ['HF_TOKEN']
tokenizer = AutoTokenizer.from_pretrained('four-two-labs/tinyllama-moe-base')


def format_chat_template(row):
    # Render each conversation field to a single templated string.
    for key in ['prompt', 'chosen', 'rejected']:
        row[key] = tokenizer.apply_chat_template(row[key], tokenize=False)

    return row


dataset = (
    interleave_datasets([
        # All splits of the multilingual ORPO/DPO mix, interleaved into one stream.
        (
            interleave_datasets(
                list(  # interleave_datasets needs a list, not dict_values
                    load_dataset(
                        'four-two-labs/orpo-dpo-mix-40k-multilang-fixed',
                        token=hf_token,
                    ).values()
                )
            )
            .select_columns(['prompt', 'chosen', 'rejected'])
        ),
        # A 250k-example random subset of the 5M translation DPO pairs.
        (
            load_dataset(
                'four-two-labs/translations-5M-DPO',
                split='train',
                token=hf_token,
            )
            .shuffle(42)
            .select(range(250_000))
            .select_columns(['prompt', 'chosen', 'rejected'])
        ),
        # UltraFeedback binarized preference pairs.
        (
            load_dataset(
                'four-two-labs/ultrafeedback_binarized-fixed',
                split='train_prefs',
                token=hf_token,
            )
            .select_columns(['prompt', 'chosen', 'rejected'])
        ),
    ])
    .shuffle(seed=42)
    #.select(range(1000))  # uncomment for a quick smoke test
    .map(format_chat_template, num_proc=32)
    .train_test_split(test_size=0.01)
)
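A quick spot check (not in the original card) to confirm the chat template was applied, reusing the dataset object built above:

# Hypothetical sanity check: inspect one formatted preference pair.
sample = dataset['train'][0]
for key in ['prompt', 'chosen', 'rejected']:
    # Each field should now be a single templated string, not a message list.
    print(key, '=>', sample[key][:120])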

Training procedure

Training hyperparameters

The following hyperparameters were used during training; a sketch mapping them onto TRL's ORPOTrainer follows the list:

  • learning_rate: 1e-06
  • train_batch_size: 12
  • eval_batch_size: 12
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 48
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 10
  • num_epochs: 3
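The sketch below is an assumption-laden reconstruction, not the authors' script: the beta value, the bf16 flag, and the output directory are inferred rather than stated in the card.

from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import ORPOConfig, ORPOTrainer

model = AutoModelForCausalLM.from_pretrained('four-two-labs/tinyllama-moe-base')
tokenizer = AutoTokenizer.from_pretrained('four-two-labs/tinyllama-moe-base')

args = ORPOConfig(
    output_dir='tinyllama-moe-base-mix-orpo',
    learning_rate=1e-6,
    per_device_train_batch_size=12,
    per_device_eval_batch_size=12,
    gradient_accumulation_steps=4,   # 12 * 4 = 48 total train batch size
    num_train_epochs=3,
    lr_scheduler_type='linear',
    warmup_steps=10,
    seed=42,
    beta=0.1,    # assumed: TRL default, consistent with the loss breakdown above
    bf16=True,   # assumed from the BF16 tensor type of the checkpoint
)

trainer = ORPOTrainer(
    model=model,
    args=args,
    train_dataset=dataset['train'],  # `dataset` from the data-loading snippet above
    eval_dataset=dataset['test'],
    tokenizer=tokenizer,
)
trainer.train()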

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen | Nll Loss | Log Odds Ratio | Log Odds Chosen |
|---------------|--------|------|-----------------|----------------|------------------|--------------------|-----------------|----------------|--------------|-----------------|---------------|----------|----------------|-----------------|
| 1.1835        | 0.6001 | 2270 | 1.1764          | -0.1711        | -0.1684          | 0.4976             | -0.0027         | -1.6838        | -1.7106      | -2.6416         | -2.6430       | 1.0866   | -0.8991        | -0.1073         |
| 1.1494        | 1.2002 | 4540 | 1.1757          | -0.1709        | -0.1682          | 0.5003             | -0.0026         | -1.6820        | -1.7085      | -2.5529         | -2.5573       | 1.0860   | -0.8988        | -0.1070         |
| 1.233         | 1.8003 | 6810 | 1.1757          | -0.1709        | -0.1682          | 0.4993             | -0.0027         | -1.6819        | -1.7086      | -2.5859         | -2.5892       | 1.0859   | -0.8989        | -0.1073         |
| 1.2344        | 2.4004 | 9080 | 1.1757          | -0.1708        | -0.1682          | 0.5003             | -0.0027         | -1.6818        | -1.7085      | -2.6114         | -2.6139       | 1.0859   | -0.8989        | -0.1073         |

Framework versions

  • Transformers 4.40.0
  • Pytorch 2.1.2+cu121
  • Datasets 2.19.0
  • Tokenizers 0.19.1

Model size

  • 3.38B parameters (Safetensors, BF16)
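
Usage

A minimal inference sketch (assuming the repo id matches the card title; the prompt and generation settings are illustrative):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = 'four-two-labs/tinyllama-moe-base-mix-orpo'  # assumed repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # checkpoint is stored in BF16
    device_map='auto',
)

messages = [{'role': 'user', 'content': 'Write a haiku about mixtures of experts.'}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors='pt'
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))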