metadata
license: other
license_name: gemma-terms-of-use
license_link: https://ai.google.dev/gemma/terms
library_name: transformers
base_model: google/gemma-2b
tags:
  - trl
  - orpo
  - generated_from_trainer
model-index:
  - name: gemma-2b-orpo
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 49.15
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 73.72
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 38.52
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 44.53
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 64.33
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 13.87
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=anakin87%2Fgemma-2b-orpo
          name: Open LLM Leaderboard
datasets:
  - alvarobartt/dpo-mix-7k-simplified
language:
  - en

gemma-2b-orpo

This is an ORPO fine-tune of google/gemma-2b with alvarobartt/dpo-mix-7k-simplified.

⚡ Quantized version (GGUF): https://huggingface.co/anakin87/gemma-2b-orpo-GGUF

ORPO

ORPO (Odds Ratio Preference Optimization) is a new training paradigm that combines two phases that are usually run separately: SFT (Supervised Fine-Tuning) and preference alignment (usually performed with RLHF or simpler methods like DPO). Its main advantages:

  • Faster training
  • Less memory usage (no reference model needed)
  • Good results!
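To make the idea concrete, here is a minimal sketch of the per-example objective described in the ORPO paper: the usual SFT negative log-likelihood on the chosen answer plus a weighted odds-ratio term that favors the chosen answer over the rejected one. The function names, the use of length-normalized log-probabilities as plain floats, and the weight `lam=0.1` are assumptions made for illustration; this is not the code used to train this model.

```python
import math


def log_odds(avg_logp: float) -> float:
    """Return log(p / (1 - p)) for p = exp(avg_logp), with avg_logp < 0."""
    return avg_logp - math.log1p(-math.exp(avg_logp))


def orpo_loss(avg_logp_chosen: float, avg_logp_rejected: float, lam: float = 0.1) -> float:
    """Illustrative ORPO loss for a single (prompt, chosen, rejected) triple.

    Inputs are length-normalized log-probabilities of each answer under the
    model being trained. No reference model is involved.
    """
    # SFT term: negative log-likelihood of the chosen answer
    sft_term = -avg_logp_chosen
    # Odds-ratio term: -log(sigmoid(z)) == log(1 + exp(-z))
    log_odds_ratio = log_odds(avg_logp_chosen) - log_odds(avg_logp_rejected)
    odds_ratio_term = math.log1p(math.exp(-log_odds_ratio))
    return sft_term + lam * odds_ratio_term


# The loss is lower when the chosen answer is more likely than the rejected one
print(orpo_loss(-0.2, -1.5))
print(orpo_loss(-1.5, -0.2))
```

Because both terms are computed from the same forward passes of a single model, no frozen reference model has to be kept in memory, which is where the memory saving listed above comes from.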

๐Ÿ† Evaluation

Nous

gemma-2b-orpo performs well for its size on Nous' benchmark suite: it has the highest average score among the Gemma 2B models compared below (evaluation conducted using LLM AutoEval).

| Model | Average | AGIEval | GPT4All | TruthfulQA | Bigbench |
|---|---|---|---|---|---|
| anakin87/gemma-2b-orpo | 39.45 | 23.76 | 58.25 | 44.47 | 31.32 |
| mlabonne/Gemmalpaca-2B | 38.39 | 24.48 | 51.22 | 47.02 | 30.85 |
| google/gemma-2b-it | 36.1 | 23.76 | 43.6 | 47.64 | 29.41 |
| google/gemma-2b | 34.26 | 22.7 | 43.35 | 39.96 | 31.03 |

Open LLM Leaderboard

Detailed results can be found here.

By comparison, on the Open LLM Leaderboard, google/gemma-2b-it has an average of 42.75.

| Metric | Value |
|---|---|
| Avg. | 47.35 |
| AI2 Reasoning Challenge (25-Shot) | 49.15 |
| HellaSwag (10-Shot) | 73.72 |
| MMLU (5-Shot) | 38.52 |
| TruthfulQA (0-shot) | 44.53 |
| Winogrande (5-shot) | 64.33 |
| GSM8k (5-shot) | 13.87 |
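As a quick sanity check, the reported averages are consistent with an unweighted mean of the individual benchmark scores. The short sketch below recomputes both averages from the numbers listed in this card; the variable names are illustrative.

```python
# Scores copied from the Open LLM Leaderboard results in this card
leaderboard_scores = {
    "AI2 Reasoning Challenge (25-Shot)": 49.15,
    "HellaSwag (10-Shot)": 73.72,
    "MMLU (5-Shot)": 38.52,
    "TruthfulQA (0-shot)": 44.53,
    "Winogrande (5-shot)": 64.33,
    "GSM8k (5-shot)": 13.87,
}
leaderboard_avg = sum(leaderboard_scores.values()) / len(leaderboard_scores)
print(f"Open LLM Leaderboard average: {leaderboard_avg:.2f}")

# Scores copied from the Nous benchmark results for anakin87/gemma-2b-orpo
nous_scores = {"AGIEval": 23.76, "GPT4All": 58.25, "TruthfulQA": 44.47, "Bigbench": 31.32}
nous_avg = sum(nous_scores.values()) / len(nous_scores)
print(f"Nous average: {nous_avg:.2f}")
```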

๐Ÿ™ Dataset

alvarobartt/dpo-mix-7k-simplified is a simplified version of argilla/dpo-mix-7k. You can find more information in the dataset card.

🎮 Model in action

Usage notebook

📓 Chat and RAG using Haystack

Simple text generation with Transformers

The model is small, so it runs smoothly on Colab. It can also be loaded with quantization to reduce memory usage.

# pip install transformers accelerate
import torch
from transformers import pipeline

# Load the model in bfloat16 and let accelerate place it on the available device
pipe = pipeline("text-generation", model="anakin87/gemma-2b-orpo", torch_dtype=torch.bfloat16, device_map="auto")

messages = [{"role": "user", "content": "Write a rap song on Vim vs VSCode."}]
# add_generation_prompt=True appends the assistant turn marker so the model answers the user
prompt = pipe.tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
outputs = pipe(prompt, max_new_tokens=500, do_sample=True, temperature=0.7, top_k=50, top_p=0.95)
print(outputs[0]["generated_text"])
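For the quantized option mentioned above, one possible approach is 4-bit loading through bitsandbytes. The sketch below is an assumption about how that could look, not a tested recipe from this card: it requires a CUDA GPU and the `bitsandbytes` package, and the helper name `load_4bit_pipeline`, the NF4 settings, and the shorter `max_new_tokens` are illustrative choices.

```python
# pip install transformers accelerate bitsandbytes
MODEL_ID = "anakin87/gemma-2b-orpo"


def load_4bit_pipeline(model_id: str = MODEL_ID):
    """Build a text-generation pipeline with 4-bit weights (hypothetical helper)."""
    # Imported lazily so that defining the helper does not load the heavy libraries
    import torch
    from transformers import (
        AutoModelForCausalLM,
        AutoTokenizer,
        BitsAndBytesConfig,
        pipeline,
    )

    # Illustrative 4-bit settings; adjust to your hardware
    quant_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=quant_config, device_map="auto"
    )
    return pipeline("text-generation", model=model, tokenizer=tokenizer)


def generate(pipe, user_message: str, max_new_tokens: int = 200) -> str:
    """Format a single user turn with the chat template and generate a reply."""
    messages = [{"role": "user", "content": user_message}]
    prompt = pipe.tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    outputs = pipe(prompt, max_new_tokens=max_new_tokens, do_sample=True, temperature=0.7)
    return outputs[0]["generated_text"]


# Usage (downloads the model, so it is left commented out here):
# pipe = load_4bit_pipeline()
# print(generate(pipe, "Write a rap song on Vim vs VSCode."))
```

For CPU-only or llama.cpp-style setups, the GGUF version linked at the top of this card is likely the more convenient quantized option.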

Training

The model was trained using HF TRL. 📓 Training notebook

Framework versions

  • Transformers 4.39.1
  • PyTorch 2.2.0+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2