ramonactruta's picture
Update README.md
51e3074 verified
metadata
license: llama3.2
tags:
  - llama-3
  - orpo
  - transformers
datasets:
  - mlabonne/orpo-dpo-mix-40k
language:
  - en
base_model:
  - meta-llama/Llama-3.2-1B-Instruct
library_name: transformers
pipeline_tag: text-generation
model-index:
  - name: week2-llama3-1B
    results:
      - task:
          type: text-generation
        dataset:
          name: mlabonne/orpo-dpo-mix-40k
          type: mlabonne/orpo-dpo-mix-40k
        metrics:
          - name: acc-norm (0-Shot)
            type: acc-norm (0-Shot)
            value: 0.6077
metrics:
  - accuracy

Llama-3.2-1B-Instruct-ORPO

Evaluation Environmental Inpact

Model Details

This model was obtained by finetuning the open source Llama-3.2-1B-Instruct model on the mlabonne/orpo-dpo-mix-40k dataset, leveraging Odds Ratio Preference Optimization (ORPO) for Reinforcement Learning.

Uses

This model is optimized for general-purpose language tasks.

Evaluation

We used the Eulether test harness to evaluate the finetuned model. The table below presents a summary of the evaluation performed.

For a more granular evaluation on MMLU, please see Section MMLU.

Tasks Version Filter n-shot Metric Value Stderr
hellaswag 1 none 0 acc 0.4507 ± 0.0050
none 0 acc_norm 0.6077 ± 0.0049
arc_easy 1 none 0 acc 0.6856 ± 0.0095
none 0 acc_norm 0.6368 ± 0.0099
mmlu 2 none acc 0.4597 ± 0.0041
- humanities 2 none acc 0.4434 ± 0.0071
- other 2 none acc 0.5163 ± 0.0088
- social sciences 2 none acc 0.5057 ± 0.0088
- stem 2 none acc 0.3834 ± 0.0085

Top

MMLU

Tasks Version Filter n-shot Metric Value Stderr
mmlu 2 none acc 0.4597 ± 0.0041
- humanities 2 none acc 0.4434 ± 0.0071
- formal_logic 1 none 0 acc 0.3254 ± 0.0419
- high_school_european_history 1 none 0 acc 0.6182 ± 0.0379
- high_school_us_history 1 none 0 acc 0.5784 ± 0.0347
- high_school_world_history 1 none 0 acc 0.6540 ± 0.0310
- international_law 1 none 0 acc 0.6033 ± 0.0447
- jurisprudence 1 none 0 acc 0.5370 ± 0.0482
- logical_fallacies 1 none 0 acc 0.4479 ± 0.0391
- moral_disputes 1 none 0 acc 0.4711 ± 0.0269
- moral_scenarios 1 none 0 acc 0.3408 ± 0.0159
- philosophy 1 none 0 acc 0.5177 ± 0.0284
- prehistory 1 none 0 acc 0.5278 ± 0.0278
- professional_law 1 none 0 acc 0.3683 ± 0.0123
- world_religions 1 none 0 acc 0.5906 ± 0.0377
- other 2 none acc 0.5163 ± 0.0088
- business_ethics 1 none 0 acc 0.4300 ± 0.0498
- clinical_knowledge 1 none 0 acc 0.4642 ± 0.0307
- college_medicine 1 none 0 acc 0.3815 ± 0.0370
- global_facts 1 none 0 acc 0.3200 ± 0.0469
- human_aging 1 none 0 acc 0.5157 ± 0.0335
- management 1 none 0 acc 0.5243 ± 0.0494
- marketing 1 none 0 acc 0.6709 ± 0.0308
- medical_genetics 1 none 0 acc 0.4800 ± 0.0502
- miscellaneous 1 none 0 acc 0.6015 ± 0.0175
- nutrition 1 none 0 acc 0.5686 ± 0.0284
- professional_accounting 1 none 0 acc 0.3511 ± 0.0285
- professional_medicine 1 none 0 acc 0.5625 ± 0.0301
- virology 1 none 0 acc 0.4157 ± 0.0384
- social sciences 2 none acc 0.5057 ± 0.0088
- econometrics 1 none 0 acc 0.2456 ± 0.0405
- high_school_geography 1 none 0 acc 0.5606 ± 0.0354
- high_school_government_and_politics 1 none 0 acc 0.5389 ± 0.0360
- high_school_macroeconomics 1 none 0 acc 0.4128 ± 0.0250
- high_school_microeconomics 1 none 0 acc 0.4454 ± 0.0323
- high_school_psychology 1 none 0 acc 0.6183 ± 0.0208
- human_sexuality 1 none 0 acc 0.5420 ± 0.0437
- professional_psychology 1 none 0 acc 0.4167 ± 0.0199
- public_relations 1 none 0 acc 0.5000 ± 0.0479
- security_studies 1 none 0 acc 0.5265 ± 0.0320
- sociology 1 none 0 acc 0.6468 ± 0.0338
- us_foreign_policy 1 none 0 acc 0.6900 ± 0.0465
- stem 2 none acc 0.3834 ± 0.0085
- abstract_algebra 1 none 0 acc 0.2500 ± 0.0435
- anatomy 1 none 0 acc 0.4889 ± 0.0432
- astronomy 1 none 0 acc 0.5329 ± 0.0406
- college_biology 1 none 0 acc 0.4931 ± 0.0418
- college_chemistry 1 none 0 acc 0.3800 ± 0.0488
- college_computer_science 1 none 0 acc 0.3300 ± 0.0473
- college_mathematics 1 none 0 acc 0.2800 ± 0.0451
- college_physics 1 none 0 acc 0.2451 ± 0.0428
- computer_security 1 none 0 acc 0.4800 ± 0.0502
- conceptual_physics 1 none 0 acc 0.4383 ± 0.0324
- electrical_engineering 1 none 0 acc 0.5310 ± 0.0416
- elementary_mathematics 1 none 0 acc 0.2884 ± 0.0233
- high_school_biology 1 none 0 acc 0.4935 ± 0.0284
- high_school_chemistry 1 none 0 acc 0.3645 ± 0.0339
- high_school_computer_science 1 none 0 acc 0.4500 ± 0.0500
- high_school_mathematics 1 none 0 acc 0.2815 ± 0.0274
- high_school_physics 1 none 0 acc 0.3113 ± 0.0378
- high_school_statistics 1 none 0 acc 0.3657 ± 0.0328
- machine_learning 1 none 0 acc 0.2768 ± 0.0425

Top

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

  • Hardware Type: MacBook Air M1
  • Hours used: 1
  • Cloud Provider: GPC, A100
  • Compute Region: US-EAST1
  • Carbon Emitted: 0.09 kgCO2 of which 100 percents were directly offset by the cloud provider.

Top