metadata
license: llama3.2
tags:
- llama-3
- orpo
- transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: week2-llama3-1B
results:
- task:
type: text-generation
dataset:
name: mlabonne/orpo-dpo-mix-40k
type: mlabonne/orpo-dpo-mix-40k
metrics:
- name: acc-norm (0-Shot)
type: acc-norm (0-Shot)
value: 0.6077
metrics:
- accuracy
Llama-3.2-1B-Instruct-ORPO
Evaluation Environmental Inpact
Model Details
This model was obtained by finetuning the open source Llama-3.2-1B-Instruct model on the mlabonne/orpo-dpo-mix-40k dataset, leveraging Odds Ratio Preference Optimization (ORPO) for Reinforcement Learning.
Uses
This model is optimized for general-purpose language tasks.
Evaluation
We used the Eulether test harness to evaluate the finetuned model. The table below presents a summary of the evaluation performed.
For a more granular evaluation on MMLU
, please see Section MMLU.
Tasks | Version | Filter | n-shot | Metric | Value | Stderr | ||
---|---|---|---|---|---|---|---|---|
hellaswag | 1 | none | 0 | acc | ↑ | 0.4507 | ± | 0.0050 |
none | 0 | acc_norm | ↑ | 0.6077 | ± | 0.0049 | ||
arc_easy | 1 | none | 0 | acc | ↑ | 0.6856 | ± | 0.0095 |
none | 0 | acc_norm | ↑ | 0.6368 | ± | 0.0099 | ||
mmlu | 2 | none | acc | ↑ | 0.4597 | ± | 0.0041 | |
- humanities | 2 | none | acc | ↑ | 0.4434 | ± | 0.0071 | |
- other | 2 | none | acc | ↑ | 0.5163 | ± | 0.0088 | |
- social sciences | 2 | none | acc | ↑ | 0.5057 | ± | 0.0088 | |
- stem | 2 | none | acc | ↑ | 0.3834 | ± | 0.0085 |
MMLU
Tasks | Version | Filter | n-shot | Metric | Value | Stderr | ||
---|---|---|---|---|---|---|---|---|
mmlu | 2 | none | acc | ↑ | 0.4597 | ± | 0.0041 | |
- humanities | 2 | none | acc | ↑ | 0.4434 | ± | 0.0071 | |
- formal_logic | 1 | none | 0 | acc | ↑ | 0.3254 | ± | 0.0419 |
- high_school_european_history | 1 | none | 0 | acc | ↑ | 0.6182 | ± | 0.0379 |
- high_school_us_history | 1 | none | 0 | acc | ↑ | 0.5784 | ± | 0.0347 |
- high_school_world_history | 1 | none | 0 | acc | ↑ | 0.6540 | ± | 0.0310 |
- international_law | 1 | none | 0 | acc | ↑ | 0.6033 | ± | 0.0447 |
- jurisprudence | 1 | none | 0 | acc | ↑ | 0.5370 | ± | 0.0482 |
- logical_fallacies | 1 | none | 0 | acc | ↑ | 0.4479 | ± | 0.0391 |
- moral_disputes | 1 | none | 0 | acc | ↑ | 0.4711 | ± | 0.0269 |
- moral_scenarios | 1 | none | 0 | acc | ↑ | 0.3408 | ± | 0.0159 |
- philosophy | 1 | none | 0 | acc | ↑ | 0.5177 | ± | 0.0284 |
- prehistory | 1 | none | 0 | acc | ↑ | 0.5278 | ± | 0.0278 |
- professional_law | 1 | none | 0 | acc | ↑ | 0.3683 | ± | 0.0123 |
- world_religions | 1 | none | 0 | acc | ↑ | 0.5906 | ± | 0.0377 |
- other | 2 | none | acc | ↑ | 0.5163 | ± | 0.0088 | |
- business_ethics | 1 | none | 0 | acc | ↑ | 0.4300 | ± | 0.0498 |
- clinical_knowledge | 1 | none | 0 | acc | ↑ | 0.4642 | ± | 0.0307 |
- college_medicine | 1 | none | 0 | acc | ↑ | 0.3815 | ± | 0.0370 |
- global_facts | 1 | none | 0 | acc | ↑ | 0.3200 | ± | 0.0469 |
- human_aging | 1 | none | 0 | acc | ↑ | 0.5157 | ± | 0.0335 |
- management | 1 | none | 0 | acc | ↑ | 0.5243 | ± | 0.0494 |
- marketing | 1 | none | 0 | acc | ↑ | 0.6709 | ± | 0.0308 |
- medical_genetics | 1 | none | 0 | acc | ↑ | 0.4800 | ± | 0.0502 |
- miscellaneous | 1 | none | 0 | acc | ↑ | 0.6015 | ± | 0.0175 |
- nutrition | 1 | none | 0 | acc | ↑ | 0.5686 | ± | 0.0284 |
- professional_accounting | 1 | none | 0 | acc | ↑ | 0.3511 | ± | 0.0285 |
- professional_medicine | 1 | none | 0 | acc | ↑ | 0.5625 | ± | 0.0301 |
- virology | 1 | none | 0 | acc | ↑ | 0.4157 | ± | 0.0384 |
- social sciences | 2 | none | acc | ↑ | 0.5057 | ± | 0.0088 | |
- econometrics | 1 | none | 0 | acc | ↑ | 0.2456 | ± | 0.0405 |
- high_school_geography | 1 | none | 0 | acc | ↑ | 0.5606 | ± | 0.0354 |
- high_school_government_and_politics | 1 | none | 0 | acc | ↑ | 0.5389 | ± | 0.0360 |
- high_school_macroeconomics | 1 | none | 0 | acc | ↑ | 0.4128 | ± | 0.0250 |
- high_school_microeconomics | 1 | none | 0 | acc | ↑ | 0.4454 | ± | 0.0323 |
- high_school_psychology | 1 | none | 0 | acc | ↑ | 0.6183 | ± | 0.0208 |
- human_sexuality | 1 | none | 0 | acc | ↑ | 0.5420 | ± | 0.0437 |
- professional_psychology | 1 | none | 0 | acc | ↑ | 0.4167 | ± | 0.0199 |
- public_relations | 1 | none | 0 | acc | ↑ | 0.5000 | ± | 0.0479 |
- security_studies | 1 | none | 0 | acc | ↑ | 0.5265 | ± | 0.0320 |
- sociology | 1 | none | 0 | acc | ↑ | 0.6468 | ± | 0.0338 |
- us_foreign_policy | 1 | none | 0 | acc | ↑ | 0.6900 | ± | 0.0465 |
- stem | 2 | none | acc | ↑ | 0.3834 | ± | 0.0085 | |
- abstract_algebra | 1 | none | 0 | acc | ↑ | 0.2500 | ± | 0.0435 |
- anatomy | 1 | none | 0 | acc | ↑ | 0.4889 | ± | 0.0432 |
- astronomy | 1 | none | 0 | acc | ↑ | 0.5329 | ± | 0.0406 |
- college_biology | 1 | none | 0 | acc | ↑ | 0.4931 | ± | 0.0418 |
- college_chemistry | 1 | none | 0 | acc | ↑ | 0.3800 | ± | 0.0488 |
- college_computer_science | 1 | none | 0 | acc | ↑ | 0.3300 | ± | 0.0473 |
- college_mathematics | 1 | none | 0 | acc | ↑ | 0.2800 | ± | 0.0451 |
- college_physics | 1 | none | 0 | acc | ↑ | 0.2451 | ± | 0.0428 |
- computer_security | 1 | none | 0 | acc | ↑ | 0.4800 | ± | 0.0502 |
- conceptual_physics | 1 | none | 0 | acc | ↑ | 0.4383 | ± | 0.0324 |
- electrical_engineering | 1 | none | 0 | acc | ↑ | 0.5310 | ± | 0.0416 |
- elementary_mathematics | 1 | none | 0 | acc | ↑ | 0.2884 | ± | 0.0233 |
- high_school_biology | 1 | none | 0 | acc | ↑ | 0.4935 | ± | 0.0284 |
- high_school_chemistry | 1 | none | 0 | acc | ↑ | 0.3645 | ± | 0.0339 |
- high_school_computer_science | 1 | none | 0 | acc | ↑ | 0.4500 | ± | 0.0500 |
- high_school_mathematics | 1 | none | 0 | acc | ↑ | 0.2815 | ± | 0.0274 |
- high_school_physics | 1 | none | 0 | acc | ↑ | 0.3113 | ± | 0.0378 |
- high_school_statistics | 1 | none | 0 | acc | ↑ | 0.3657 | ± | 0.0328 |
- machine_learning | 1 | none | 0 | acc | ↑ | 0.2768 | ± | 0.0425 |
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: MacBook Air M1
- Hours used: 1
- Cloud Provider: GPC, A100
- Compute Region: US-EAST1
- Carbon Emitted: 0.09 kgCO2 of which 100 percents were directly offset by the cloud provider.