metadata

license: llama3.2
tags:
  - llama-3
  - orpo
  - transformers
datasets:
  - mlabonne/orpo-dpo-mix-40k
language:
  - en
base_model:
  - meta-llama/Llama-3.2-1B-Instruct
library_name: transformers
pipeline_tag: text-generation
model-index:
  - name: week2-llama3-1B
    results:
      - task:
          type: text-generation
        dataset:
          name: mlabonne/orpo-dpo-mix-40k
          type: mlabonne/orpo-dpo-mix-40k
        metrics:
          - name: acc-norm (0-Shot)
            type: acc-norm (0-Shot)
            value: 0.6077
metrics:
  - accuracy

Llama-3.2-1B-Instruct-ORPO

Evaluation Environmental Inpact

Model Details

This model was obtained by finetuning the open source Llama-3.2-1B-Instruct model on the mlabonne/orpo-dpo-mix-40k dataset, leveraging Odds Ratio Preference Optimization (ORPO) for Reinforcement Learning.

Uses

This model is optimized for general-purpose language tasks.

Evaluation

We used the Eulether test harness to evaluate the finetuned model. The table below presents a summary of the evaluation performed.

For a more granular evaluation on MMLU, please see Section MMLU.

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
hellaswag	1	none	0	acc	↑	0.4507	±	0.0050
		none	0	acc_norm	↑	0.6077	±	0.0049
arc_easy	1	none	0	acc	↑	0.6856	±	0.0095
		none	0	acc_norm	↑	0.6368	±	0.0099
mmlu	2	none		acc	↑	0.4597	±	0.0041
- humanities	2	none		acc	↑	0.4434	±	0.0071
- other	2	none		acc	↑	0.5163	±	0.0088
- social sciences	2	none		acc	↑	0.5057	±	0.0088
- stem	2	none		acc	↑	0.3834	±	0.0085

Top

MMLU

Tasks	Version	Filter	n-shot	Metric		Value		Stderr
mmlu	2	none		acc	↑	0.4597	±	0.0041
- humanities	2	none		acc	↑	0.4434	±	0.0071
- formal_logic	1	none	0	acc	↑	0.3254	±	0.0419
- high_school_european_history	1	none	0	acc	↑	0.6182	±	0.0379
- high_school_us_history	1	none	0	acc	↑	0.5784	±	0.0347
- high_school_world_history	1	none	0	acc	↑	0.6540	±	0.0310
- international_law	1	none	0	acc	↑	0.6033	±	0.0447
- jurisprudence	1	none	0	acc	↑	0.5370	±	0.0482
- logical_fallacies	1	none	0	acc	↑	0.4479	±	0.0391
- moral_disputes	1	none	0	acc	↑	0.4711	±	0.0269
- moral_scenarios	1	none	0	acc	↑	0.3408	±	0.0159
- philosophy	1	none	0	acc	↑	0.5177	±	0.0284
- prehistory	1	none	0	acc	↑	0.5278	±	0.0278
- professional_law	1	none	0	acc	↑	0.3683	±	0.0123
- world_religions	1	none	0	acc	↑	0.5906	±	0.0377
- other	2	none		acc	↑	0.5163	±	0.0088
- business_ethics	1	none	0	acc	↑	0.4300	±	0.0498
- clinical_knowledge	1	none	0	acc	↑	0.4642	±	0.0307
- college_medicine	1	none	0	acc	↑	0.3815	±	0.0370
- global_facts	1	none	0	acc	↑	0.3200	±	0.0469
- human_aging	1	none	0	acc	↑	0.5157	±	0.0335
- management	1	none	0	acc	↑	0.5243	±	0.0494
- marketing	1	none	0	acc	↑	0.6709	±	0.0308
- medical_genetics	1	none	0	acc	↑	0.4800	±	0.0502
- miscellaneous	1	none	0	acc	↑	0.6015	±	0.0175
- nutrition	1	none	0	acc	↑	0.5686	±	0.0284
- professional_accounting	1	none	0	acc	↑	0.3511	±	0.0285
- professional_medicine	1	none	0	acc	↑	0.5625	±	0.0301
- virology	1	none	0	acc	↑	0.4157	±	0.0384
- social sciences	2	none		acc	↑	0.5057	±	0.0088
- econometrics	1	none	0	acc	↑	0.2456	±	0.0405
- high_school_geography	1	none	0	acc	↑	0.5606	±	0.0354
- high_school_government_and_politics	1	none	0	acc	↑	0.5389	±	0.0360
- high_school_macroeconomics	1	none	0	acc	↑	0.4128	±	0.0250
- high_school_microeconomics	1	none	0	acc	↑	0.4454	±	0.0323
- high_school_psychology	1	none	0	acc	↑	0.6183	±	0.0208
- human_sexuality	1	none	0	acc	↑	0.5420	±	0.0437
- professional_psychology	1	none	0	acc	↑	0.4167	±	0.0199
- public_relations	1	none	0	acc	↑	0.5000	±	0.0479
- security_studies	1	none	0	acc	↑	0.5265	±	0.0320
- sociology	1	none	0	acc	↑	0.6468	±	0.0338
- us_foreign_policy	1	none	0	acc	↑	0.6900	±	0.0465
- stem	2	none		acc	↑	0.3834	±	0.0085
- abstract_algebra	1	none	0	acc	↑	0.2500	±	0.0435
- anatomy	1	none	0	acc	↑	0.4889	±	0.0432
- astronomy	1	none	0	acc	↑	0.5329	±	0.0406
- college_biology	1	none	0	acc	↑	0.4931	±	0.0418
- college_chemistry	1	none	0	acc	↑	0.3800	±	0.0488
- college_computer_science	1	none	0	acc	↑	0.3300	±	0.0473
- college_mathematics	1	none	0	acc	↑	0.2800	±	0.0451
- college_physics	1	none	0	acc	↑	0.2451	±	0.0428
- computer_security	1	none	0	acc	↑	0.4800	±	0.0502
- conceptual_physics	1	none	0	acc	↑	0.4383	±	0.0324
- electrical_engineering	1	none	0	acc	↑	0.5310	±	0.0416
- elementary_mathematics	1	none	0	acc	↑	0.2884	±	0.0233
- high_school_biology	1	none	0	acc	↑	0.4935	±	0.0284
- high_school_chemistry	1	none	0	acc	↑	0.3645	±	0.0339
- high_school_computer_science	1	none	0	acc	↑	0.4500	±	0.0500
- high_school_mathematics	1	none	0	acc	↑	0.2815	±	0.0274
- high_school_physics	1	none	0	acc	↑	0.3113	±	0.0378
- high_school_statistics	1	none	0	acc	↑	0.3657	±	0.0328
- machine_learning	1	none	0	acc	↑	0.2768	±	0.0425

Top

Environmental Impact

Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).

Hardware Type: MacBook Air M1
Hours used: 1
Cloud Provider: GPC, A100
Compute Region: US-EAST1
Carbon Emitted: 0.09 kgCO₂ of which 100 percents were directly offset by the cloud provider.

Top