---
license: llama3.2
tags:
- llama-3
- orpo
- transformers
datasets:
- mlabonne/orpo-dpo-mix-40k
language:
- en
base_model:
- meta-llama/Llama-3.2-1B-Instruct
library_name: transformers
pipeline_tag: text-generation
model-index:
- name: week2-llama3-1B
  results:
  - task:
      type: text-generation
    dataset:
      name: mlabonne/orpo-dpo-mix-40k
      type: mlabonne/orpo-dpo-mix-40k
    metrics:
    - name: acc_norm (0-shot)
      type: acc_norm
      value: 0.6077
---

# Llama-3.2-1B-Instruct-ORPO

[Evaluation](#evaluation) | [Environmental Impact](#environmental-impact)

## Model Details

This model was obtained by fine-tuning the open-source [Llama-3.2-1B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-1B-Instruct) model on the [mlabonne/orpo-dpo-mix-40k](https://huggingface.co/datasets/mlabonne/orpo-dpo-mix-40k) dataset, using [Odds Ratio Preference Optimization (ORPO)](https://github.com/xfactlab/orpo) for preference alignment.

## Uses

This model is optimized for general-purpose language tasks.

## Evaluation

We used the [EleutherAI lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) to evaluate the fine-tuned model. The table below summarizes the evaluation. For a more granular breakdown of `MMLU`, see the [MMLU](#mmlu) section.
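For intuition on the training objective mentioned under Model Details: ORPO avoids a separate RLHF stage by adding an odds-ratio preference penalty, weighted by λ, on top of the usual SFT loss on the chosen response. A minimal numeric sketch of that penalty follows; the scalar log-probabilities and the λ value are purely illustrative, not the actual training values.

```python
import math

def orpo_odds_ratio_loss(logp_chosen: float, logp_rejected: float) -> float:
    """ORPO's preference penalty: -log sigmoid(log odds(chosen) - log odds(rejected)).

    The arguments stand in for (average per-token) log-probabilities of the
    chosen and rejected responses under the policy. Illustrative scalars only.
    """
    def log_odds(logp: float) -> float:
        p = math.exp(logp)
        return math.log(p) - math.log(1.0 - p)

    log_or = log_odds(logp_chosen) - log_odds(logp_rejected)
    # -log sigmoid(x) == log(1 + exp(-x)), computed stably with log1p
    return math.log1p(math.exp(-log_or))

# The full ORPO objective adds this term, scaled by lambda, to the
# negative log-likelihood (SFT) loss on the chosen response:
lam = 0.1          # illustrative weight, not the value used for this model
nll_chosen = 1.2   # illustrative SFT loss on the chosen response
total = nll_chosen + lam * orpo_odds_ratio_loss(math.log(0.4), math.log(0.1))
```

The penalty shrinks toward zero as the policy assigns higher odds to the chosen response than to the rejected one, which is what pushes the model toward the preferred completions in the mix-40k pairs.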
| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|hellaswag| 1|none | 0|acc |↑ |0.4507|± |0.0050|
| | |none | 0|acc_norm|↑ |0.6077|± |0.0049|
|arc_easy | 1|none | 0|acc |↑ |0.6856|± |0.0095|
| | |none | 0|acc_norm|↑ |0.6368|± |0.0099|
|mmlu | 2|none | |acc |↑ |0.4597|± |0.0041|
| - humanities | 2|none | |acc |↑ |0.4434|± |0.0071|
| - other | 2|none | |acc |↑ |0.5163|± |0.0088|
| - social sciences| 2|none | |acc |↑ |0.5057|± |0.0088|
| - stem | 2|none | |acc |↑ |0.3834|± |0.0085|

[Top](#top)

### MMLU

| Tasks |Version|Filter|n-shot| Metric | |Value | |Stderr|
|---------|------:|------|-----:|--------|---|-----:|---|-----:|
|mmlu | 2|none | |acc |↑ |0.4597|± |0.0041|
| - humanities | 2|none | |acc |↑ |0.4434|± |0.0071|
| - formal_logic | 1|none | 0|acc |↑ |0.3254|± |0.0419|
| - high_school_european_history | 1|none | 0|acc |↑ |0.6182|± |0.0379|
| - high_school_us_history | 1|none | 0|acc |↑ |0.5784|± |0.0347|
| - high_school_world_history | 1|none | 0|acc |↑ |0.6540|± |0.0310|
| - international_law | 1|none | 0|acc |↑ |0.6033|± |0.0447|
| - jurisprudence | 1|none | 0|acc |↑ |0.5370|± |0.0482|
| - logical_fallacies | 1|none | 0|acc |↑ |0.4479|± |0.0391|
| - moral_disputes | 1|none | 0|acc |↑ |0.4711|± |0.0269|
| - moral_scenarios | 1|none | 0|acc |↑ |0.3408|± |0.0159|
| - philosophy | 1|none | 0|acc |↑ |0.5177|± |0.0284|
| - prehistory | 1|none | 0|acc |↑ |0.5278|± |0.0278|
| - professional_law | 1|none | 0|acc |↑ |0.3683|± |0.0123|
| - world_religions | 1|none | 0|acc |↑ |0.5906|± |0.0377|
| - other | 2|none | |acc |↑ |0.5163|± |0.0088|
| - business_ethics | 1|none | 0|acc |↑ |0.4300|± |0.0498|
| - clinical_knowledge | 1|none | 0|acc |↑ |0.4642|± |0.0307|
| - college_medicine | 1|none | 0|acc |↑ |0.3815|± |0.0370|
| - global_facts | 1|none | 0|acc |↑ |0.3200|± |0.0469|
| - human_aging | 1|none | 0|acc |↑ |0.5157|± |0.0335|
| - management | 1|none | 0|acc |↑ |0.5243|± |0.0494|
| - marketing | 1|none | 0|acc |↑ |0.6709|± |0.0308|
| - medical_genetics | 1|none | 0|acc |↑ |0.4800|± |0.0502|
| - miscellaneous | 1|none | 0|acc |↑ |0.6015|± |0.0175|
| - nutrition | 1|none | 0|acc |↑ |0.5686|± |0.0284|
| - professional_accounting | 1|none | 0|acc |↑ |0.3511|± |0.0285|
| - professional_medicine | 1|none | 0|acc |↑ |0.5625|± |0.0301|
| - virology | 1|none | 0|acc |↑ |0.4157|± |0.0384|
| - social sciences | 2|none | |acc |↑ |0.5057|± |0.0088|
| - econometrics | 1|none | 0|acc |↑ |0.2456|± |0.0405|
| - high_school_geography | 1|none | 0|acc |↑ |0.5606|± |0.0354|
| - high_school_government_and_politics| 1|none | 0|acc |↑ |0.5389|± |0.0360|
| - high_school_macroeconomics | 1|none | 0|acc |↑ |0.4128|± |0.0250|
| - high_school_microeconomics | 1|none | 0|acc |↑ |0.4454|± |0.0323|
| - high_school_psychology | 1|none | 0|acc |↑ |0.6183|± |0.0208|
| - human_sexuality | 1|none | 0|acc |↑ |0.5420|± |0.0437|
| - professional_psychology | 1|none | 0|acc |↑ |0.4167|± |0.0199|
| - public_relations | 1|none | 0|acc |↑ |0.5000|± |0.0479|
| - security_studies | 1|none | 0|acc |↑ |0.5265|± |0.0320|
| - sociology | 1|none | 0|acc |↑ |0.6468|± |0.0338|
| - us_foreign_policy | 1|none | 0|acc |↑ |**0.6900**|± |0.0465|
| - stem | 2|none | |acc |↑ |0.3834|± |0.0085|
| - abstract_algebra | 1|none | 0|acc |↑ |0.2500|± |0.0435|
| - anatomy | 1|none | 0|acc |↑ |0.4889|± |0.0432|
| - astronomy | 1|none | 0|acc |↑ |0.5329|± |0.0406|
| - college_biology | 1|none | 0|acc |↑ |0.4931|± |0.0418|
| - college_chemistry | 1|none | 0|acc |↑ |0.3800|± |0.0488|
| - college_computer_science | 1|none | 0|acc |↑ |0.3300|± |0.0473|
| - college_mathematics | 1|none | 0|acc |↑ |0.2800|± |0.0451|
| - college_physics | 1|none | 0|acc |↑ |0.2451|± |0.0428|
| - computer_security | 1|none | 0|acc |↑ |0.4800|± |0.0502|
| - conceptual_physics | 1|none | 0|acc |↑ |0.4383|± |0.0324|
| - electrical_engineering | 1|none | 0|acc |↑ |0.5310|± |0.0416|
| - elementary_mathematics | 1|none | 0|acc |↑ |0.2884|± |0.0233|
| - high_school_biology | 1|none | 0|acc |↑ |0.4935|± |0.0284|
| - high_school_chemistry | 1|none | 0|acc |↑ |0.3645|± |0.0339|
| - high_school_computer_science | 1|none | 0|acc |↑ |0.4500|± |0.0500|
| - high_school_mathematics | 1|none | 0|acc |↑ |0.2815|± |0.0274|
| - high_school_physics | 1|none | 0|acc |↑ |0.3113|± |0.0378|
| - high_school_statistics | 1|none | 0|acc |↑ |0.3657|± |0.0328|
| - machine_learning | 1|none | 0|acc |↑ |0.2768|± |0.0425|

[Top](#top)

## Environmental Impact

Carbon emissions can be estimated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact#compute) presented in [Lacoste et al. (2019)](https://arxiv.org/abs/1910.09700).

- **Hardware Type:** MacBook Air M1
- **Hours used:** 1
- **Cloud Provider:** GCP (A100)
- **Compute Region:** us-east1
- **Carbon Emitted:** 0.09 kg CO2eq, 100% of which was directly offset by the cloud provider.

[Top](#top)
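As a rough sanity check, the reported figure matches the Impact calculator's simple formula of power draw × time × regional grid carbon intensity. The sketch below uses illustrative assumed values for GPU power and grid intensity (neither is stated in this card), not measured ones.

```python
# Rough reconstruction of the 0.09 kg CO2eq estimate via the ML Impact
# calculator's formula: power (kW) x time (h) x grid intensity (kgCO2eq/kWh).
gpu_power_kw = 0.40      # ASSUMED board power for a single A100 (~400 W)
hours = 1.0              # "Hours used" from this card
grid_intensity = 0.22    # ASSUMED kgCO2eq/kWh for the compute region

emissions_kg = gpu_power_kw * hours * grid_intensity  # ~0.088 kg CO2eq
```

With these assumptions the estimate lands near the 0.09 kg CO2eq reported above; real values depend on actual utilization, PUE, and the region's true grid mix.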