{ "cells": [ { "cell_type": "markdown", "metadata": { "id": "Ac6wadk3rmkK" }, "source": [ "# LM Evaluation Harness (by [EleutherAI](https://www.eleuther.ai/) & [Laiviet](https://github.com/laiviet/lm-evaluation-harness))\n", "\n", "The [`LM-Evaluation-Harness`](https://github.com/EleutherAI/lm-evaluation-harness) provides a unified framework for testing generative language models on a large number of evaluation tasks. For the complete list of upstream tasks, see the [task table](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md); the Portuguese tasks are listed in the Task Table at the bottom of this notebook.\n", "\n", "1. Clone the [Laiviet fork of lm-evaluation-harness](https://github.com/laiviet/lm-evaluation-harness) and install the required libraries (`sentencepiece` is required for the Llama tokenizer)." ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "UA5I86u91e0A" }, "outputs": [], "source": [ "# Clone the Laiviet fork and install it in editable mode.\n", "!git clone --branch main https://github.com/laiviet/lm-evaluation-harness.git\n", "!cd lm-evaluation-harness && pip install -e . -q\n", "# Extra dependencies; sentencepiece is needed for the Llama tokenizer.\n", "!pip install cohere tiktoken sentencepiece -q" ] }, { "cell_type": "code", "execution_count": null, "metadata": { "id": "pnHoAVK25QZn" }, "outputs": [], "source": [ "# Log in to the Hugging Face Hub. The token is read from the HF_TOKEN environment\n", "# variable (assumed to be set beforehand); do not hard-code tokens in a shared notebook.\n", "!huggingface-cli login --token $HF_TOKEN\n", "# Evaluate the model on the Portuguese ARC and TruthfulQA tasks.\n", "!cd lm-evaluation-harness && python main.py \\\n", " --model hf-auto \\\n", " --model_args pretrained=nicholasKluge/Aira-2-portuguese-124M \\\n", " --tasks arc_pt,truthfulqa_pt \\\n", " --device cuda:0 \\\n", " --model_alias Aira-2-portuguese-124M \\\n", " --task_alias open_llm" ] }, { "cell_type": "markdown", "metadata": { "id": "4Bm78wiZ4Own" }, "source": [ "## Task Table 📚\n", "\n", "| Task Name     | Train | Val | Test | Val/Test Docs | Metrics       |\n", "|---------------|-------|-----|------|--------------:|---------------|\n", "| arc_pt        | ✓     | ✓   | ✓    |          1172 | acc, acc_norm |\n", "| hellaswag_pt  | ✓     | ✓   |      |         10042 | acc, acc_norm |\n", "| mmlu_pt       |       | ✓   | ✓    |          1662 | acc, acc_norm |\n", "| truthfulqa_pt |       | ✓   |      |           817 | mc1, mc2      |" ] } ], "metadata": { "accelerator": "GPU", "colab": { "provenance": [], "machine_shape": "hm", "gpuType": "T4" }, "kernelspec": { "display_name": "Python 3", "name": "python3" }, "language_info": { "name": "python" } }, "nbformat": 4, "nbformat_minor": 0 }