{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Ac6wadk3rmkK"
      },
      "source": [
        "# LM Evaluation Harness (by [EleutherAI](https://www.eleuther.ai/) & [Laiviet](https://github.com/laiviet/lm-evaluation-harness))\n",
        "\n",
        "The [`LM-Evaluation-Harness`](https://github.com/EleutherAI/lm-evaluation-harness) provides a unified framework for testing generative language models on a large number of evaluation tasks. For a complete list of available tasks, see the [task table](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md), or scroll to the Task Table at the bottom of this notebook.\n",
        "\n",
        "1. Clone the [Laiviet fork of lm-evaluation-harness](https://github.com/laiviet/lm-evaluation-harness) and install the necessary libraries (`sentencepiece` is required for the Llama tokenizer)."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "UA5I86u91e0A"
      },
      "outputs": [],
      "source": [
        "!git clone --branch main https://github.com/laiviet/lm-evaluation-harness.git\n",
        "!cd lm-evaluation-harness && pip install -e . -q\n",
        "!pip install cohere tiktoken sentencepiece -q"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {
        "id": "pnHoAVK25QZn"
      },
      "outputs": [],
      "source": [
        "# Assumes HF_TOKEN is set beforehand (e.g. as an environment variable); never hard-code an access token in a notebook.\n",
        "!huggingface-cli login --token $HF_TOKEN\n",
        "!cd lm-evaluation-harness && python main.py \\\n",
        "    --model hf-auto \\\n",
        "    --model_args pretrained=nicholasKluge/Aira-2-portuguese-124M \\\n",
        "    --tasks arc_pt,truthfulqa_pt \\\n",
        "    --device cuda:0 \\\n",
        "    --model_alias Aira-2-portuguese-124M \\\n",
        "    --task_alias open_llm"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "4Bm78wiZ4Own"
      },
      "source": [
        "## Task Table 📚\n",
        "\n",
        "| Task Name      | Train | Val | Test | Val/Test Docs | Metrics       |\n",
        "|----------------|-------|-----|------|--------------:|---------------|\n",
        "| arc_pt         | ✓     | ✓   | ✓    | 1172          | acc, acc_norm |\n",
        "| hellaswag_pt   | ✓     | ✓   |      | 10042         | acc, acc_norm |\n",
        "| mmlu_pt        |       | ✓   | ✓    | 1662          | acc, acc_norm |\n",
        "| truthfulqa_pt  |       | ✓   |      | 817           | mc1, mc2      |"
      ]
    }
  ],
  "metadata": {
    "accelerator": "GPU",
    "colab": {
      "provenance": [],
      "machine_shape": "hm",
      "gpuType": "T4"
    },
    "kernelspec": {
      "display_name": "Python 3",
      "name": "python3"
    },
    "language_info": {
      "name": "python"
    }
  },
  "nbformat": 4,
  "nbformat_minor": 0
}