nicholasKluge committed
Commit
239b81d
1 Parent(s): 922df51

Upload lm_evaluation_harness_pt.ipynb

Files changed (1)
  1. lm_evaluation_harness_pt.ipynb +81 -0
lm_evaluation_harness_pt.ipynb ADDED
@@ -0,0 +1,81 @@
+ {
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Ac6wadk3rmkK"
+ },
+ "source": [
+ "# LM Evaluation Harness (by [EleutherAI](https://www.eleuther.ai/) & [Laiviet](https://github.com/laiviet/lm-evaluation-harness))\n",
+ "\n",
+ "This [`LM-Evaluation-Harness`](https://github.com/EleutherAI/lm-evaluation-harness) provides a unified framework to test generative language models on a large number of different evaluation tasks. For a complete list of available tasks, see the [task table](https://github.com/EleutherAI/lm-evaluation-harness/blob/master/docs/task_table.md), or scroll to the bottom of the page.\n",
+ "\n",
+ "1. Clone the [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) and install the necessary libraries (`sentencepiece` is required for the Llama tokenizer)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "UA5I86u91e0A"
+ },
+ "outputs": [],
+ "source": [
+ "!git clone --branch main https://github.com/laiviet/lm-evaluation-harness.git\n",
+ "!cd lm-evaluation-harness && pip install -e . -q\n",
+ "!pip install cohere tiktoken sentencepiece -q"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "pnHoAVK25QZn"
+ },
+ "outputs": [],
+ "source": [
+ "!huggingface-cli login --token $HF_TOKEN  # set HF_TOKEN to your own Hugging Face access token\n",
+ "!cd lm-evaluation-harness && python main.py \\\n",
+ " --model hf-auto \\\n",
+ " --model_args pretrained=nicholasKluge/Aira-2-portuguese-124M \\\n",
+ " --tasks arc_pt,truthfulqa_pt \\\n",
+ " --device cuda:0 \\\n",
+ " --model_alias Aira-2-portuguese-124M \\\n",
+ " --task_alias open_llm"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "4Bm78wiZ4Own"
+ },
+ "source": [
+ "## Task Table 📚\n",
+ "\n",
+ "| Task Name | Train | Val | Test | Val/Test Docs | Metrics |\n",
+ "|---------------|-------|-----|------|--------------:|---------------|\n",
+ "| arc_pt | ✓ | ✓ | ✓ | 1172 | acc, acc_norm |\n",
+ "| hellaswag_pt | ✓ | ✓ | | 10042 | acc, acc_norm |\n",
+ "| mmlu_pt | | ✓ | ✓ | 1662 | acc, acc_norm |\n",
+ "| truthfulqa_pt | | ✓ | | 817 | mc1, mc2 |"
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "provenance": [],
+ "machine_shape": "hm",
+ "gpuType": "T4"
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "name": "python3"
+ },
+ "language_info": {
+ "name": "python"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 0
+ }
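
The `main.py` run in the notebook writes its scores to a JSON results file. As a minimal sketch of how one might tabulate such a file — assuming the harness's standard `{"results": {task: {metric: value}}}` output layout, with the file name `results.json` purely illustrative:

```python
import json

def summarize_results(path):
    """Collect (task, metric, value) rows from a harness results JSON file.

    Assumes the standard lm-evaluation-harness layout:
    {"results": {"<task>": {"<metric>": <value>, ...}, ...}}
    """
    with open(path) as f:
        data = json.load(f)
    rows = []
    for task, metrics in sorted(data["results"].items()):
        for metric, value in sorted(metrics.items()):
            rows.append((task, metric, value))
    return rows

# Example with a hand-made results dict written to disk (hypothetical scores):
sample = {"results": {"arc_pt": {"acc": 0.25, "acc_norm": 0.28},
                      "truthfulqa_pt": {"mc1": 0.24, "mc2": 0.40}}}
with open("results.json", "w") as f:
    json.dump(sample, f)

for task, metric, value in summarize_results("results.json"):
    print(f"{task:15s} {metric:10s} {value:.3f}")
```

This keeps the per-task metrics in a flat list so they can be dropped into a DataFrame or a markdown table like the one in the notebook.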