TildeSIA committed 5eb4e00 · verified · 1 parent: d91ef50

add ARC description to README

Files changed (1):
1. README.md (+7 −1)
README.md CHANGED
@@ -179,8 +179,14 @@ We used the standard implementation of the [MultiBLiMP](https://github.com/Eleut
  ## Knowledge tests
 
  ### ARC Benchmark Results
- 
- | 5-shot | | **ARC Easy** | | | **ARC Hard** | |
+ **What is ARC?** [ARC](https://arxiv.org/pdf/1803.05457), the AI2 Reasoning Challenge, is a multiple-choice science question benchmark **in English**, derived from U.S. grade-school standardized exams. It has two subsets, ARC Easy and ARC Challenge, designed to test factual knowledge and common-sense reasoning.
+ 
+ **Why does this matter?** ARC probes a model's ability to answer non-trivial questions by applying world knowledge. In the classic lm-evaluation-harness ARC implementation, the answer choices for each question are **not** provided in the prompt during inference, which places the emphasis on world knowledge rather than on the model's reasoning capabilities.
+ 
+ **What did we do?**
+ We use the multilingual translations of ARC provided by [Eurolingua](https://huggingface.co/datasets/Eurolingua/arcx); please refer to the [publication](https://arxiv.org/pdf/2410.08928). Other than the data source, we replicate the standard [LM Evaluation Harness configuration for ARC](https://github.com/EleutherAI/lm-evaluation-harness/tree/main/lm_eval/tasks/arc). Our exact configuration is available at [TBA]. We set tokenisers to `use_fast=False`. We report **5-shot** accuracy.
+ 
+ | 5-shot | | **ARC Easy** | | | **ARC Challenge** | |
  |----------|:----------:|:----------:|:----------:|:----------:|:----------:|:----------:|
  | **Language** | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** | **ALIA 40b** | **EuroLLM Prev. 22b** | **TildeOpen 1.1 30b** |
  | Danish | 79.9% | **80.1%** | 79.6% | 53.4% | 52.6% | **53.7%** |
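
For readers unfamiliar with the protocol referenced in the added **Why does this matter?** paragraph, the sketch below shows how an ARC-style item can be scored without the answer choices ever appearing in the prompt: each choice is ranked by the log-likelihood the model assigns to it as a continuation of the question. This is a minimal illustration under assumptions, not the repository's evaluation code; the model identifier and the toy question are placeholders, and lm-evaluation-harness handles tokenisation boundaries, few-shot context, and score normalisation more carefully than this.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "your-org/your-model"  # placeholder; substitute the checkpoint being evaluated

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)  # slow tokeniser, as in the README
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)
model.eval()


def continuation_loglikelihood(context: str, continuation: str) -> float:
    """Sum of log-probabilities the model assigns to `continuation`, conditioned on `context`."""
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids
    full_ids = tokenizer(context + continuation, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits                    # [1, seq_len, vocab]
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)  # distribution over each *next* token
    n_ctx = ctx_ids.shape[1]
    cont_ids = full_ids[0, n_ctx:]                         # tokens that belong to the answer choice
    picked = log_probs[n_ctx - 1:].gather(1, cont_ids.unsqueeze(1))
    return picked.sum().item()


# Toy ARC-style item: note that the choices never appear in the prompt itself.
question = "Which gas do plants take in from the air for photosynthesis?"
choices = ["carbon dioxide", "oxygen", "nitrogen", "helium"]
prompt = f"Question: {question}\nAnswer:"
scores = [continuation_loglikelihood(prompt, " " + choice) for choice in choices]
print("predicted answer:", choices[scores.index(max(scores))])
```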
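
The **What did we do?** paragraph states the configuration only at a high level (standard harness ARC task config, slow tokenisers, 5-shot accuracy). Below is a minimal sketch of how such a run could look with the lm-evaluation-harness Python API (assuming a v0.4-style `simple_evaluate` and `HFLM`). It is not the exact configuration referenced as [TBA]: the model identifier is a placeholder, and the task names are the built-in English ARC tasks rather than custom task configs pointing at the Eurolingua/arcx translations.

```python
import lm_eval
from lm_eval.models.huggingface import HFLM
from transformers import AutoTokenizer

MODEL_ID = "your-org/your-model"        # placeholder; substitute the model under test
TASKS = ["arc_easy", "arc_challenge"]   # built-in English tasks; the arcx translations need their own task configs

# Build a slow tokeniser explicitly, mirroring the README's `use_fast=False` setting.
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, use_fast=False)

# Wrap the model for the harness; passing a tokenizer object assumes a v0.4-style HFLM.
lm = HFLM(pretrained=MODEL_ID, tokenizer=tokenizer, batch_size=8)

results = lm_eval.simple_evaluate(
    model=lm,
    tasks=TASKS,
    num_fewshot=5,  # the README reports 5-shot accuracy
)
print(results["results"])
```

An equivalent run can usually be launched from the `lm_eval` command line as well; the Python form is shown here only to make the few-shot count and the tokeniser choice explicit.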