Update README.md
README.md
@@ -46,6 +46,8 @@ The Tamil LLaMA models have been enhanced and tailored specifically with an exte
 
 Benchmarking was done using [LLM-Autoeval](https://github.com/mlabonne/llm-autoeval) on an RTX 3090 on [runpod](https://www.runpod.io/).
 
+> **Note:** Discrepancies have been observed between the Open LLM Leaderboard scores and those obtained from local runs using the LM Eval Harness with identical configurations. The results reported here are based on our own benchmarking. To replicate these findings, you can use LLM-Autoeval or run [lm-evaluation-harness](https://github.com/EleutherAI/lm-evaluation-harness) locally with the configurations described on the Open LLM Leaderboard's About page.
+
 | Benchmark | Llama 2 Chat | Tamil Llama v0.2 Instruct | Telugu Llama Instruct | Malayalam Llama Instruct |
 |---------------|--------------|---------------------------|-----------------------|--------------------------|
 | ARC Challenge (25-shot) | 52.9 | **53.75** | 52.47 | 52.82 |
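For reference, a minimal sketch of the local replication path the added note describes, using lm-evaluation-harness's Python API (v0.4+). The model ID and batch size below are assumptions for illustration, not the exact configuration used for the scores above:

```python
# Sketch: reproduce an ARC Challenge (25-shot) run locally with
# lm-evaluation-harness, matching the Open LLM Leaderboard setup.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",  # Hugging Face transformers backend
    # Assumed model ID for the Tamil Llama v0.2 Instruct column:
    model_args="pretrained=abhinand/tamil-llama-7b-instruct-v0.2",
    tasks=["arc_challenge"],
    num_fewshot=25,  # 25-shot, per the Open LLM Leaderboard's About page
    batch_size=8,    # assumption; tune to fit your GPU
)

# Per-task metrics (e.g. acc / acc_norm) for comparison with the table.
print(results["results"]["arc_challenge"])
```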