Updated eval numbers in README.md
README.md CHANGED
@@ -37,7 +37,6 @@ Krutrim Large Language Model (LLM) is a 2 trillion token multilingual foundation
 
 | Model Name | Release Date |Release Note | Reference|
 |------------|-------------|-------------|-------------|
-| Krutrim-1-Base | 2024-01-31 | Trained from scratch | [Here](https://huggingface.co/krutrim-ai-labs/Krutrim-1-base)
 | Krutrim-1-Instruct | 2024-01-31 | SFT on Krutrim-1-Base |[Here](https://huggingface.co/krutrim-ai-labs/Krutrim-1-instruct)
 
 
@@ -56,23 +55,23 @@ Krutrim Large Language Model (LLM) is a 2 trillion token multilingual foundation
 
 ## Evaluation Results
 
-### English Comparison between Krutrim-1 and Llama2Chat (Benchmarks run on `llm_foundry`)
+### English Comparison between Krutrim-1-7B and Llama2Chat-7B (Benchmarks run on `llm_foundry`)
 
 | Task | Llama2Chat | Krutrim-1-7B |
 |--------------------|--------------|------------|
 | arc | 0.517 | **0.557** |
 | bigbench | **0.359** | 0.330 |
-| boolq |
+| boolq | 0.803 | **0.843** |
 | copa | 0.78 | **0.82** |
 | hellaswag | **0.754** | 0.740 |
-| jeopardy | 0.306 |
+| jeopardy | **0.306** | 0.286 |
 | lambadaopenai | **0.695** | 0.682 |
-| logiqa | 0.332 |
-| mathqa |
+| logiqa | **0.332** | 0.3195 |
+| mathqa | 0.436 | **0.440** |
 | mmlu | 0.472 | **0.495** |
 | openbookqa | 0.44 | **0.464** |
-| piqa |
-| simplearithmetic | 0.160 |
+| piqa | 0.7601 | **0.7726** |
+| simplearithmetic | **0.160** | 0.077 |
 | squad | 0.3565 | **0.369** |
 | winograd | **0.8645** | 0.828 |
 | winogrande | 0.681 | **0.697** |
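
For context, the checkpoints referenced in the release table above can be pulled straight from the Hugging Face Hub. The snippet below is a minimal sketch using `transformers`; the repo IDs come from the README links, while `trust_remote_code`, dtype, and generation settings are assumptions and may differ from the official model card.

```python
# Minimal sketch (not the official quickstart): load one of the Krutrim-1
# checkpoints referenced above and run a short generation.
# Assumptions: `transformers` and `accelerate` are installed; trust_remote_code,
# dtype, and generation settings may need adjusting per the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "krutrim-ai-labs/Krutrim-1-instruct"  # or "krutrim-ai-labs/Krutrim-1-base"

tokenizer = AutoTokenizer.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",   # use bf16/fp16 on supported hardware
    device_map="auto",    # requires `accelerate`
    trust_remote_code=True,
)

prompt = "List three benchmarks commonly used to evaluate English LLMs."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```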