sagorsarker
committed on
Commit ded054a
Parent(s): fa529d8
Update README.md
README.md
CHANGED
@@ -118,6 +118,7 @@ We evaluated the models on the following datasets:
 #### Evaluation of English Benchmark datasets
 - **llama-3.2-1b** consistently leads across all tasks in both 0-shot and 5-shot settings, with top scores of **0.75** in **PIQA** and **0.64** in **BoolQ**.
 - **hishab/titulm-llama-3.2-1b-v1.0** shows competitive performance but generally scores lower than **llama-3.2-1b**, particularly in the 5-shot setting.
+- It is expected as we have trained the model only on Bangla text.
 
 | Model | Shots | MMLU | BoolQ | Commonsense QA | OpenBook QA | PIQA |
 |--------------------------------------|--------|--------------|------------|--------------------|-----------------|-----------|