appvoid committed on
Commit e52a9d8
1 Parent(s): 5666f8b

Update README.md

Files changed (1)
  1. README.md +10 -11
README.md CHANGED
@@ -4,18 +4,17 @@ language:
 - en
 ---
 # palmer
-palmer-003 focuses on reaching SOTA performance by MErging of Experts + fine-tuning, where the experts are consolidated into one model that is then fine-tuned on useful textual data. These techniques appear to be quite powerful yet simple to apply. It follows an approach that mimics Mistral's bias to act as an assistant without using prompts.
-
-```
-### Evaluation
-Model | ARC-C | OBQA | HellaSwag | PIQA | Winogrande | Average |
-tinyllama | 0.3029 | 0.3600 | 0.5935 | 0.7329 | 0.5959 | 0.5170 |
-palmer-002 | 0.3242 | 0.3700 | 0.5956 | 0.7345 | 0.5888 | 0.5226 |
-tinyllama-chat | 0.3285 | 0.3740 | 0.6037 | 0.7448 | 0.6022 | 0.5306 |
-zyte-1b | 0.3353 | 0.3700 | 0.6086 | 0.7541 | 0.5998 | 0.5335 |
-babbage-002 | 0.3285 | 0.3620 | 0.6380 | 0.7606 | 0.6085 | 0.5395 |
-palmer-003 | 0.3370 | 0.3740 | 0.6128 | 0.7486 | 0.6535 | 0.5451 |
-```
+This model is a "MErging of Experts" (MEoE), fine-tuned to be biased towards acting as an assistant without using any prompts. As a result of these efforts, palmer is better than babbage-002 at most tasks and competitive with qwen-1.8b on most benchmarks despite being 40% smaller.
+
+Model | MMLU | ARC-C | OBQA | HellaSwag | PIQA | Winogrande | Average |
+tinyllama-chat | 0.2470 | 0.3285 | 0.3740 | 0.6037 | 0.7448 | 0.6022 | 0.4833 |
+zyte-1b | 0.2397 | 0.3353 | 0.3700 | 0.6086 | 0.7541 | 0.5998 | 0.4845 |
+palmer-003 | 0.2534 | 0.3370 | 0.3740 | 0.6128 | 0.7486 | 0.6535 | 0.4965 |
+qwen-1-8 | 0.4536 | 0.3490 | 0.3320 | 0.5876 | 0.7307 | 0.5896 | 0.5070 |
+
+This work is a step towards small language models that can easily run on edge devices such as mobile phones, Raspberry Pis, and automated software/robots. Additionally, palmer-003 departs from the palmer family's philosophy of using less data, instead becoming a more powerful model trained on more data.
+
+Note that, like any popular transformer-based language model, this model can hallucinate (make mistakes), so it should be used with caution on sensitive tasks.
 
 **Prompt test**
 
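The updated card describes palmer-003 as a merge of expert models that is then fine-tuned, but it does not say how the merge is performed. Below is a minimal, hypothetical sketch of one common merge strategy (uniform weight averaging with `transformers` and `torch`); the checkpoint names are placeholders, not the actual experts used for palmer-003.

```python
# Hypothetical sketch of a "merging of experts" step: uniformly average the
# weights of several fine-tuned "expert" checkpoints (which must share one
# architecture) into a single model, which would then be fine-tuned further.
# The real experts and merge recipe behind palmer-003 are not disclosed in
# this README; the checkpoint names below are placeholders.
import torch
from transformers import AutoModelForCausalLM

expert_ids = ["expert-checkpoint-a", "expert-checkpoint-b"]  # placeholders
expert_states = [
    AutoModelForCausalLM.from_pretrained(i).state_dict() for i in expert_ids
]

merged = AutoModelForCausalLM.from_pretrained(expert_ids[0])
merged_state = merged.state_dict()

with torch.no_grad():
    for name, param in merged_state.items():
        if param.is_floating_point():
            # Average this parameter tensor across all experts.
            stacked = torch.stack([s[name].to(param.dtype) for s in expert_states])
            merged_state[name] = stacked.mean(dim=0)

merged.load_state_dict(merged_state)
merged.save_pretrained("merged-experts")  # starting point for fine-tuning
```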
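The card also claims the model responds as an assistant from plain text, without a chat template or system prompt. A minimal usage sketch with `transformers`, assuming the model is published on the Hugging Face Hub as `appvoid/palmer-003` (repo id inferred from the author and model name):

```python
# Minimal generation sketch; the repo id below is assumed from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "appvoid/palmer-003"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# No chat template or system prompt: per the card, the model is biased to
# respond like an assistant from plain text alone.
inputs = tokenizer("What is quantum computing?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```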
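The README does not state which harness or settings produced the benchmark table. A common way to compute these metrics is EleutherAI's lm-evaluation-harness; the sketch below assumes harness v0.4+ task names and the same `appvoid/palmer-003` repo id, so its scores may not match the table exactly.

```python
# Sketch of re-running the benchmarks with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Few-shot counts and normalization used for the table
# are not documented, so results may differ from the numbers above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=appvoid/palmer-003",
    tasks=["mmlu", "arc_challenge", "openbookqa", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```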