appvoid committed on
Commit e52a9d8
1 Parent(s): 5666f8b

Update README.md

Files changed (1)
  1. README.md +10 -11
README.md CHANGED
@@ -4,18 +4,17 @@ language:
 - en
 ---
 # palmer
-palmer-003 focuses on reaching SOTA performance by MErging of Experts + fine-tuning, where the experts are consolidated into one model that is then fine-tuned on useful textual data. These techniques appear to be quite powerful yet simple to apply. It follows an approach that mimics Mistral's bias to act as an assistant without using prompts.
-
-```
-### Evaluation
-Model | ARC-C | OBQA | HellaSwag | PIQA | Winogrande | Average |
-tinyllama | 0.3029 | 0.3600 | 0.5935 | 0.7329 | 0.5959 | 0.5170 |
-palmer-002 | 0.3242 | 0.3700 | 0.5956 | 0.7345 | 0.5888 | 0.5226 |
-tinyllama-chat | 0.3285 | 0.3740 | 0.6037 | 0.7448 | 0.6022 | 0.5306 |
-zyte-1b | 0.3353 | 0.3700 | 0.6086 | 0.7541 | 0.5998 | 0.5335 |
-babbage-002 | 0.3285 | 0.3620 | 0.6380 | 0.7606 | 0.6085 | 0.5395 |
-palmer-003 | 0.3370 | 0.3740 | 0.6128 | 0.7486 | 0.6535 | 0.5451 |
-```
+This model is a "MErging of Experts" (MEoE), fine-tuned to be biased towards acting as an assistant without using any prompts. As a result of these efforts, palmer is better than babbage-002 at most tasks and competitive with qwen-1.8b on most benchmarks despite being 40% smaller.
+
+Model | MMLU | ARC-C | OBQA | HellaSwag | PIQA | Winogrande | Average |
+tinyllama-chat | 0.2470 | 0.3285 | 0.3740 | 0.6037 | 0.7448 | 0.6022 | 0.4833 |
+zyte-1b | 0.2397 | 0.3353 | 0.3700 | 0.6086 | 0.7541 | 0.5998 | 0.4845 |
+palmer-003 | 0.2534 | 0.3370 | 0.3740 | 0.6128 | 0.7486 | 0.6535 | 0.4965 |
+qwen-1-8 | 0.4536 | 0.3490 | 0.3320 | 0.5876 | 0.7307 | 0.5896 | 0.5070 |
+
+This work is a step towards small language models that can easily run on edge devices such as mobile phones, Raspberry Pis, and automated software/robots. Additionally, palmer-003 departs from the palmer family's philosophy of using less data, instead becoming a more powerful model trained on more data.
+
+Note that, like any popular transformer-based language model, this model can hallucinate (make mistakes), so it should be used with caution on sensitive tasks.
 
 **Prompt test**
 
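The updated card describes palmer-003 as a merge of expert models that is then fine-tuned, but it does not say how the merge is performed. Below is a minimal, hypothetical sketch of one common merge strategy (uniform weight averaging with `transformers` and `torch`); the checkpoint names are placeholders, not the actual experts used for palmer-003.

```python
# Hypothetical sketch of a "merging of experts" step: uniformly average the
# weights of several fine-tuned "expert" checkpoints (which must share one
# architecture) into a single model, which would then be fine-tuned further.
# The real experts and merge recipe behind palmer-003 are not disclosed in
# this README; the checkpoint names below are placeholders.
import torch
from transformers import AutoModelForCausalLM

expert_ids = ["expert-checkpoint-a", "expert-checkpoint-b"]  # placeholders
expert_states = [
    AutoModelForCausalLM.from_pretrained(i).state_dict() for i in expert_ids
]

merged = AutoModelForCausalLM.from_pretrained(expert_ids[0])
merged_state = merged.state_dict()

with torch.no_grad():
    for name, param in merged_state.items():
        if param.is_floating_point():
            # Average this parameter tensor across all experts.
            stacked = torch.stack([s[name].to(param.dtype) for s in expert_states])
            merged_state[name] = stacked.mean(dim=0)

merged.load_state_dict(merged_state)
merged.save_pretrained("merged-experts")  # starting point for fine-tuning
```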
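The card also claims the model responds as an assistant from plain text, without a chat template or system prompt. A minimal usage sketch with `transformers`, assuming the model is published on the Hugging Face Hub as `appvoid/palmer-003` (repo id inferred from the author and model name):

```python
# Minimal generation sketch; the repo id below is assumed from this card.
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "appvoid/palmer-003"  # assumed Hub id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)

# No chat template or system prompt: per the card, the model is biased to
# respond like an assistant from plain text alone.
inputs = tokenizer("What is quantum computing?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```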
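The README does not state which harness or settings produced the benchmark table. A common way to compute these metrics is EleutherAI's lm-evaluation-harness; the sketch below assumes harness v0.4+ task names and the same `appvoid/palmer-003` repo id, so its scores may not match the table exactly.

```python
# Sketch of re-running the benchmarks with EleutherAI's lm-evaluation-harness
# (pip install lm-eval). Few-shot counts and normalization used for the table
# are not documented, so results may differ from the numbers above.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=appvoid/palmer-003",
    tasks=["mmlu", "arc_challenge", "openbookqa", "hellaswag", "piqa", "winogrande"],
    batch_size=8,
)
print(results["results"])
```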