Update README.md
README.md CHANGED
@@ -1,7 +1,19 @@
+---
+license: cc-by-nc-sa-4.0
+language:
+- en
+library_name: transformers
+tags:
+- UNA
+- juanako
+- mixtral
+- MoE
+---
 # UNAversal - Uniform Neural Alignment (MoE)
 
 This is just a beta, a first release so people can start working on frankensteins and so on.
 It achieves high GSM/Math and TQA scores, so ideally you can merge it with other Mixtrals and see what comes out of it.
+Based on [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
 
 ## UNA Details
 For this model we went with the most obvious choice: placing UNA on the router_logit. It does work, and we saw much better performance on SFT by doing so.
@@ -58,4 +70,4 @@ Here are some, but we also submitted it to the HF eval queue....
 |pubmedqa |Yaml |none | 0|acc      |0.7920|± |0.0182|
 |sciq     |Yaml |none | 0|acc      |0.9630|± |0.0060|
 |         |     |none | 0|acc_norm |0.9370|± |0.0077|
-```
+```
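For reference, the `router_logit` mentioned in the UNA Details above is something the `transformers` Mixtral implementation (the `library_name` declared in the new front matter) can expose directly. A minimal sketch of surfacing those logits follows; the checkpoint path is a placeholder rather than the real repo id, and this is not the authors' training code, just a way to see the tensors UNA is said to act on:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<UNAversal-checkpoint>"  # placeholder: substitute the actual repo id

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    output_router_logits=True,  # ask Mixtral to return per-layer router logits
)

inputs = tok("What is 12 * 7?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs)

# one router-logit tensor per MoE layer, each scoring the experts per token
print(len(out.router_logits), out.router_logits[0].shape)
```

Each element of `out.router_logits` belongs to one MoE layer, so an alignment signal placed there would presumably steer expert selection rather than the expert weights themselves.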
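The intro also invites merging this model with other Mixtrals ("frankensteins"). Purely as an illustration of the idea, here is a naive linear-interpolation merge sketch; both checkpoint paths are placeholders, the two models must share the exact same architecture and parameter names, and in practice a dedicated tool such as mergekit is the usual route, since holding two 8x7B state dicts in memory like this is very expensive:

```python
import torch
from transformers import AutoModelForCausalLM

t = 0.5  # interpolation weight between the two checkpoints

a = AutoModelForCausalLM.from_pretrained("<UNAversal-checkpoint>", torch_dtype=torch.float16)
b = AutoModelForCausalLM.from_pretrained("<other-mixtral-checkpoint>", torch_dtype=torch.float16)

sd_a, sd_b = a.state_dict(), b.state_dict()

# element-wise linear blend of every parameter; assumes identical key sets
merged = {
    name: ((1.0 - t) * sd_a[name].float() + t * sd_b[name].float()).half()
    for name in sd_a
}

a.load_state_dict(merged)
a.save_pretrained("./merged-mixtral")
```

Sweeping `t` (or merging per-layer) is where the "see what comes out of it" experimentation happens; the GSM/Math and TQA strengths claimed above are one signal to track when evaluating a merge.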