fblgit committed
Commit 71938a3
Parent: 9225a9c

Update README.md

Files changed (1)
  1. README.md +13 -1
README.md CHANGED
@@ -1,7 +1,19 @@
+ ---
+ license: cc-by-nc-sa-4.0
+ language:
+ - en
+ library_name: transformers
+ tags:
+ - UNA
+ - juanako
+ - mixtral
+ - MoE
+ ---
# UNAversal - Uniform Neural Alignment (MoE)

This is just a beta, a first release, so people can start building Frankenstein-style merges and the like.
It does achieve high GSM8K/Math and TruthfulQA (TQA) scores, so ideally you can merge it with other Mixtral models and see what comes out of it.
+ Based on [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)

## UNA Details
For this model we went with the most obvious option: placing UNA on the router_logit. It does work, but we saw much better performance on SFT by doing so.
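As a rough illustration of what "placing UNA on the router_logit" refers to (a sketch only, not the UNA training code), the transformers Mixtral implementation can return the per-layer router logits that the note above describes as the alignment target:

```python
# Illustrative only: inspect Mixtral's router logits, the tensors UNA is
# described as targeting. This is not the UNA fine-tuning code itself.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "mistralai/Mixtral-8x7B-Instruct-v0.1"  # base model linked above
tok = AutoTokenizer.from_pretrained(model_id)
# Loading an 8x7B MoE needs substantial GPU memory; this is only a sketch.
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

inputs = tok("UNA acts on the expert router.", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs, output_router_logits=True)

# One (batch * seq_len, num_experts) logit tensor per MoE layer.
print(len(out.router_logits), out.router_logits[0].shape)
```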
 
@@ -58,4 +70,4 @@ Here there are some, but we also submitted it to the HF eval queue....
|pubmedqa |Yaml |none | 0|acc |0.7920|± |0.0182|
|sciq |Yaml |none | 0|acc |0.9630|± |0.0060|
| | |none | 0|acc_norm |0.9370|± |0.0077|
- ```
+ ```
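The rows above appear to be lm-evaluation-harness output (task, version, filter, n-shot, metric, value, stderr). A minimal sketch of re-running the two tasks shown, assuming the harness's Python API and using a placeholder repo id for this checkpoint, since the card does not state the exact command:

```python
# Sketch only: re-run the two tasks from the table above with EleutherAI's
# lm-evaluation-harness. The repo id below is a placeholder, and task names
# and the API surface can differ between harness versions.
from lm_eval import evaluator

results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/UNAversal-8x7B-v1beta,dtype=bfloat16",
    tasks=["pubmedqa", "sciq"],
    num_fewshot=0,
)
for task, metrics in results["results"].items():
    print(task, metrics)
```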