Update README.md
README.md CHANGED
@@ -1,7 +1,19 @@
+---
+license: cc-by-nc-sa-4.0
+language:
+- en
+library_name: transformers
+tags:
+- UNA
+- juanako
+- mixtral
+- MoE
+---
 # UNAversal - Uniform Neural Alignment (MoE)
 
 This is just a beta, a first release so people can start working on frankensteins and so on.
 It achieves high GSM/Math and TQA scores, so ideally you can merge it with other Mixtrals and see what comes out of it.
+Based on [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)
 
 ## UNA Details
 For this model we went with the most obvious choice: placing UNA on the router_logit. It does work, and we saw much better performance on SFT by doing so.
@@ -58,4 +70,4 @@ Here are some, but we also submitted it to the HF eval queue....
 |pubmedqa |Yaml |none | 0|acc      |0.7920|± |0.0182|
 |sciq     |Yaml |none | 0|acc      |0.9630|± |0.0060|
 |         |     |none | 0|acc_norm |0.9370|± |0.0077|
-```
+```
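For reference, the `router_logit` mentioned in the UNA Details above is something the `transformers` Mixtral implementation (the `library_name` declared in the new front matter) can expose directly. A minimal sketch of surfacing those logits follows; the checkpoint path is a placeholder rather than the real repo id, and this is not the authors' training code, just a way to see the tensors UNA is said to act on:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "<UNAversal-checkpoint>"  # placeholder: substitute the actual repo id

tok = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    output_router_logits=True,  # ask Mixtral to return per-layer router logits
)

inputs = tok("What is 12 * 7?", return_tensors="pt").to(model.device)
with torch.no_grad():
    out = model(**inputs)

# one router-logit tensor per MoE layer, each scoring the experts per token
print(len(out.router_logits), out.router_logits[0].shape)
```

Each element of `out.router_logits` belongs to one MoE layer, so an alignment signal placed there would presumably steer expert selection rather than the expert weights themselves.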
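The intro also invites merging this model with other Mixtrals ("frankensteins"). Purely as an illustration of the idea, here is a naive linear-interpolation merge sketch; both checkpoint paths are placeholders, the two models must share the exact same architecture and parameter names, and in practice a dedicated tool such as mergekit is the usual route, since holding two 8x7B state dicts in memory like this is very expensive:

```python
import torch
from transformers import AutoModelForCausalLM

t = 0.5  # interpolation weight between the two checkpoints

a = AutoModelForCausalLM.from_pretrained("<UNAversal-checkpoint>", torch_dtype=torch.float16)
b = AutoModelForCausalLM.from_pretrained("<other-mixtral-checkpoint>", torch_dtype=torch.float16)

sd_a, sd_b = a.state_dict(), b.state_dict()

# element-wise linear blend of every parameter; assumes identical key sets
merged = {
    name: ((1.0 - t) * sd_a[name].float() + t * sd_b[name].float()).half()
    for name in sd_a
}

a.load_state_dict(merged)
a.save_pretrained("./merged-mixtral")
```

Sweeping `t` (or merging per-layer) is where the "see what comes out of it" experimentation happens; the GSM/Math and TQA strengths claimed above are one signal to track when evaluating a merge.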