# UNAversal - Uniform Neural Alignment (MoE)

This is a beta, a first release, so people can start building Frankenstein merges and the like. It achieves high GSM8k/Math and TruthfulQA scores, so ideally you can merge it with other Mixtral models and see what comes out of it.

## UNA Details

For this model we went with the most obvious option: applying UNA to the `router_logits`. That works on its own, but we saw much better performance by following it with SFT. So this model DOES include a UNA-SFT phase. It is highly experimental and used only LLaMA-Factory datasets, e.g. Alpaca.

As with the other UNA models:

- It can be finetuned further; try a learning rate of 2e-5 or **1e-4 (since it's a MoE)**.
- It can be merged; here you will have to improvise, and please report your findings in a discussion thread.

**REMINDER**: please cite this work; it genuinely helps the research and the lab itself, seriously.

## NEED YOUR HELP!!

I need a multi-turn training loop for Mixtral that can properly squeeze the juice out of 8x H100s. Feel free to reach @fblgit on Discord or Twitter. Thanks!

# Evals

Here are some results; the model was also submitted to the HF eval queue.
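The tables below look like lm-evaluation-harness output (note the `Filter`/`n-shot` columns), so assuming that harness was used, a sketch of how the GSM8k 5-shot row might be reproduced. The model path is a placeholder; a full run needs a GPU large enough for a Mixtral-class model.

```shell
# Install EleutherAI's lm-evaluation-harness (provides the lm_eval CLI).
pip install lm-eval

# Hypothetical reproduction of the GSM8k 5-shot eval; replace the
# pretrained path with this model's HF repo id or a local checkout.
lm_eval --model hf \
  --model_args pretrained=/path/to/this-model,dtype=bfloat16 \
  --tasks gsm8k \
  --num_fewshot 5 \
  --batch_size auto
```

The other tables should follow the same pattern by swapping `--tasks` (e.g. `arc_challenge`, `truthfulqa_mc2`) and `--num_fewshot`.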
## GSM8k 5-Shot

```
|Tasks|Version| Filter   |n-shot| Metric    |Value |   |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.6603|±  | 0.013|
```

## ARC 25-Shot

```
|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6621|±  |0.0138|
|             |       |none  |    25|acc_norm|0.6962|±  |0.0134|
```

## TruthfulQA 0-Shot (MC2)

```
|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7122|±  |0.0141|
```

## 0-Shot Evals

```
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|arc_challenge |Yaml   |none  |     0|acc       |0.6101|±  |0.0143|
|              |       |none  |     0|acc_norm  |0.6425|±  |0.0140|
|arc_easy      |Yaml   |none  |     0|acc       |0.8615|±  |0.0071|
|              |       |none  |     0|acc_norm  |0.8375|±  |0.0076|
|boolq         |Yaml   |none  |     0|acc       |0.8624|±  |0.0060|
|lambada_openai|Yaml   |none  |     0|perplexity|2.8318|±  |0.0507|
|              |       |none  |     0|acc       |0.7650|±  |0.0059|
|mathqa        |Yaml   |none  |     0|acc       |0.4472|±  |0.0091|
|              |       |none  |     0|acc_norm  |0.4436|±  |0.0091|
|piqa          |Yaml   |none  |     0|acc       |0.8292|±  |0.0088|
|              |       |none  |     0|acc_norm  |0.8422|±  |0.0085|
|pubmedqa      |Yaml   |none  |     0|acc       |0.7920|±  |0.0182|
|sciq          |Yaml   |none  |     0|acc       |0.9630|±  |0.0060|
|              |       |none  |     0|acc_norm  |0.9370|±  |0.0077|
```
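Since the card suggests further finetuning at 2e-5 or 1e-4 and mentions LLaMA-Factory with Alpaca-style data, here is a hedged sketch of what such an SFT config might look like. All paths are placeholders, and the exact key names may differ between LLaMA-Factory versions; check its example configs before using.

```yaml
# Hypothetical LLaMA-Factory SFT config (key names follow its example
# configs but may vary by version). Model and output paths are placeholders.
model_name_or_path: /path/to/this-model
stage: sft
do_train: true
finetuning_type: lora        # full finetuning of a MoE this size is very costly
dataset: alpaca_en           # Alpaca-style data, as used for the UNA-SFT phase
template: mistral
cutoff_len: 2048
learning_rate: 2.0e-5        # or 1.0e-4, as suggested above for the MoE
num_train_epochs: 1.0
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
bf16: true
output_dir: saves/una-sft
```

Depending on the LLaMA-Factory version, this would be launched with something like `llamafactory-cli train config.yaml`.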
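For the merging experiments the card asks for, one option is mergekit. A minimal sketch, assuming a simple linear weight average with another Mixtral-architecture model (both model references and the weights are placeholders to improvise on):

```yaml
# Hypothetical mergekit config: linear-average this model with another
# Mixtral-architecture model. Replace the model paths with real repos.
merge_method: linear
models:
  - model: /path/to/this-model
    parameters:
      weight: 0.5
  - model: /path/to/another-mixtral
    parameters:
      weight: 0.5
dtype: bfloat16
```

This would be run with mergekit's `mergekit-yaml` CLI, e.g. `mergekit-yaml merge.yaml ./merged-model`. Please report what comes out in a discussion thread, good or bad.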