Edit model card

UNAversal - Uniform Neural Alignment (MoE)

This is just a beta, a first release so people can start working on franksteins and so. It does achieve high GSM/Math and TQA, so ideally you can merge it with other mixtrals and see what coming out of it Based on mistralai/Mixtral-8x7B-Instruct-v0.1

UNA Details

For this model we came out with the most obvious, placing UNA on the router_logit. It does work, but we saw a much better performance on SFT by doing so. So this model DOES have UNA-SFT phase, its highly experimental and it was merely using LLaMA-Factory datasets by example alpaca.

As the others:

  • Can be finetuned further, try 2e-5 or 1e-4 (since its MOE)
  • Can be merged, here you will have to improvise and please report findings on a discussion thread.

REMINDER: please.. cite, it does help on the research and the lab itself, seriously.

NEED YOUR HELP!!

I need a multi-turn trainloop for the Mixtral, that can squeeze the juice out of 8xH100's properly. Please feel free to reach @fblgit either discord or twitter. thanks!

Evals

Here there are some, but we also submitted it to the HF eval queue....

GSM8k 5-Shot

|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.6603|±  | 0.013|

ARC 25-Shot

|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6621|±  |0.0138|
|             |       |none  |    25|acc_norm|0.6962|±  |0.0134|

TruthfulQA 0-Shot (MC2)

|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7122|±  |0.0141|

0-Shots Evals

|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|arc_challenge |Yaml   |none  |     0|acc       |0.6101|±  |0.0143|
|              |       |none  |     0|acc_norm  |0.6425|±  |0.0140|
|arc_easy      |Yaml   |none  |     0|acc       |0.8615|±  |0.0071|
|              |       |none  |     0|acc_norm  |0.8375|±  |0.0076|
|boolq         |Yaml   |none  |     0|acc       |0.8624|±  |0.0060|
|lambada_openai|Yaml   |none  |     0|perplexity|2.8318|±  |0.0507|
|              |       |none  |     0|acc       |0.7650|±  |0.0059|
|mathqa        |Yaml   |none  |     0|acc       |0.4472|±  |0.0091|
|              |       |none  |     0|acc_norm  |0.4436|±  |0.0091|
|piqa          |Yaml   |none  |     0|acc       |0.8292|±  |0.0088|
|              |       |none  |     0|acc_norm  |0.8422|±  |0.0085|
|pubmedqa      |Yaml   |none  |     0|acc       |0.7920|±  |0.0182|
|sciq          |Yaml   |none  |     0|acc       |0.9630|±  |0.0060|
|              |       |none  |     0|acc_norm  |0.9370|±  |0.0077|

BBH at 0-Shot

vllm (pretrained=fblgit/UNAversal-8x7B-v1beta,tensor_parallel_size=2,data_parallel_size=4,gpu_memory_utilization=0.8,dtype=float16), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto
|                          Tasks                           |Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|----------------------------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh                                                       |N/A    |get-answer|     0|exact_match|0.6752|±  |0.1772|
| - bbh_cot_fewshot_boolean_expressions                    |Yaml   |get-answer|     0|exact_match|0.8840|±  |0.0203|
| - bbh_cot_fewshot_causal_judgement                       |Yaml   |get-answer|     0|exact_match|0.6417|±  |0.0352|
| - bbh_cot_fewshot_date_understanding                     |Yaml   |get-answer|     0|exact_match|0.7600|±  |0.0271|
| - bbh_cot_fewshot_disambiguation_qa                      |Yaml   |get-answer|     0|exact_match|0.7160|±  |0.0286|
| - bbh_cot_fewshot_dyck_languages                         |Yaml   |get-answer|     0|exact_match|0.1800|±  |0.0243|
| - bbh_cot_fewshot_formal_fallacies                       |Yaml   |get-answer|     0|exact_match|0.6520|±  |0.0302|
| - bbh_cot_fewshot_geometric_shapes                       |Yaml   |get-answer|     0|exact_match|0.3880|±  |0.0309|
| - bbh_cot_fewshot_hyperbaton                             |Yaml   |get-answer|     0|exact_match|0.9600|±  |0.0124|
| - bbh_cot_fewshot_logical_deduction_five_objects         |Yaml   |get-answer|     0|exact_match|0.5360|±  |0.0316|
| - bbh_cot_fewshot_logical_deduction_seven_objects        |Yaml   |get-answer|     0|exact_match|0.5040|±  |0.0317|
| - bbh_cot_fewshot_logical_deduction_three_objects        |Yaml   |get-answer|     0|exact_match|0.8600|±  |0.0220|
| - bbh_cot_fewshot_movie_recommendation                   |Yaml   |get-answer|     0|exact_match|0.7840|±  |0.0261|
| - bbh_cot_fewshot_multistep_arithmetic_two               |Yaml   |get-answer|     0|exact_match|0.6600|±  |0.0300|
| - bbh_cot_fewshot_navigate                               |Yaml   |get-answer|     0|exact_match|0.8160|±  |0.0246|
| - bbh_cot_fewshot_object_counting                        |Yaml   |get-answer|     0|exact_match|0.8360|±  |0.0235|
| - bbh_cot_fewshot_penguins_in_a_table                    |Yaml   |get-answer|     0|exact_match|0.7329|±  |0.0367|
| - bbh_cot_fewshot_reasoning_about_colored_objects        |Yaml   |get-answer|     0|exact_match|0.8120|±  |0.0248|
| - bbh_cot_fewshot_ruin_names                             |Yaml   |get-answer|     0|exact_match|0.4440|±  |0.0315|
| - bbh_cot_fewshot_salient_translation_error_detection    |Yaml   |get-answer|     0|exact_match|0.5200|±  |0.0317|
| - bbh_cot_fewshot_snarks                                 |Yaml   |get-answer|     0|exact_match|0.7135|±  |0.0340|
| - bbh_cot_fewshot_sports_understanding                   |Yaml   |get-answer|     0|exact_match|0.9400|±  |0.0151|
| - bbh_cot_fewshot_temporal_sequences                     |Yaml   |get-answer|     0|exact_match|0.7560|±  |0.0272|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects |Yaml   |get-answer|     0|exact_match|0.5680|±  |0.0314|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects|Yaml   |get-answer|     0|exact_match|0.6280|±  |0.0306|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects|Yaml   |get-answer|     0|exact_match|0.6280|±  |0.0306|
| - bbh_cot_fewshot_web_of_lies                            |Yaml   |get-answer|     0|exact_match|0.9560|±  |0.0130|
| - bbh_cot_fewshot_word_sorting                           |Yaml   |get-answer|     0|exact_match|0.3800|±  |0.0308|

|Groups|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh   |N/A    |get-answer|     0|exact_match|0.6752|±  |0.1772|

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 73.78
AI2 Reasoning Challenge (25-Shot) 69.80
HellaSwag (10-Shot) 86.90
MMLU (5-Shot) 70.39
TruthfulQA (0-shot) 71.97
Winogrande (5-shot) 82.00
GSM8k (5-shot) 61.64
Downloads last month
1,036
Safetensors
Model size
46.7B params
Tensor type
BF16
·
This model does not have enough activity to be deployed to Inference API (serverless) yet. Increase its social visibility and check back later, or deploy to Inference Endpoints (dedicated) instead.

Evaluation results