File size: 10,429 Bytes

---
language:
- en
license: cc-by-nc-sa-4.0
library_name: transformers
tags:
- UNA
- juanako
- mixtral
- MoE
model-index:
- name: UNAversal-8x7B-v1beta
  results:
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: AI2 Reasoning Challenge (25-Shot)
      type: ai2_arc
      config: ARC-Challenge
      split: test
      args:
        num_few_shot: 25
    metrics:
    - type: acc_norm
      value: 69.8
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNAversal-8x7B-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: HellaSwag (10-Shot)
      type: hellaswag
      split: validation
      args:
        num_few_shot: 10
    metrics:
    - type: acc_norm
      value: 86.9
      name: normalized accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNAversal-8x7B-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: MMLU (5-Shot)
      type: cais/mmlu
      config: all
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 70.39
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNAversal-8x7B-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: TruthfulQA (0-shot)
      type: truthful_qa
      config: multiple_choice
      split: validation
      args:
        num_few_shot: 0
    metrics:
    - type: mc2
      value: 71.97
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNAversal-8x7B-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: Winogrande (5-shot)
      type: winogrande
      config: winogrande_xl
      split: validation
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 82.0
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNAversal-8x7B-v1beta
      name: Open LLM Leaderboard
  - task:
      type: text-generation
      name: Text Generation
    dataset:
      name: GSM8k (5-shot)
      type: gsm8k
      config: main
      split: test
      args:
        num_few_shot: 5
    metrics:
    - type: acc
      value: 61.64
      name: accuracy
    source:
      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNAversal-8x7B-v1beta
      name: Open LLM Leaderboard
---
# UNAversal - Uniform Neural Alignment (MoE)

This is just a beta, a first release so people can start working on franksteins and so.
It does achieve high GSM/Math and TQA, so ideally you can merge it with other mixtrals and see what coming out of it
Based on [mistralai/Mixtral-8x7B-Instruct-v0.1](https://huggingface.co/mistralai/Mixtral-8x7B-Instruct-v0.1)

## UNA Details
For this model we came out with the most obvious, placing UNA on the router_logit. It does work, but we saw a much better performance on SFT by doing so.
So this model DOES have UNA-SFT phase, its highly experimental and it was merely using LLaMA-Factory datasets by example alpaca.

As the others:
- Can be finetuned further, try 2e-5 or **1e-4 (since its MOE)**
- Can be merged, here you will have to improvise and please report findings on a discussion thread.

**REMINDER**: please.. cite, it does help on the research and the lab itself, seriously.

## NEED YOUR HELP!!
I need a multi-turn trainloop for the Mixtral, that can squeeze the juice out of 8xH100's properly. Please feel free to reach @fblgit either discord or twitter. thanks!

# Evals
Here there are some, but we also submitted it to the HF eval queue....

## GSM8k 5-Shot
```
|Tasks|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|-----|-------|----------|-----:|-----------|-----:|---|-----:|
|gsm8k|Yaml   |get-answer|     5|exact_match|0.6603|±  | 0.013|
```
## ARC 25-Shot
```
|    Tasks    |Version|Filter|n-shot| Metric |Value |   |Stderr|
|-------------|-------|------|-----:|--------|-----:|---|-----:|
|arc_challenge|Yaml   |none  |    25|acc     |0.6621|±  |0.0138|
|             |       |none  |    25|acc_norm|0.6962|±  |0.0134|
```

## TruthfulQA 0-Shot (MC2)
```
|    Tasks     |Version|Filter|n-shot|Metric|Value |   |Stderr|
|--------------|-------|------|-----:|------|-----:|---|-----:|
|truthfulqa_mc2|Yaml   |none  |     0|acc   |0.7122|±  |0.0141|
```

## 0-Shots Evals
```
|    Tasks     |Version|Filter|n-shot|  Metric  |Value |   |Stderr|
|--------------|-------|------|-----:|----------|-----:|---|-----:|
|arc_challenge |Yaml   |none  |     0|acc       |0.6101|±  |0.0143|
|              |       |none  |     0|acc_norm  |0.6425|±  |0.0140|
|arc_easy      |Yaml   |none  |     0|acc       |0.8615|±  |0.0071|
|              |       |none  |     0|acc_norm  |0.8375|±  |0.0076|
|boolq         |Yaml   |none  |     0|acc       |0.8624|±  |0.0060|
|lambada_openai|Yaml   |none  |     0|perplexity|2.8318|±  |0.0507|
|              |       |none  |     0|acc       |0.7650|±  |0.0059|
|mathqa        |Yaml   |none  |     0|acc       |0.4472|±  |0.0091|
|              |       |none  |     0|acc_norm  |0.4436|±  |0.0091|
|piqa          |Yaml   |none  |     0|acc       |0.8292|±  |0.0088|
|              |       |none  |     0|acc_norm  |0.8422|±  |0.0085|
|pubmedqa      |Yaml   |none  |     0|acc       |0.7920|±  |0.0182|
|sciq          |Yaml   |none  |     0|acc       |0.9630|±  |0.0060|
|              |       |none  |     0|acc_norm  |0.9370|±  |0.0077|
```

## BBH at 0-Shot
```
vllm (pretrained=fblgit/UNAversal-8x7B-v1beta,tensor_parallel_size=2,data_parallel_size=4,gpu_memory_utilization=0.8,dtype=float16), gen_kwargs: (None), limit: None, num_fewshot: 0, batch_size: auto
|                          Tasks                           |Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|----------------------------------------------------------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh                                                       |N/A    |get-answer|     0|exact_match|0.6752|±  |0.1772|
| - bbh_cot_fewshot_boolean_expressions                    |Yaml   |get-answer|     0|exact_match|0.8840|±  |0.0203|
| - bbh_cot_fewshot_causal_judgement                       |Yaml   |get-answer|     0|exact_match|0.6417|±  |0.0352|
| - bbh_cot_fewshot_date_understanding                     |Yaml   |get-answer|     0|exact_match|0.7600|±  |0.0271|
| - bbh_cot_fewshot_disambiguation_qa                      |Yaml   |get-answer|     0|exact_match|0.7160|±  |0.0286|
| - bbh_cot_fewshot_dyck_languages                         |Yaml   |get-answer|     0|exact_match|0.1800|±  |0.0243|
| - bbh_cot_fewshot_formal_fallacies                       |Yaml   |get-answer|     0|exact_match|0.6520|±  |0.0302|
| - bbh_cot_fewshot_geometric_shapes                       |Yaml   |get-answer|     0|exact_match|0.3880|±  |0.0309|
| - bbh_cot_fewshot_hyperbaton                             |Yaml   |get-answer|     0|exact_match|0.9600|±  |0.0124|
| - bbh_cot_fewshot_logical_deduction_five_objects         |Yaml   |get-answer|     0|exact_match|0.5360|±  |0.0316|
| - bbh_cot_fewshot_logical_deduction_seven_objects        |Yaml   |get-answer|     0|exact_match|0.5040|±  |0.0317|
| - bbh_cot_fewshot_logical_deduction_three_objects        |Yaml   |get-answer|     0|exact_match|0.8600|±  |0.0220|
| - bbh_cot_fewshot_movie_recommendation                   |Yaml   |get-answer|     0|exact_match|0.7840|±  |0.0261|
| - bbh_cot_fewshot_multistep_arithmetic_two               |Yaml   |get-answer|     0|exact_match|0.6600|±  |0.0300|
| - bbh_cot_fewshot_navigate                               |Yaml   |get-answer|     0|exact_match|0.8160|±  |0.0246|
| - bbh_cot_fewshot_object_counting                        |Yaml   |get-answer|     0|exact_match|0.8360|±  |0.0235|
| - bbh_cot_fewshot_penguins_in_a_table                    |Yaml   |get-answer|     0|exact_match|0.7329|±  |0.0367|
| - bbh_cot_fewshot_reasoning_about_colored_objects        |Yaml   |get-answer|     0|exact_match|0.8120|±  |0.0248|
| - bbh_cot_fewshot_ruin_names                             |Yaml   |get-answer|     0|exact_match|0.4440|±  |0.0315|
| - bbh_cot_fewshot_salient_translation_error_detection    |Yaml   |get-answer|     0|exact_match|0.5200|±  |0.0317|
| - bbh_cot_fewshot_snarks                                 |Yaml   |get-answer|     0|exact_match|0.7135|±  |0.0340|
| - bbh_cot_fewshot_sports_understanding                   |Yaml   |get-answer|     0|exact_match|0.9400|±  |0.0151|
| - bbh_cot_fewshot_temporal_sequences                     |Yaml   |get-answer|     0|exact_match|0.7560|±  |0.0272|
| - bbh_cot_fewshot_tracking_shuffled_objects_five_objects |Yaml   |get-answer|     0|exact_match|0.5680|±  |0.0314|
| - bbh_cot_fewshot_tracking_shuffled_objects_seven_objects|Yaml   |get-answer|     0|exact_match|0.6280|±  |0.0306|
| - bbh_cot_fewshot_tracking_shuffled_objects_three_objects|Yaml   |get-answer|     0|exact_match|0.6280|±  |0.0306|
| - bbh_cot_fewshot_web_of_lies                            |Yaml   |get-answer|     0|exact_match|0.9560|±  |0.0130|
| - bbh_cot_fewshot_word_sorting                           |Yaml   |get-answer|     0|exact_match|0.3800|±  |0.0308|

|Groups|Version|  Filter  |n-shot|  Metric   |Value |   |Stderr|
|------|-------|----------|-----:|-----------|-----:|---|-----:|
|bbh   |N/A    |get-answer|     0|exact_match|0.6752|±  |0.1772|
```
# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_fblgit__UNAversal-8x7B-v1beta)

|             Metric              |Value|
|---------------------------------|----:|
|Avg.                             |73.78|
|AI2 Reasoning Challenge (25-Shot)|69.80|
|HellaSwag (10-Shot)              |86.90|
|MMLU (5-Shot)                    |70.39|
|TruthfulQA (0-shot)              |71.97|
|Winogrande (5-shot)              |82.00|
|GSM8k (5-shot)                   |61.64|