metadata
license: afl-3.0
library_name: transformers
tags:
  - UNA
  - juanako
datasets:
  - jondurbin/py-dpo-v0.1
  - Replete-AI/code_bagel_hermes-2.5
  - mlabonne/orpo-dpo-mix-40k
model-index:
  - name: UNA-ThePitbull-21.4B-v2
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 77.73
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 91.79
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 68.25
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 78.24
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 87.37
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 63.53
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-ThePitbull-21.4B-v2
          name: Open LLM Leaderboard

UNA-ThePitbull 21.4B v2

Introducing the best LLM in the industry: nearly as good as a 70B model at just 21.4B parameters, based on saltlux/luxia-21.4b-alignment-v1.0.

This model has not been poisoned to score high and be useless. We release it because it is the real deal: EQ and IQ together in a powerful, smart, and conversational model.

Quantized versions are available at bartowski/UNA-ThePitbull-21.4B-v2-GGUF

Difference V1 vs V2

In V2 we implemented a different UNA strategy and partially covered the MLPs and attention layers. We also performed further SFT and further DPO over V1, and we'll release some of those models soon as well.

Changes

  1. SFT over V1 with Replete-AI/code_bagel_hermes-2.5, learning rate 1.0e-4 decaying to 5.0e-5, for 1 epoch
  2. DPO with learning rate 1.0e-4 decaying to min_lr 5.0e-5, for 1 epoch, on:
     • mlabonne/orpo-dpo-mix-40k
     • jondurbin/py-dpo-v0.1
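
To make the learning-rate range above concrete, here is a minimal sketch of a schedule that starts at 1.0e-4 and ends at 5.0e-5 over one epoch. The card gives only the two endpoints, so the linear shape, the function name `linear_decay_lr`, and the step count are assumptions for illustration, not the actual training configuration.

```python
def linear_decay_lr(step: int, total_steps: int,
                    peak_lr: float = 1.0e-4, min_lr: float = 5.0e-5) -> float:
    """Interpolate linearly from peak_lr at step 0 to min_lr at the last step.

    Assumption: the card only gives the two endpoints; the linear shape is
    chosen here for illustration and may differ from the real schedule.
    """
    if total_steps < 2:
        return peak_lr
    # Clamp the step so out-of-range values never leave [min_lr, peak_lr].
    step = min(max(step, 0), total_steps - 1)
    fraction = step / (total_steps - 1)
    return peak_lr + fraction * (min_lr - peak_lr)


total_steps = 1000  # hypothetical number of optimizer steps in the epoch
for step in (0, 500, 999):
    print(step, f"{linear_decay_lr(step, total_steps):.2e}")
```

A cosine decay with the same endpoints would fit the description equally well; only the start and end values come from the card.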

Evaluations

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

|              Metric               |Value|
|-----------------------------------|----:|
|Avg.                               |77.82|
|AI2 Reasoning Challenge (25-Shot)  |77.73|
|HellaSwag (10-Shot)                |91.79|
|MMLU (5-Shot)                      |68.25|
|TruthfulQA (0-shot)                |78.24|
|Winogrande (5-shot)                |87.37|
|GSM8k (5-shot)                     |63.53|
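
The reported average can be checked directly from the six benchmark scores above. The short sketch below assumes the average is the plain unweighted mean of those six values.

```python
# Scores copied from the leaderboard results above.
scores = {
    "AI2 Reasoning Challenge (25-Shot)": 77.73,
    "HellaSwag (10-Shot)": 91.79,
    "MMLU (5-Shot)": 68.25,
    "TruthfulQA (0-shot)": 78.24,
    "Winogrande (5-shot)": 87.37,
    "GSM8k (5-shot)": 63.53,
}

# Unweighted mean over the six benchmarks.
average = sum(scores.values()) / len(scores)
print(f"Avg. {average:.2f}")  # prints: Avg. 77.82
```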

These results can only be compared with the non-UNA base model, the original luxia-21.4b, and with ThePitbull-v1.

UNA v2 (VLLM) Evaluations:

vllm (pretrained=/data/tools/mergekit/una-thepitbull-v5,dtype=bfloat16,gpu_memory_utilization=0.8,max_model_len=2048,data_parallel_size=2,tensor_parallel_size=4), gen_kwargs: (None), limit: None, num_fewshot: None, batch_size: 8
|    Tasks     |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
|--------------|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k         |      3|strict-match    |     5|exact_match|0.7695|±  |0.0116|+
|              |       |flexible-extract|     5|exact_match|0.7695|±  |0.0116|+
|hellaswag     |      1|none            |    10|acc        |0.8110|±  |0.0039|
|              |       |none            |    10|acc_norm   |0.9169|±  |0.0028|+
|winogrande    |      1|none            |     5|acc        |0.8777|±  |0.0092|+
|mmlu          |N/A    |none            |     0|acc        |0.6427|±  |0.0038|-
|arc_challenge |      1|none            |    25|acc        |0.7713|±  |0.0123|
|              |       |none            |    25|acc_norm   |0.7875|±  |0.0120|+
|truthfulqa_mc2|      2|none            |     0|acc        |0.7824|±  |0.0135|-
|mathqa        |      1|none            |     0|acc        |0.4037|±  | 0.009|
|              |       |none            |     0|acc_norm   |0.4034|±  | 0.009|+
|pubmedqa      |      1|none            |     0|acc        |0.7260|±  | 0.020|+
|boolq         |      2|none            |     0|acc        |0.8602|±  |0.0061|+

UNA v1 (VLLM) Evaluations

|    Tasks     |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
|--------------|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k         |      3|strict-match    |     5|exact_match|0.7566|±  |0.0118|
|              |       |flexible-extract|     5|exact_match|0.7582|±  |0.0118|
|hellaswag     |      1|none            |    10|acc        |0.8168|±  |0.0039|
|              |       |none            |    10|acc_norm   |0.9188|±  |0.0027|
|winogrande    |      1|none            |     5|acc        |0.8635|±  |0.0097|
|mmlu          |    N/A|none            |     0|acc        |0.6444|±  |0.0038|
|arc_challenge |      1|none            |    25|acc        |0.7747|±  |0.0122|
|              |       |none            |    25|acc_norm   |0.7850|±  |0.0120|
|truthfulqa_mc2|      2|none            |     0|acc        |0.7902|±  |0.0134|
|mathqa        |      1|none            |     0|acc        |0.4030|±  | 0.009|
|              |       |none            |     0|acc_norm   |0.4034|±  | 0.009|
|pubmedqa      |      1|none            |     0|acc        |0.6860|±  |0.0208|
|boolq         |      2|none            |     0|acc        |0.8401|±  |0.0064|

Original (VLLM) Evaluations

|    Tasks     |Version|     Filter     |n-shot|  Metric   |Value |   |Stderr|
|--------------|------:|----------------|-----:|-----------|-----:|---|-----:|
|gsm8k         |      3|strict-match    |     5|exact_match|0.7528|±  |0.0119|
|              |       |flexible-extract|     5|exact_match|0.7521|±  |0.0119|
|hellaswag     |      1|none            |    10|acc        |0.8117|±  |0.0039|
|              |       |none            |    10|acc_norm   |0.9167|±  |0.0028|
|winogrande    |      1|none            |     5|acc        |0.8682|±  |0.0095|
|mmlu          |    N/A|none            |     0|acc        |0.6448|±  |0.0038|
|arc_challenge |      1|none            |    25|acc        |0.7688|±  |0.0123|
|              |       |none            |    25|acc_norm   |0.7730|±  |0.0122|
|truthfulqa_mc2|      2|none            |     0|acc        |0.7895|±  |0.0133|
|mathqa        |      1|none            |     0|acc        |0.4000|±  | 0.009|
|              |       |none            |     0|acc_norm   |0.4003|±  | 0.009|
|pubmedqa      |      1|none            |     0|acc        |0.6680|±  |0.0211|
|boolq         |      2|none            |     0|acc        |0.8346|±  |0.0065|
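
To compare the runs at a glance, the sketch below computes the difference between the UNA v2 and Original tables for one metric per task. The values are copied from those tables. Which metric counts as primary for each task is an assumption made here. The trailing + and - markers in the UNA v2 table appear consistent with this comparison against the Original run, but that reading is an inference and is not stated in the card.

```python
# One metric per task, copied from the "UNA v2" and "Original" tables above.
v2 = {
    "gsm8k (strict-match)": 0.7695,
    "hellaswag (acc_norm)": 0.9169,
    "winogrande (acc)": 0.8777,
    "mmlu (acc)": 0.6427,
    "arc_challenge (acc_norm)": 0.7875,
    "truthfulqa_mc2 (acc)": 0.7824,
    "mathqa (acc_norm)": 0.4034,
    "pubmedqa (acc)": 0.7260,
    "boolq (acc)": 0.8602,
}
original = {
    "gsm8k (strict-match)": 0.7528,
    "hellaswag (acc_norm)": 0.9167,
    "winogrande (acc)": 0.8682,
    "mmlu (acc)": 0.6448,
    "arc_challenge (acc_norm)": 0.7730,
    "truthfulqa_mc2 (acc)": 0.7895,
    "mathqa (acc_norm)": 0.4003,
    "pubmedqa (acc)": 0.6680,
    "boolq (acc)": 0.8346,
}

# Positive delta: v2 scored higher than the original base model.
deltas = {task: round(v2[task] - original[task], 4) for task in v2}

for task, delta in deltas.items():
    marker = "+" if delta > 0 else "-" if delta < 0 else "="
    print(f"{task:<26} {delta:+.4f} {marker}")
```

Several of these differences are of the same order as the reported standard errors, so the Stderr column is worth reading alongside the deltas.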

Citations

  • mlabonne
  • jondurbin & Replete-AI
  • bartowski
  • saltlux

If you use UNA models, don't forget to cite:

@misc{unathepitbull21b,
  title={ThePitbull: Uniform Neural Alignment}, 
  author={Xavier Murias},
  year={2024},
  publisher = {Juanako.AI},
  journal = {HuggingFace repository},
  howpublished = {\url{https://huggingface.co/fblgit/UNA-ThePitbull-21.4-v1}},
}