fblgit's picture
Update README.md
eb838c2 verified
|
raw
history blame
5.19 kB
metadata
license: apache-2.0
tags:
  - UNA
  - simple-math
  - juanako
base_model: abacusai/Smaug-34B-v0.1
datasets:
  - fblgit/simple-math
  - jondurbin/bagel-v0.3
model-index:
  - name: UNA-SimpleSmaug-34b-v1beta
    results:
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: AI2 Reasoning Challenge (25-Shot)
          type: ai2_arc
          config: ARC-Challenge
          split: test
          args:
            num_few_shot: 25
        metrics:
          - type: acc_norm
            value: 74.57
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: HellaSwag (10-Shot)
          type: hellaswag
          split: validation
          args:
            num_few_shot: 10
        metrics:
          - type: acc_norm
            value: 86.74
            name: normalized accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: MMLU (5-Shot)
          type: cais/mmlu
          config: all
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 76.68
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: TruthfulQA (0-shot)
          type: truthful_qa
          config: multiple_choice
          split: validation
          args:
            num_few_shot: 0
        metrics:
          - type: mc2
            value: 70.17
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: Winogrande (5-shot)
          type: winogrande
          config: winogrande_xl
          split: validation
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 83.82
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
          name: Open LLM Leaderboard
      - task:
          type: text-generation
          name: Text Generation
        dataset:
          name: GSM8k (5-shot)
          type: gsm8k
          config: main
          split: test
          args:
            num_few_shot: 5
        metrics:
          - type: acc
            value: 72.48
            name: accuracy
        source:
          url: >-
            https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=fblgit/UNA-SimpleSmaug-34b-v1beta
          name: Open LLM Leaderboard

UNA-SimpleSmaug-34b-v1beta

Scoring 04-February-2024 #1 34B model, outperforming its original base model Smaug-34B-v0.1 with 77.41 😎 Oh, btw.. this one went thru SFT so the abacus inside Smaug is back to normal.. so you can further train/dpo him .. RESET!

UNA Applied UNA only on the Attention, not on the MLP's

  • Is based on Smaug
  • SimpleMath dataset
  • It was trained on Axolotl

Experiment

The thing here is to understand whats the impact of SimpleMath applied at the attention layer during a SFT session and how it impacts on the neural network overall.

Results: Improving mathematican and reasoning capabilities without degrading and presserving previous training sessions.

And enjoy our ModelSimilarities tool detector https://github.com/fblgit/model-similarity where we confirmed numerically the bloodties of the model.

Evals

Pending, but so far this one

|    Task     |Version| Metric |Value            |
|-------------|------:|--------|----------------:|
|arc_challenge|     HF|acc_norm| 0.7457337883959 |
|gsm8k        |     HF|acc     | 0.7247915087187 |
|mmlu         |     HF|acc     | 0.7649553475572 |
|mmlu         |     HF|acc_norm| 0.7681713551647 |
|hellaswag    |     HF|acc_norm| 0.8673571001792 | 
|truthfulqa   |     HF|mc2     | 0.7016557407771 |
|winogrande   |     HF|acc     | 0.8382004735595 |
|------------------------------------------------|

Increasing GSM, MMLU, ARC, WINO.

Citations

To abacusai for making Smaug-34B, the Bagel, and all the magic behind the base model.

If you use the model, provide citation even for merges or anything.

Open LLM Leaderboard Evaluation Results

Detailed results can be found here

Metric Value
Avg. 77.41
AI2 Reasoning Challenge (25-Shot) 74.57
HellaSwag (10-Shot) 86.74
MMLU (5-Shot) 76.68
TruthfulQA (0-shot) 70.17
Winogrande (5-shot) 83.82
GSM8k (5-shot) 72.48