fblgit committed on
Commit f322e92
Parent: 3e12f69

Updated Scores

Files changed (1)
  1. README.md +73 -49
README.md CHANGED
@@ -21,7 +21,8 @@ model-index:
  split: validation
  metrics:
  - type: accuracy
- value: 65.49
  - task:
  type: text-generation
  name: ARC-Challenge
@@ -32,7 +33,8 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 68.09
  - task:
  type: text-generation
  name: HellaSwag
@@ -42,18 +44,8 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 85.20
- - task:
- type: text-generation
- name: GSM8k
- dataset:
- type: text-generation
- name: gsm8k
- config: main
- split: test
- metrics:
- - type: accuracy
- value: 48.98
  - task:
  type: text-generation
  name: Winogrande
@@ -64,7 +56,8 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 76.8
  - task:
  type: text-generation
  name: MMLU
@@ -75,7 +68,8 @@ model-index:
  split: test
  metrics:
  - type: accuracy
- value: 61.37
  - task:
  type: text-generation
  name: PiQA
@@ -95,7 +89,8 @@ model-index:
  split: validation
  metrics:
  - type: accuracy
- value: 49.8
  - task:
  type: text-generation
  name: PubMedQA
@@ -109,28 +104,23 @@ model-index:
  value: 76.0
  ---

- # juanako-7b-UNA-v2

  This model is a fine-tuned version of [fblgit/juanako-7b-UNA-v2-phase-1](https://huggingface.co/fblgit/juanako-7b-UNA-v2-phase-1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
  In many respects it outperforms most current Mistral-based models and is the **latest and most powerful juanako version as of now**.

- ## Scoring and records (26-November-2023)
- Here are some results:
- * Scores #1 7B Model
- * Scores #4 GSM8k
- * Scores #2 in TruthfulQA
- * Scores #6 in CoPa
- * Scores #2 in PiQA
- * Scores #9 in BoolQ
  | Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  |[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
  | [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) | 59.0 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
- | [fblgit/juanako-7b-UNA](https://huggingface.co/fblgit/juanako-7b-UNA) | **65.10** | **68.09** | **85.20** | 61.37 | **65.49** | 76.8 | **48.98** | **49.8** |

- Many evaluations were performed, but it behaves very balanced in multiple fields. Feel free to submit more evaluation results.
-
- It scores: **65.1** according HuggingFace LLM Leaderboard.

  Author [Xavier M.](mailto:xavi@juanako.ai) @fblgit

@@ -138,33 +128,68 @@ Author [Xavier M.](mailto:xavi@juanako.ai) @fblgit

  juanako uses UNA (Uniform Neural Alignment), a training technique that eases alignment between transformer layers and is yet to be published.

- ## TruthfulQA 0-Shot
  ```
  | Tasks |Version|Filter|Metric|Value | |Stderr|
  |--------------|-------|------|------|-----:|---|-----:|
  |truthfulqa_mc2|Yaml |none |acc |0.6549|± |0.0153|
  ```
- ## ARC 25-Shot
  ```
  | Tasks |Version|Filter| Metric |Value | |Stderr|
  |-------------|-------|------|--------|-----:|---|-----:|
  |arc_challenge|Yaml |none |acc |0.6476|± |0.0140|
  | | |none |acc_norm|0.6809|± |0.0136|
  ```
- ## HellaSwag 10-Shot
  ```
  | Tasks |Version|Filter| Metric |Value | |Stderr|
  |---------|-------|------|--------|-----:|---|-----:|
  |hellaswag|Yaml |none |acc |0.6703|± |0.0047|
  | | |none |acc_norm|0.8520|± |0.0035|
  ```
- ## GSM8k 5-Shot
  ```
  |Tasks|Version| Filter | Metric |Value | |Stderr|
  |-----|-------|----------|-----------|-----:|---|-----:|
  |gsm8k|Yaml |get-answer|exact_match|0.4898|± |0.0138|
  ```
- ## GPT Evaluations 0-Shot
  ```
  | Tasks |Version|Filter| Metric |Value | |Stderr|
  |--------------|-------|------|----------|-----:|---|-----:|
@@ -176,39 +201,39 @@ juanako uses UNA, Uniform Neural Alignment. A training technique that ease align
  |sciq |Yaml |none |acc |0.9580|± |0.0063|
  | | |none |acc_norm |0.9130|± |0.0089|
  ```
- ## MathQA 0-Shot
  ```
  |Tasks |Version|Filter| Metric |Value | |Stderr|
  |------|-------|------|--------|-----:|---|-----:|
  |mathqa|Yaml |none |acc |0.3752|± |0.0089|
  | | |none |acc_norm|0.3772|± |0.0089|
  ```
- ## PiQa 1-Shot
  ```
  |Tasks|Version|Filter| Metric |Value | |Stderr|
  |-----|-------|------|--------|-----:|---|-----:|
  |piqa |Yaml |none |acc |0.8308|± |0.0087|
  | | |none |acc_norm|0.8357|± |0.0086|
  ```
- ## Winogrande 5-Shot
  ```
  | Tasks |Version|Filter|Metric|Value| |Stderr|
  |----------|-------|------|------|----:|---|-----:|
  |winogrande|Yaml |none |acc |0.768|± |0.0119|
  ```
- ## PubMedQA 0-Shot
  ```
  | Tasks |Version|Filter|Metric|Value| |Stderr|
  |--------|-------|------|------|----:|---|-----:|
  |pubmedqa|Yaml |none |acc | 0.76|± |0.0191|
  ```
- ## RACE 1-Shot
  ```
  |Tasks|Version|Filter|Metric|Value | |Stderr|
  |-----|-------|------|------|-----:|---|-----:|
  |race |Yaml |none |acc |0.5282|± |0.0154|
  ```
- ## MMLU 5-Shot (8-Bit)
  ```
  | Groups |Version|Filter|Metric|Value | |Stderr|
  |------------------|-------|------|------|-----:|---|-----:|
@@ -218,19 +243,22 @@ juanako uses UNA, Uniform Neural Alignment. A training technique that ease align
  | - social_sciences|N/A |none |acc |0.7195|± |0.0713|
  | - stem |N/A |none |acc |0.5087|± |0.1297|
  ```
- ## DROP 3-Shot (8-Bit) (Instruct-Eval)
  ```
  {'score': 0.49801113762927607}
  {'drop': 49.8}
  drop: 49.8
  ```

- ## CRASS 0-Shot (Instruct-Eval)
  ```
  {'score': 0.8357664233576643}
  {'crass': 83.58}
  crass: 83.58
  ```
  ### Training hyperparameters

  The following hyperparameters were used during training:
@@ -267,6 +295,7 @@ The following hyperparameters were used during training:

  ## Citations
  If you find juanako useful, please cite:
  ```
  @misc{juanako7buna,
  title={Juanako: Uniform Neural Alignment},
@@ -278,6 +307,7 @@ If you find juanako useful please:
  }
  ```

  ```
  @misc{lin2021truthfulqa,
  title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},
@@ -295,12 +325,6 @@ If you find juanako useful please:
  archivePrefix={arXiv},
  primaryClass={cs.LG}
  }
- @article{cobbe2021gsm8k,
- title={Training Verifiers to Solve Math Word Problems},
- author={Cobbe, Karl and Kosaraju, Vineet and Bavarian, Mohammad and Chen, Mark and Jun, Heewoo and Kaiser, Lukasz and Plappert, Matthias and Tworek, Jerry and Hilton, Jacob and Nakano, Reiichiro and Hesse, Christopher and Schulman, John},
- journal={arXiv preprint arXiv:2110.14168},
- year={2021}
- }
  @inproceedings{Bisk2020,
  author = {Yonatan Bisk and Rowan Zellers and
  Ronan Le Bras and Jianfeng Gao
 
  split: validation
  metrics:
  - type: accuracy
+ value: 65.13
+ verified: true
  - task:
  type: text-generation
  name: ARC-Challenge
 
  split: test
  metrics:
  - type: accuracy
+ value: 68.17
+ verified: true
  - task:
  type: text-generation
  name: HellaSwag
 
  split: test
  metrics:
  - type: accuracy
+ value: 85.34
+ verified: true
  - task:
  type: text-generation
  name: Winogrande
 
  split: test
  metrics:
  - type: accuracy
+ value: 78.85
+ verified: true
  - task:
  type: text-generation
  name: MMLU
 
  split: test
  metrics:
  - type: accuracy
+ value: 62.47
+ verified: true
  - task:
  type: text-generation
  name: PiQA
 
  split: validation
  metrics:
  - type: accuracy
+ value: 38.74
+ verified: true
  - task:
  type: text-generation
  name: PubMedQA
 
  value: 76.0
  ---

+ # juanako-7b-UNA (Uniform Neural Alignment)

  This model is a fine-tuned version of [fblgit/juanako-7b-UNA-v2-phase-1](https://huggingface.co/fblgit/juanako-7b-UNA-v2-phase-1) on the HuggingFaceH4/ultrafeedback_binarized dataset.
  In many respects it outperforms most current Mistral-based models and is the **latest and most powerful juanako version as of now**.

+ ## Scores
+
+ The official HuggingFace results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/results/blob/main/fblgit/juanako-7b-UNA/results_2023-11-28T08-33-33.965228.json).
+
  | Model | Average ⬆️| ARC (25-s) ⬆️ | HellaSwag (10-s) ⬆️ | MMLU (5-s) ⬆️| TruthfulQA (MC) (0-s) ⬆️ | Winogrande (5-s) | GSM8K (5-s) | DROP (3-s) |
  | --- | --- | --- | --- | --- | --- | --- | --- | --- |
  |[mistralai/Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1) | 50.32 | 59.58 | 83.31 | 64.16 | 42.15 | 78.37 | 18.12 | 6.14 |
  | [Intel/neural-chat-7b-v3-1](https://huggingface.co/Intel/neural-chat-7b-v3-1) | 59.0 | 66.21 | 83.64 | 62.37 | 59.65 | 78.14 | 19.56 | 43.84 |
+ | [fblgit/juanako-7b-UNA](https://huggingface.co/fblgit/juanako-7b-UNA) | **59.91** | **68.17** | **85.34** | 62.47 | **65.13** | **78.85** | **20.7** | 38.74 |

+ It scores **59.91** according to the HuggingFace LLM Leaderboard.
+ It scores **65.1** with the `big-refactor` branch of lm-eval-harness.

  Author [Xavier M.](mailto:xavi@juanako.ai) @fblgit

  juanako uses UNA (Uniform Neural Alignment), a training technique that eases alignment between transformer layers and is yet to be published.

+ ### Prompts
+ The following prompts showed positive results; performance may depend on the task and needs further experimentation, but these should work for starters:
+ ```
+ <|im_start|>system
+ - You are a helpful assistant chatbot trained by MosaicML.
+ - You answer questions.
+ - You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.
+ - You are more than just an information source, you are also able to write poetry, short stories, and make jokes.<|im_end|>
+ <|im_start|>user
+ Explain QKV<|im_end|>
+ <|im_start|>assistant
+ ```
+ ```
+ ### Assistant: I am StableVicuna, a large language model created by CarperAI. I am here to chat!
+
+ ### Human: Explain QKV
+ ### Assistant:
+ ```
+ ```
+ [Round <|round|>]
+ 问:Explain QKV
+ 答:
+ ```
+ ```
+ [Round <|round|>]
+ Question:Explain QKV
+ Answer:
+ ```
+ ```
+ Question:Explain QKV
+ Answer:
+ ```
+
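For quick testing, here is a minimal inference sketch with 🤗 Transformers that wraps a question in the ChatML-style format shown above; the generation settings and hardware assumptions (a GPU and `accelerate` installed for `device_map="auto"`) are illustrative, not part of the card:

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "fblgit/juanako-7b-UNA"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# torch_dtype="auto" keeps the checkpoint dtype; device_map="auto" requires accelerate.
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype="auto", device_map="auto")

# ChatML-style prompt, following the first format above (system text shortened here).
prompt = (
    "<|im_start|>system\n"
    "- You are a helpful assistant chatbot.\n"
    "- You answer questions.<|im_end|>\n"
    "<|im_start|>user\n"
    "Explain QKV<|im_end|>\n"
    "<|im_start|>assistant\n"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256, do_sample=False)
# Decode only the newly generated tokens.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:], skip_special_tokens=True))
```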
+ ## Evaluations (lm-eval big-refactor branch)
+
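The per-task tables below come from lm-evaluation-harness. As a rough sketch of how a single task could be re-run (assuming the `big-refactor` / v0.4-style Python API; argument names and defaults may differ between revisions, and the dtype and batch size here are illustrative):

```python
from lm_eval import evaluator  # lm-evaluation-harness, big-refactor / v0.4-style API

# Evaluate TruthfulQA (mc2) 0-shot on the model.
results = evaluator.simple_evaluate(
    model="hf",
    model_args="pretrained=fblgit/juanako-7b-UNA,dtype=bfloat16",
    tasks=["truthfulqa_mc2"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["truthfulqa_mc2"])
```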
+ ### TruthfulQA 0-Shot
  ```
  | Tasks |Version|Filter|Metric|Value | |Stderr|
  |--------------|-------|------|------|-----:|---|-----:|
  |truthfulqa_mc2|Yaml |none |acc |0.6549|± |0.0153|
  ```
+ ### ARC 25-Shot
  ```
  | Tasks |Version|Filter| Metric |Value | |Stderr|
  |-------------|-------|------|--------|-----:|---|-----:|
  |arc_challenge|Yaml |none |acc |0.6476|± |0.0140|
  | | |none |acc_norm|0.6809|± |0.0136|
  ```
+ ### HellaSwag 10-Shot
  ```
  | Tasks |Version|Filter| Metric |Value | |Stderr|
  |---------|-------|------|--------|-----:|---|-----:|
  |hellaswag|Yaml |none |acc |0.6703|± |0.0047|
  | | |none |acc_norm|0.8520|± |0.0035|
  ```
+ ### GSM8k 5-Shot
  ```
  |Tasks|Version| Filter | Metric |Value | |Stderr|
  |-----|-------|----------|-----------|-----:|---|-----:|
  |gsm8k|Yaml |get-answer|exact_match|0.4898|± |0.0138|
  ```
+ ### GPT Evaluations 0-Shot
  ```
  | Tasks |Version|Filter| Metric |Value | |Stderr|
  |--------------|-------|------|----------|-----:|---|-----:|
 
  |sciq |Yaml |none |acc |0.9580|± |0.0063|
  | | |none |acc_norm |0.9130|± |0.0089|
  ```
+ ### MathQA 0-Shot
  ```
  |Tasks |Version|Filter| Metric |Value | |Stderr|
  |------|-------|------|--------|-----:|---|-----:|
  |mathqa|Yaml |none |acc |0.3752|± |0.0089|
  | | |none |acc_norm|0.3772|± |0.0089|
  ```
+ ### PiQa 1-Shot
  ```
  |Tasks|Version|Filter| Metric |Value | |Stderr|
  |-----|-------|------|--------|-----:|---|-----:|
  |piqa |Yaml |none |acc |0.8308|± |0.0087|
  | | |none |acc_norm|0.8357|± |0.0086|
  ```
+ ### Winogrande 5-Shot
  ```
  | Tasks |Version|Filter|Metric|Value| |Stderr|
  |----------|-------|------|------|----:|---|-----:|
  |winogrande|Yaml |none |acc |0.768|± |0.0119|
  ```
+ ### PubMedQA 0-Shot
  ```
  | Tasks |Version|Filter|Metric|Value| |Stderr|
  |--------|-------|------|------|----:|---|-----:|
  |pubmedqa|Yaml |none |acc | 0.76|± |0.0191|
  ```
+ ### RACE 1-Shot
  ```
  |Tasks|Version|Filter|Metric|Value | |Stderr|
  |-----|-------|------|------|-----:|---|-----:|
  |race |Yaml |none |acc |0.5282|± |0.0154|
  ```
+ ### MMLU 5-Shot (8-Bit)
  ```
  | Groups |Version|Filter|Metric|Value | |Stderr|
  |------------------|-------|------|------|-----:|---|-----:|
 
  | - social_sciences|N/A |none |acc |0.7195|± |0.0713|
  | - stem |N/A |none |acc |0.5087|± |0.1297|
  ```
+ ### DROP 3-Shot (8-Bit) (Instruct-Eval)
  ```
  {'score': 0.49801113762927607}
  {'drop': 49.8}
  drop: 49.8
  ```

+ ### CRASS 0-Shot (Instruct-Eval)
  ```
  {'score': 0.8357664233576643}
  {'crass': 83.58}
  crass: 83.58
  ```
+
+ ## Training Details
+
  ### Training hyperparameters

  The following hyperparameters were used during training:
 

  ## Citations
  If you find juanako useful, please cite:
+
  ```
  @misc{juanako7buna,
  title={Juanako: Uniform Neural Alignment},
 
  }
  ```

+ Thanks to all the brilliant humans behind the creation of AI; here are some of the works we find relevant to our research. If you feel a citation is missing, please contact us.
  ```
  @misc{lin2021truthfulqa,
  title={TruthfulQA: Measuring How Models Mimic Human Falsehoods},

  archivePrefix={arXiv},
  primaryClass={cs.LG}
  }
  @inproceedings{Bisk2020,
  author = {Yonatan Bisk and Rowan Zellers and
  Ronan Le Bras and Jianfeng Gao