abacusai/Smaug-Llama-3-70B-Instruct

@@ -1,10 +1,157 @@
 ---
-library_name: transformers
 license: llama3
 datasets:
 - aqua_rat
 - microsoft/orca-math-word-problems-200k
 - m-a-p/CodeFeedback-Filtered-Instruction
 ---
 # Smaug-Llama-3-70B-Instruct
@@ -148,4 +295,23 @@ The score for both Llama-3 and this model are significantly different when evalu
 with the updated harness as the issue with stop words has been addressed.
-This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.

 ---
 license: llama3
+library_name: transformers
 datasets:
 - aqua_rat
 - microsoft/orca-math-word-problems-200k
 - m-a-p/CodeFeedback-Filtered-Instruction
+model-index:
+- name: Smaug-Llama-3-70B-Instruct
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: ENEM Challenge (No Images)
+      type: eduagarcia/enem_challenge
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 77.89
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BLUEX (No Images)
+      type: eduagarcia-temp/BLUEX_without_images
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 69.54
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: OAB Exams
+      type: eduagarcia/oab_exams
+      split: train
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc
+      value: 63.64
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 RTE
+      type: assin2
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 93.62
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Assin2 STS
+      type: eduagarcia/portuguese_benchmark
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: pearson
+      value: 78.52
+      name: pearson
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: FaQuAD NLI
+      type: ruanchaves/faquad-nli
+      split: test
+      args:
+        num_few_shot: 15
+    metrics:
+    - type: f1_macro
+      value: 80.01
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HateBR Binary
+      type: ruanchaves/hatebr
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 91.78
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: PT Hate Speech Binary
+      type: hate_speech_portuguese
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 68.36
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: tweetSentBR
+      type: eduagarcia/tweetsentbr_fewshot
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: f1_macro
+      value: 70.29
+      name: f1-macro
+    source:
+      url: https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard?query=abacusai/Smaug-Llama-3-70B-Instruct
+      name: Open Portuguese LLM Leaderboard
 ---
 # Smaug-Llama-3-70B-Instruct
 with the updated harness as the issue with stop words has been addressed.
+This version of Smaug uses new techniques and new data compared to [Smaug-72B](https://huggingface.co/abacusai/Smaug-72B-v0.1), and more information will be released later on. For now, see the previous Smaug paper: https://arxiv.org/abs/2402.13228.
+# Open Portuguese LLM Leaderboard Evaluation Results
+Detailed results can be found [here](https://huggingface.co/datasets/eduagarcia-temp/llm_pt_leaderboard_raw_results/tree/main/abacusai/Smaug-Llama-3-70B-Instruct) and on the [🚀 Open Portuguese LLM Leaderboard](https://huggingface.co/spaces/eduagarcia/open_pt_llm_leaderboard).
+|          Metric          |  Value  |
+|--------------------------|---------|
+|Average                   |**77.07**|
+|ENEM Challenge (No Images)|    77.89|
+|BLUEX (No Images)         |    69.54|
+|OAB Exams                 |    63.64|
+|Assin2 RTE                |    93.62|
+|Assin2 STS                |    78.52|
+|FaQuAD NLI                |    80.01|
+|HateBR Binary             |    91.78|
+|PT Hate Speech Binary     |    68.36|
+|tweetSentBR               |    70.29|
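
The reported average can be sanity-checked against the per-task values in the table. The sketch below assumes the leaderboard average is a plain unweighted mean of the nine task scores; that is an assumption consistent with the numbers shown here, not something the model card states. The `scores` dictionary is a hypothetical name introduced only for this check.

```python
# Per-task scores copied from the results table above.
scores = {
    "ENEM Challenge (No Images)": 77.89,
    "BLUEX (No Images)": 69.54,
    "OAB Exams": 63.64,
    "Assin2 RTE": 93.62,
    "Assin2 STS": 78.52,
    "FaQuAD NLI": 80.01,
    "HateBR Binary": 91.78,
    "PT Hate Speech Binary": 68.36,
    "tweetSentBR": 70.29,
}

# Assumption: the "Average" row is the unweighted mean of all nine tasks.
average = sum(scores.values()) / len(scores)

print(f"{average:.2f}")  # prints 77.07, matching the Average row
```

Under that assumption the table is internally consistent: the nine values sum to 693.65, and dividing by 9 gives 77.07 after rounding to two decimals.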