shisa-ai/shisa-v1-llama3-70b

@@ -12,46 +12,50 @@ datasets:
 # shisa-v2 Base Model ablation
-*Per the  Llama 3 Community License Agreement, the official name of this model is "LLama 3 shisa-v1-llama3-70b"*
 This is a fine-tune of Llama 3 70B Instruct with the primary `shisa-v1` dataset to improve Japanese language capabilities.
-This model uses a LR of 8e-6 that slightly improves performance vs the original 2e-5 tune (based on and validating predictive power of the the
 results of the Llama 3 8B LR ablations).
 It also uses NEFTune, although the expected impact is negligible for this dataset.
-While the 2e-5 model matched gpt-3.5-turbo performance, this 2e6 version consistently edges it out, so I think it's fair to say that this model "beats" it.
-There are a selection of GGUF quants here: https://huggingface.co/shisa-ai/shisa-v1-llama3-70b-gguf
-While this is merely a test ablation on the road to `shisa-v2`, as the strongest commercially usable open JA model I've tested so far, this model may be of general interest.
 ## Performance
 Measured using a [fork](https://github.com/shisa-ai/shaberi) of [Lightblue's Shaberi benchmark framework](https://github.com/lightblue-tech/japanese_llm_eval):
 | Model                                  | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
 |----------------------------------------|---------|-----------------|----------|--------|-------------|
 | gpt-4-turbo-2024-04-09                 | 8.75    | 8.78            | 8.74     | 9.18   | 8.31        |
 | CohereForAI/c4ai-command-r-plus        | 7.69    | 7.50            | 7.43     | 9.05   | 6.79        |
 | **shisa-ai/shisa-v1-llama3-70b**       | **7.30**| **7.34**        | **7.67** | **8.15** | **6.04**  |
 | gpt-3.5-turbo-0125                     | 7.17    | 7.24            | 6.98     | 7.64   | 6.82        |
-| **shisa-ai/shisa-v1-llama3-70b**       | **7.17**| **7.16**        | **7.45** | **7.98** | **6.09**  |
 | karakuri-ai/karakuri-lm-8x7b-chat-v0.1 | 7.00    | 7.18            | 6.30     | 7.98   | 6.55        |
 | karakuri-ai/karakuri-lm-70b-chat-v0.1  | 6.84    | 6.86            | 6.43     | 7.85   | 6.23        |
 | lightblue/ao-karasu-72B                | 6.81    | 7.19            | 6.54     | 7.25   | 6.27        |
-| **shisa-ai/shisa-v1-llama3-8b^**       | **6.29**| **6.62**        | **6.41** | **7.05**|**5.07**    |
-| shisa-ai/shisa-swallowmx-13a47b-v1     | 6.17    | 6.48            | 6.07     | 7.11   | 5.03        |
-| **shisa-ai/shisa-v1-llama3-8b**        | **6.10**| **6.52**        | **6.20** | **6.37**|**5.33**    |
 | Rakuten/RakutenAI-7B-chat              | 5.58    | 5.92            | 4.60     | 6.58   | 5.24        |
-| shisa-ai/shisa-v1-gemma-8b             | 5.64    | 6.50            | 5.42     | 5.10   | 5.55        |
-| augmxnt/shisa-gamma-7b-v1              | 5.56    | 5.84            | 4.00     | 6.73   | 5.68        |
 | lightblue/qarasu-14B-chat-plus-unleashed | 5.20  | 5.58            | 4.74     | 5.46   | 5.01        |
 | cyberagent/calm2-7b-chat               | 4.76    | 4.90            | 3.58     | 5.75   | 4.81        |
 | mistralai/Mistral-7B-Instruct-v0.2     | 4.69    | 5.78            | 4.65     | 3.80   | 4.53        |
 | **shisa-ai/shisa-v1-yi1.5-9b**         | **4.63**| **5.98**        | **4.28** | **3.26**|**5.00**    |
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->

 # shisa-v2 Base Model ablation
+*Per the Llama 3 Community License Agreement, the official name of this model is "Llama 3 shisa-v1-llama3-70b"*
 This is a fine-tune of Llama 3 70B Instruct with the primary `shisa-v1` dataset to improve Japanese language capabilities.
+This model uses an LR of 8e-6 that slightly improves performance vs the initial 2e-5 tune (based on, and validating the predictive power of, the
 results of the Llama 3 8B LR ablations).
 It also uses NEFTune, although the expected impact is negligible for this dataset.
+While the 2e-5 model matched gpt-3.5-turbo performance, this 8e-6 version consistently edges it out, so I think it's fair to say that this model "beats" it.
+While this is merely a test ablation on the road to `shisa-v2`, as the strongest commercially-usable open JA model benchmarked so far, this model may be of general interest.
 ## Performance
 Measured using a [fork](https://github.com/shisa-ai/shaberi) of [Lightblue's Shaberi benchmark framework](https://github.com/lightblue-tech/japanese_llm_eval):
 | Model                                  | Average | ELYZA-tasks-100 | MT-Bench | Rakuda | Tengu-Bench |
 |----------------------------------------|---------|-----------------|----------|--------|-------------|
 | gpt-4-turbo-2024-04-09                 | 8.75    | 8.78            | 8.74     | 9.18   | 8.31        |
+| gpt-4o-2024-05-13                      | 8.72    | 8.88            | 8.69     | 9.15   | 8.16        |
+| gemini-1.5-pro                         | 8.58    | 8.58            | 8.93     | 9.20   | 7.61        |
+| claude-3-opus-20240229                 | 8.55    | 8.64            | 8.58     | 8.75   | 8.23        |
 | CohereForAI/c4ai-command-r-plus        | 7.69    | 7.50            | 7.43     | 9.05   | 6.79        |
 | **shisa-ai/shisa-v1-llama3-70b**       | **7.30**| **7.34**        | **7.67** | **8.15** | **6.04**  |
 | gpt-3.5-turbo-0125                     | 7.17    | 7.24            | 6.98     | 7.64   | 6.82        |
+| **shisa-ai/shisa-v1-llama3-70b.2e5**   | **7.17**| **7.16**        | **7.45** | **7.98** | **6.09**  |
 | karakuri-ai/karakuri-lm-8x7b-chat-v0.1 | 7.00    | 7.18            | 6.30     | 7.98   | 6.55        |
 | karakuri-ai/karakuri-lm-70b-chat-v0.1  | 6.84    | 6.86            | 6.43     | 7.85   | 6.23        |
 | lightblue/ao-karasu-72B                | 6.81    | 7.19            | 6.54     | 7.25   | 6.27        |
+| **shisa-ai/shisa-v1-llama3-8b**        | **6.59**| **6.67**        | **6.95** | **7.05**| **5.68**   |
+| **shisa-ai/shisa-v1-llama3-8b.2e5**    | **6.29**| **6.62**        | **6.41** | **7.05**| **5.07**   |
+| **shisa-ai/shisa-swallowmx-13a47b-v1** | **6.17**| **6.48**        | **6.07** | **7.11**| **5.03**   |
+| lightblue/suzume-llama-3-8B-japanese   | 5.96    | 6.68            | 4.96     | 6.68   | 5.53        |
+| augmxnt/shisa-gamma-7b-v1              | 5.82    | 5.96            | 5.02     | 6.85   | 5.47        |
+| **shisa-ai/shisa-v1-phi3-14b**         | **5.77**| **6.28**        | **5.26** | **6.55**| **5.01**   |
+| **shisa-ai/shisa-v1-gemma-8b**         | **5.64**| **6.50**        | **5.42** | **5.10**| **5.55**   |
 | Rakuten/RakutenAI-7B-chat              | 5.58    | 5.92            | 4.60     | 6.58   | 5.24        |
 | lightblue/qarasu-14B-chat-plus-unleashed | 5.20  | 5.58            | 4.74     | 5.46   | 5.01        |
+| **shisa-ai/shisa-v1-mistral0.3-7b**    | **5.11**| **5.64**        | **6.10** | **3.83**|**4.86**    |
 | cyberagent/calm2-7b-chat               | 4.76    | 4.90            | 3.58     | 5.75   | 4.81        |
 | mistralai/Mistral-7B-Instruct-v0.2     | 4.69    | 5.78            | 4.65     | 3.80   | 4.53        |
 | **shisa-ai/shisa-v1-yi1.5-9b**         | **4.63**| **5.98**        | **4.28** | **3.26**|**5.00**    |
+| augmxnt/shisa-7b-v1                    | 4.50    | 4.63            | 3.95     | 4.89   | 4.53        |
 <!-- This model card has been generated automatically according to the information the Trainer had access to. You
 should probably proofread and complete it, then remove this comment. -->
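
The `Average` column in the table above appears to be the unweighted mean of the four benchmark scores. The card does not state this; it is an assumption inferred from the numbers. A minimal sketch that recomputes the averages for three rows, with scores copied from the table and illustrative names (`SCORES`, `average`, `AVERAGES`) that are not part of the Shaberi framework:

```python
from statistics import mean

# Per-benchmark scores copied from the table above, in the order
# (ELYZA-tasks-100, MT-Bench, Rakuda, Tengu-Bench).
SCORES = {
    "shisa-ai/shisa-v1-llama3-70b": (7.34, 7.67, 8.15, 6.04),
    "shisa-ai/shisa-v1-llama3-70b.2e5": (7.16, 7.45, 7.98, 6.09),
    "gpt-3.5-turbo-0125": (7.24, 6.98, 7.64, 6.82),
}


def average(scores):
    """Unweighted mean of the four scores, rounded to two decimals.

    Assumption: this is how the Average column is derived; the model
    card itself does not say so.
    """
    return round(mean(scores), 2)


AVERAGES = {name: average(s) for name, s in SCORES.items()}

for name, avg in AVERAGES.items():
    print(f"{name}: {avg:.2f}")
```

Under that assumption the recomputed values agree with the table: the 2e-5 run and gpt-3.5-turbo-0125 tie on average, and shisa-ai/shisa-v1-llama3-70b comes out ahead of both. Per benchmark, it scores above gpt-3.5-turbo-0125 on ELYZA-tasks-100, MT-Bench, and Rakuda, and below it on Tengu-Bench.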